Old code vs bold code

hiersd＠gmail.com

Dec. 17, 2009

10:29 a.m.

I've been snooping around our production systems, and the base code version for everything that we run in the call path is between 2 and 3 years old. It is patched to a fare-thee-well, and the stuff runs quite well.

We use only top-tier vendors, yet can't recall ever being happy on code that is less than 1 year old. Too many bleeding edge bugs for my current medication level.

How 'bout you guys? How long do you let a codebase steep before you're happy with running it in production?

Thanks,

David

scott＠sberkman.net

December 2009

10:52 a.m.

Unless I have a very good lab environment, I try to stick with what we know works unless a major issue shows up or the vendor refuses to support the product on a given code version. I always prefer stability to bleeding edge. That said, sometimes there really are new features that will provide new revenue opportunities and make it worth upgrading.

How many of us, especially smaller carriers, have true lab environments that can test all the features and functions of our Class4/5 switches for something like a major code upgrade without affecting the production network at all? This would include carrier A links and ISUP trunks (at least simulated), LNP/CNAM dips, etc.

-Scott

carlos＠televolve.com

11:24 a.m.

On 12/17/09 9:29 AM, David Hiers wrote:

...

I've been snooping around our production systems, and the base code version for everything that we run in the call path is between 2 and 3 years old. It is patched to a fare-thee-well, and the stuff runs quite well.

That's interesting to hear, because people look at me like I have three heads when I say our main production switching is still on Asterisk 1.2. I can't see upgrading production systems with any major revs (we have applied patches because of security/stability fixes). I haven't found a compelling business case for buying new servers just to load up the new version until recently. Right now we are migrating to Asterisk 1.6.2 because we want to move to database-driven configuration rather than config files, as well as integrate fax and video phones within the same servers. We've been working with the 1.6.x software for nearly a year on semi-production systems with the gracious help of some of our more beta-friendly customers. This week we've just started migrating some other customers to it, and hope to be in full production by year end.

...

How 'bout you guys? How long do you let a codebase steep before you're happy with running it in production?

Upgrading servers "because it's new" is never a good idea. I can't say I have a fixed schedule, but I look at cost:benefit:risk and make the decision based on that. Although 1.6.2 is very new, there's a huge amount of benefit in using it, and we've mitigated the risk by doing long-term pre-production testing. The cost of the upgrade will be the same now or later (all labor of course, we're using free Asterisk). A lot depends on the vendor, also. Asterisk has a long history of buggy releases, but in the last couple years Digium has made a very strong and effective effort to put quality ahead of everything else. I now feel confident that new releases from them are going to be very stable, and we get honest opinions from their development team on whether it's usable for our production environment. -- Carlos Alvarez TelEvolve 602-889-3003 Advanced phone services simplified

abalashov＠evaristesys.com

12:30 p.m.

On 12/17/2009 12:24 PM, Carlos Alvarez wrote:

...

That's interesting to hear, because people look at me like I have three heads when I say our main production switching is still on Asterisk 1.2. I can't see upgrading production systems with any major revs (we have applied patches because of security/stability fixes). I haven't found a compelling business case for buying new servers just to load up the new version until recently.

I know people who still run 1.2.x adamantly on the same basic premise. I do look at them like they're crazy now, but even as recently as a year and a half ago, I would not have necessarily. There seems to be an informal consensus that 1.4.18 was the first "serious" stable 1.4.x release. I've run into enough problems with 1.6.x crashing that I have no faith in it as of yet. Why, I just had an installation in production mysteriously segfault this morning... -- Alex -- Alex Balashov - Principal Evariste Systems Web : http://www.evaristesys.com/ Tel : (+1) (678) 954-0670 Direct : (+1) (678) 954-0671

zavoid＠gmail.com

12:34 p.m.

is there a real need to use 1.6 in a production environment for you?

carlos＠televolve.com

1:48 p.m.

On 12/17/09 11:34 AM, Colin wrote:

...

is there a real need to use 1.6 in a production environment for you?

Yes, no version prior to 1.6.2 can do everything we need. We started using the 1.6.2 alpha versions some time ago to flesh out the new system, then put some non-critical users on them. -- Carlos Alvarez TelEvolve 602-889-3003 Advanced phone services simplified

anorexicpoodle＠gmail.com

12:30 p.m.

Typically there are two things that drive me to move code versions: the version I am on is declared end of life, or the new version offers a feature we need to stay competitive. Outside of those things I am typically inclined to leave well enough alone and focus my energies on other things.

I don't really have a pre-determined time I like to let things idle in the market before I use them, but my general rule of thumb is that if I am expanding an older project or adding elements to an already in-production environment, I will keep the additions in line with the version of what is already there. If I am building something new altogether, I will generally start on the newest there is, and by the time the project is done and out of beta, the newest at the time of build will have matured quite a bit.

Out of curiosity how many out there do actually have full labs of their environments? I know this is an arena where the full open-source guys have an advantage over those of us using big name vendors, since getting the budget for a pair of SBC's can be hard enough, let alone getting budget for another one just to play with.

BrandonB＠netins.com

12:46 p.m.

We do for both Sylantro (now Broadsoft) Synergy and Metaswitch with Metasphere. The labs are generally lower-powered versions, as they don't need the horsepower the production system does, and we only have a single Acme SBC in the lab instead of an HA pair, but other than that, they're pretty much in line with each other.

However, I believe a lab can never fully replicate a production environment due to lack of traffic (generators don't quite get the same results) and that you can never underestimate a user's ability to wreak havoc and do something that no one ever thought of or a sane person would do with equipment.

---- Brandon Buckner Switching Technician / VoIP Admin Iowa Network Services brandonb at netins.com

carlos＠televolve.com

1:44 p.m.

On 12/17/09 11:46 AM, Brandon Buckner wrote:

...

However, I believe a lab can never fully replicate a production environment due to lack of traffic (generators don't quite get the same results) and that you can never underestimate a user's ability to wreak havoc and do something that no one ever thought of or a sane person would do with equipment.

And of course we have the follies of SIP and NAT, particularly for those of us who have a bring-your-own-internet policy. Recently we've had a rash of customers who upgraded their routers and the upgrade came with a SIP ALG that broke something weird. The weirdest was one where attended intra-office transfer broke, but not outside transfer nor intra-office blind transfer. We don't consider anything production-ready until some customers have beaten on it. We migrate them in the order of beta-aware, not aware but systems we manage and can move back ourselves, and then the rest. -- Carlos Alvarez TelEvolve 602-889-3003 Advanced phone services simplified
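The rollout order described above (beta-aware customers first, then systems the provider manages and can move back itself, then everyone else) can be sketched as a simple sort. This is only an illustration: the `Customer` record, its field names, and the sample customers are hypothetical and not taken from any system mentioned in the thread.

```python
from dataclasses import dataclass


# Hypothetical customer record; the field names are assumptions for illustration.
@dataclass
class Customer:
    name: str
    beta_aware: bool      # customer has agreed to run pre-production code
    managed_by_us: bool   # we manage the system and can roll it back ourselves


def migration_tier(c: Customer) -> int:
    """Lower tier migrates earlier: beta-aware, then managed, then the rest."""
    if c.beta_aware:
        return 0
    if c.managed_by_us:
        return 1
    return 2


def migration_order(customers):
    # sorted() is stable, so customers within a tier keep their input order.
    return sorted(customers, key=migration_tier)


customers = [
    Customer("acme", beta_aware=False, managed_by_us=False),
    Customer("bravo", beta_aware=False, managed_by_us=True),
    Customer("charlie", beta_aware=True, managed_by_us=False),
]
order = [c.name for c in migration_order(customers)]
```

With the sample data, the beta-aware customer comes first and the unmanaged, non-beta customer comes last.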

carlos＠televolve.com

1:59 p.m.

On 12/17/09 11:30 AM, anorexicpoodle wrote:

...

Out of curiosity how many out there do actually have full labs of their environments? I know this is an arena where the full open-source guys have an advantage over those of us using big name vendors, since getting the budget for a pair of SBC's can be hard enough, let alone getting budget for another one just to play with.

We're one of the open source types. I don't know if I'd call it a "lab" but we have a VMware VI3 cluster where we keep VMs of all the major Asterisk releases and can clone/test in a few minutes. We also have some spare machines in each colo where we can do whatever we want. Quite often we test something in a VM and then deploy on hardware. Or deploy in the VM cluster if it's not something with heavy load. -- Carlos Alvarez TelEvolve 602-889-3003 Advanced phone services simplified

zavoid＠gmail.com

2:01 p.m.

yeah we do something similar but i've not had any luck with asterisk on VM's.. i need raw cpu and power.. just replaced some of my old D1950's with new m610 blades and my load average per asterisk went down to 2 from 60.

carlos＠televolve.com

2:03 p.m.

On 12/17/09 1:01 PM, Colin wrote:

...

yeah we do something similar but i've not had any luck with asterisk on VM's.. i need raw cpu and power.. just replaced some of my old D1950's with new m610 blades and my load average per asterisk went down to 2 from 60.

It certainly depends on load. For lightly-loaded servers, we've done fine on VMware. Not the free version though, it doesn't handle timing anywhere near as well as the enterprise version. -- Carlos Alvarez TelEvolve 602-889-3003 Advanced phone services simplified

daryl＠introspect.net

2:46 p.m.

The "free version" (ESXi/vSphere) is identical to the version you pay for, minus the manageability and failover/redundancy features. Are you talking about the nearly defunct VMWare server that runs on top of a base OS?

nicholas.hatch＠gmail.com

3:29 p.m.

On Thu, Dec 17, 2009 at 12:46 PM, Daryl G. Jurbala <daryl at introspect.net> wrote:

...

The "free version" (ESXi/vSphere) is identical to the version you pay for, minus the manageability and failover/redundancy features. Are you talking about the nearly defunct VMWare server that runs on top of a base OS?

I've certainly noticed network latency issues on ESXi when system I/O increases. (Hundreds of ms jitter.) I figured this was the same with ESX as well, but there might be architectural differences between the two platforms which would make a difference. Or, maybe there's something else going on. Many ESXi users do use the local datastores, so maybe that's where the "enterprise has less timing issues" perception comes from -- if you're paying for licensing, you're probably using a SAN and vice versa. If using local datastores affects network I/O, "enterprise" users might sidestep this issue entirely. -Nick
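The post does not say how the jitter figure was measured. One common way to put a number on it, shown here only as a sketch, is the running interarrival jitter estimate defined in RFC 3550 (section 6.4.1), computed from send and receive timestamps. The function name and the sample timestamps below are made up for illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate per RFC 3550 section 6.4.1.

    Both lists hold per-packet timestamps in the same unit (e.g. ms);
    the result is in that unit too.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between consecutive packets.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as the RFC specifies.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the third one arrives 16 ms late.
send = [0.0, 20.0, 40.0, 60.0]
steady = interarrival_jitter(send, [5.0, 25.0, 45.0, 65.0])
bumpy = interarrival_jitter(send, [5.0, 25.0, 61.0, 65.0])
```

A constant network delay gives zero jitter; only variation in delay between packets raises the estimate.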

daryl＠introspect.net

3:49 p.m.

On Dec 17, 2009, at 4:29 PM, nick hatch wrote:

...

Or, maybe there's something else going on. Many ESXi users do use the local datastores, so maybe that's where the "enterprise has less timing issues" perception comes from -- if you're paying for licensing, you're probably using a SAN and vice versa. If using local datastores affects network I/O, "enterprise" users might sidestep this issue entirely.

Timing issues have nothing to do with what datastore type you are using. They have everything to do with how VMWare handles interrupts. VMWare server is a disaster; ESXi and ESX, free or not, do a much better job without having to go in and do ridiculous things to kernel parameters.

I guess I'm just not understanding your point here. I've been using VMWare Server/ESX/ESXi, both free and licensed versions, since long before VMWare Server was even a product (it used to be GSX). Asterisk runs fine on it, other than some stuttering issues under load on GSX/Server.

The problem with running Asterisk under any VMWare is that it is a CPU-heavy endeavor, which negates any cost savings or benefits of VMWare. Unless you are running some Asterisk setup that you can't properly partition/multi-tenant, I'm just not sure what VMWare gets you (other than dev/staging boxes).
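A rough way to see whether a guest has the kind of timing trouble discussed here is to measure how late short sleeps wake up. The sketch below is an assumption about how one might check this from userland, not a method anyone in the thread describes. The 20 ms default is chosen because it is a common RTP packetization interval. It cannot isolate hypervisor interrupt handling from other causes such as CPU contention.

```python
import time


def sleep_overshoot_ms(interval_ms=20.0, samples=50):
    """Return how late each sleep woke up, in milliseconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_ms / 1000.0)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Positive values mean the wakeup came later than requested.
        overshoots.append(elapsed_ms - interval_ms)
    return overshoots


if __name__ == "__main__":
    results = sleep_overshoot_ms()
    print(f"max overshoot: {max(results):.2f} ms")
    print(f"mean overshoot: {sum(results) / len(results):.2f} ms")
```

Comparing the numbers from a VM against the same test on bare metal, under similar load, gives a first impression of how much scheduling delay the virtual environment adds.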

nicholas.hatch＠gmail.com

8:43 p.m.

On Thu, Dec 17, 2009 at 1:49 PM, Daryl G. Jurbala <daryl at introspect.net> wrote:

...

On Dec 17, 2009, at 4:29 PM, nick hatch wrote:

...
Or, maybe there's something else going on. Many ESXi users do use the local datastores, so maybe that's where the "enterprise has less timing issues" perception comes from -- if you're paying for licensing, you're probably using a SAN and vice versa. If using local datastores affects network I/O, "enterprise" users might sidestep this issue entirely.

Timing issues have nothing to do with what datastore type you are using. They have everything to do with how VMWare handles interrupts.

Indeed, that makes sense, and is what I was getting at. However, if you have a local datastore, you're generating hardware interrupts on the box above what one would see with remote storage, no? It would seem to make sense that under heavy load, when interrupts matter, not running a RAID controller of your own could be an advantage. Gigabit NICs have sane methods of dealing with interrupts, like coalescence. I could be wrong, however. My point was just an attempt to explain why someone might think timing issues are related to "non-enterprise" VMWare. -Nick

daryl＠introspect.net

1:30 p.m.

I agree that local storage MAY be an issue....but in a call processing box you shouldn't be hitting the disk all that much. Also, the free version of ESXi supports iSCSI and NFS....so.....I have to go back to people not actually knowing what they are talking about and repeating anecdotes they've read. And as we all know, it's much easier to find someone spouting off about a problem they are having rather than just checking in to say "everything's fine here".

carlos＠televolve.com

4:19 p.m.

On 12/17/09 2:29 PM, nick hatch wrote:

...

On Thu, Dec 17, 2009 at 12:46 PM, Daryl G. Jurbala <daryl at introspect.net> wrote:

The "free version" (ESXi/vSphere) is identical to the version you pay for, minus the manageability and failover/redundancy features. Are you talking about the nearly defunct VMWare server that runs on top of a base OS?

I've certainly noticed network latency issues on ESXi when system I/O increases. (Hundreds of ms jitter.) I figured this was the same with ESX as well, but there might be architectural differences between the two platforms which would make a difference.

Or, maybe there's something else going on. Many ESXi users do use the local datastores, so maybe that's where the "enterprise has less timing issues" perception comes from -- if you're paying for licensing, you're probably using a SAN and vice versa. If using local datastores affects network I/O, "enterprise" users might sidestep this issue entirely.

I've never used ESXi, so I can't address it at all. Presumably it's the same kernel as ESX/VI3, but without actually trying it I can't say anything about it. Our server cluster consists of Dell 1850 blades with the ESX OS stored locally (36GB 15k RAID 1 on a dedicated controller) and the VMs on an EMC iSCSI SAN. Running on local SATA storage is far more demanding and wasteful of resources. -- Carlos Alvarez TelEvolve 602-889-3003 Advanced phone services simplified
