
Right, that would impact existing established calls and subscriptions. However, the larger problem is that in a failover condition endpoints would begin sending SIP INVITEs to an SBC (pair) that does not have a valid registration cached and, depending on the configuration, no SIP port map either. Depending on the re-registration interval, this could result in outbound call failures for anywhere from a couple of minutes to an hour or more. In the reverse direction, the inside SBC IP would be unique at each site. This would render the Metaswitch unable to contact the endpoint for PSTN -> VoIP calls for the period of re-registration as well.

-- Jason Nesheim

----- Original Message -----
From: "Alex Balashov" <abalashov at evaristesys.com>
To: "Jason L. Nesheim" <jnesheim at cytek.biz>
Cc: "Brad Anouar" <Brad at broadcore.com>, voiceops at voiceops.org
Sent: Sunday, September 6, 2009 5:39:46 PM
Subject: Re: [VoiceOps] Acme SBC geographic redundancy

Jason L. Nesheim wrote:
If you're going to use SRV records in an Access deployment with registration caching on the Acme SBC one thing to keep in mind is that non-HA SBCs (or two separate pairs) will not share their registration and NAT databases.
I would imagine they don't share transaction and dialog state either.

--
Alex Balashov - Principal
Evariste Systems
Web    : http://www.evaristesys.com/
Tel    : (+1) (678) 954-0670
Direct : (+1) (678) 954-0671
Mobile : (+1) (678) 237-1775
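
To put rough numbers on the re-registration window described above, here is a minimal Python sketch. It assumes each endpoint re-registers only when its registration expires (many endpoints refresh earlier, and some re-register on failure, which would shorten the window) and that registrations are spread evenly over the interval. The function names are hypothetical and not part of any SBC or endpoint API.

```python
import random


def outage_seconds(expires_s: float, elapsed_since_register_s: float) -> float:
    """Seconds until the next REGISTER repopulates the standby SBC's cache.

    Assumption: the endpoint sends its next REGISTER exactly at expiry.
    """
    if not 0 <= elapsed_since_register_s <= expires_s:
        raise ValueError("elapsed time must lie within the registration interval")
    return expires_s - elapsed_since_register_s


def mean_outage_seconds(expires_s: float, endpoints: int = 10_000, seed: int = 0) -> float:
    """Average outage across endpoints whose last REGISTER times are
    uniformly distributed over the interval (an assumption)."""
    rng = random.Random(seed)
    total = sum(
        outage_seconds(expires_s, rng.uniform(0, expires_s))
        for _ in range(endpoints)
    )
    return total / endpoints


if __name__ == "__main__":
    for expires in (120, 600, 3600):
        print(f"expires={expires}s worst-case={expires}s "
              f"mean~{mean_outage_seconds(expires):.0f}s")
```

Under these assumptions the worst case equals the full registration interval and the average is about half of it, which matches the "couple of minutes to an hour or more" range for intervals between 120 and 3600 seconds.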

Jason L. Nesheim wrote:
Right, that would impact existing established calls and subscriptions. However, the larger problem is that in a failover condition endpoints would begin sending SIP INVITEs to an SBC (pair) that does not have a valid registration cached and, depending on the configuration, no SIP port map either. Depending on the re-registration interval, this could result in outbound call failures for anywhere from a couple of minutes to an hour or more. In the reverse direction, the inside SBC IP would be unique at each site. This would render the Metaswitch unable to contact the endpoint for PSTN -> VoIP calls for the period of re-registration as well.
From a conventional IT/IP data person's perspective, I think these problems are all an acceptable tradeoff given the relatively marginal scenario under which an Acme would fail or a facility would fail, as a whole.

But people with LEC backgrounds won't put up with this way of thinking, typically, because they're used to being regulated and under legal mandate to provide high availability, and used to resilience built into big-iron technologies that have remained relatively unchanged for a long time. VoIP operators are somewhere in the middle at this point, but the tendency seems to be to increase their regulatory burden, in principle.

The reality is that to get this HA setup going with geographically redundant endpoints - including getting the routing stuff right - will require that you spend 90% of your resources to get the last 2% of reliability. Even then, if anyone harbours delusions of fast, seamless protection switching, well, prepare to be disappointed. It's not going to happen. Nor should it happen. It's just not economical, and not worth doing.

As long as you can get away with it from a legal perspective - and so far, as a VoIP operator, it seems that you can - I would advise everyone to calm down, stop hyperventilating about the critical distinction between four-9s and five-9s, and just put in the redundancy measures to meet enough requirements to survive commercially.

-- Alex
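
For reference, the four-9s versus five-9s distinction can be turned into allowed downtime per year with simple arithmetic. The sketch below assumes a 365-day year and treats availability as a plain fraction of total time; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime in minutes per year for a given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, availability in (("four 9s", 0.9999), ("five 9s", 0.99999)):
        minutes = downtime_minutes_per_year(availability)
        print(f"{label}: about {minutes:.1f} minutes of downtime per year")
```

Four 9s allows roughly 53 minutes of downtime per year and five 9s roughly 5 minutes, so a single failover in which endpoints wait out a one-hour re-registration interval would by itself use up the four-9s budget for those endpoints.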

On Sun, 6 Sep 2009, Alex Balashov wrote:
The reality is that to get this HA setup going with geographically redundant endpoints - including getting the routing stuff right - will require that you spend 90% of your resources to get the last 2% of reliability. Even then, if anyone harbours delusions of fast, seamless protection switching, well, prepare to be disappointed. It's not going to happen. Nor should it happen. It's just not economical, and not worth doing.
That's an interesting statement. I agree with your thoughts on resource allocation, but your blanket comment about whether or not it should happen and is worth doing is another matter and probably deserves more consideration.

There is a status quo, and as carriers large and small shift to VoIP and start to change the status quo implicitly (i.e., we're no longer going to offer the type of resiliency we've traditionally had in the voice network), we do a disservice to the "masses", so to speak, that harbor expectations to the contrary and don't differentiate no matter how many contracts you wave in their face. Particularly when somebody else is buying the service on behalf of a large organization that has no idea of what's going on.

The "cheap" voice providers get to cut corners (or just don't bother) to save costs. The "expensive" voice providers end up caught between regulatory and technical costs and the demand to follow suit so they don't lose their user base. One day, a totally unexpected outage occurs and suddenly that last 2% starts to loom unexpectedly large for all involved.

I think it's good that conversations like this happen to make the implicit considerations more explicit. Having lived through my fair share of disasters here in NYC and watched networks fail in unexpected and grandiose ways, I find that lack of perspective is not doing VoIP any favors.

Cheers,
David.
Participants (3): abalashov@evaristesys.com, davidb@pins.net, jnesheim@cytek.biz