Ideas for Building Inbound Redundancy

I've searched back in the VoiceOps archive for a discussion about redundancy, and it has been a while. I wanted to spark a useful discussion for the list, and hopefully figure out one issue we're having.

We use a SIP provider that connects to our IP-PBX in our datacenter. We have standard redundancy within our PBX infrastructure, but we want to build out redundancy for cases where our SIP provider has an issue reaching our PBX (DNS issues, network issues, primary & backup datacenters down, etc.).

If our SIP provider is unable to reach our PBX, they offer a 'failover route' that can route to a standard 11-digit DID. For individual employees' DIDs, we're just routing directly to their cell numbers. The issue is routing our high-volume DIDs -- the main IVR, the call groups, etc. -- to simultaneously ring a group of cell numbers.

From our research, the best bet seems to be to use eVoice <https://www.evoice.com/feature/included/simultaneous-ring>, set up a new (unpublished) DID that rings ~15 cell phones using their 'Simultaneous Ring' feature, and then forward our high-volume DIDs to the new eVoice DID.

Does anybody have experience with this? Am I going about this in an asinine way? Is there a better vendor? Are there other ways that you've built out inbound redundancy for VoIP solutions you've deployed?

Thanks,
Jacob

P.S. To add to the discussion: while our SIP provider is disconnected from our PBX, lost inbound faxes are also a concern. I'd use something like eFax <https://www.efax.com.au/web-fax-pricing> to deliver the inbound faxes to a distribution-group email (after testing that it works when forwarded from our SIP provider, who does use T.38), by setting a backup route for our fax DIDs to hit our new eFax DID. These two measures would ensure minimal missed calls/faxes while critical infrastructure was down.

eVoice Link: https://www.evoice.com/feature/included/simultaneous-ring
eFax Link: https://www.efax.com/features

Without knowing the PBX type and how your users connect to it (SIP?), it's a bit of a guess, but here is what we do for our cloud contact centre solution (and for our clients who run their own PBXs):

- Your provider should be able to fail over to another IP endpoint fairly easily (in addition to, or instead of, routing to an 11-digit number).
- With that, you can set up another instance of your PBX at a different data center (keep configurations replicated).
- Have your user accounts register into both (Line 1 to DC A, Line 2 to DC B).
- If there is an outage in data center A, calls get routed to data center B.
- User phones will still ring, the queues will work, etc. No one should notice that there was an outage at DC A.

Best Regards,

Ivan Kovacevic
Vice President, Client Services
Star Telecom | www.startelecom.ca | SIP Based Services for Contact Centers | LinkedIn <https://www.linkedin.com/company/star-telecom-www-startelecom-ca-?trk=biz-co...>
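The failover Ivan describes depends on something, whether the provider's switch or your own border element, noticing that the primary endpoint is unreachable before sending calls elsewhere; a common way to do that is a periodic SIP OPTIONS ping. The sketch below is purely illustrative of that idea and is not any particular provider's mechanism; the hostnames and the fallback DID are made up.

```python
import socket
import uuid

def sip_options_alive(host, port=5060, timeout=2.0):
    """Send one SIP OPTIONS ping over UDP and report whether any
    SIP response came back before the timeout."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    msg = (
        f"OPTIONS sip:{host}:{port} SIP/2.0\r\n"
        f"Via: SIP/2.0/UDP 0.0.0.0:5060;branch={branch}\r\n"
        "Max-Forwards: 70\r\n"
        f"From: <sip:probe@invalid>;tag={uuid.uuid4().hex[:8]}\r\n"
        f"To: <sip:{host}:{port}>\r\n"
        f"Call-ID: {uuid.uuid4().hex}\r\n"
        "CSeq: 1 OPTIONS\r\n"
        "Content-Length: 0\r\n\r\n"
    )
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(msg.encode(), (host, port))
            data, _ = sock.recvfrom(4096)
            return data.startswith(b"SIP/2.0")
        except (socket.timeout, OSError):
            return False

# Hypothetical endpoints: primary PBX in DC A, secondary in DC B,
# then the provider's last-resort PSTN failover DID.
ROUTE_ORDER = [("pbx-a.example.net", 5060), ("pbx-b.example.net", 5060)]

def pick_route():
    """Return the first endpoint that answers an OPTIONS probe,
    falling back to the PSTN failover route if none do."""
    for host, port in ROUTE_ORDER:
        if sip_options_alive(host, port):
            return f"sip:{host}:{port}"
    return "pstn:+15555550100"  # made-up failover DID

if __name__ == "__main__":
    print("Routing inbound calls to:", pick_route())
```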

Thanks for the response, Ivan. My apologies if I wasn't clear: we are using SIP to connect to a 3CX PBX, and we already have redundancy with a secondary PBX in another data center.

The case that we're trying to protect against is both PBXs at both data centers being unreachable for our SIP provider (DNS issues, internal network routing issues, routing issues between the SIP provider and the datacenters, etc.). We'd have the failover hit user cell phones for direct DIDs; the problem in this scenario is routing the call queues/groups to a batch of cell phones (assuming that our entire PBX/desk phone infrastructure is down).

Thanks,
Jacob

I see, makes sense.

Have you looked at Grasshopper? They can support both fax and voice, and you can set up ring groups (ring-all, ring in sequence). It may be a one-stop solution.

Best Regards,

Ivan Kovacevic
Vice President, Client Services
Star Telecom | www.startelecom.ca | SIP Based Services for Contact Centers | LinkedIn <https://www.linkedin.com/company/star-telecom-www-startelecom-ca-?trk=biz-co...>

Our redundant PBXs have (4) trunks to multiple datacenters; our core switches know if a trunk is down and will move the incoming call along to the next PBX (primary vs. secondary). All trunks are always online, so it's not DNS-based; calls flow to whichever is online. Some carriers will want to round-robin all (4) trunks, just to make sure they know all sites are working. It'll be up to you to build an internal routing core to take the round-robin calls and direct them to whatever PBX you want to work off of. We then have our redundant PBXs set up like this: https://youtu.be/rJsr4k0RBH8

Aryn H. K. Nakaoka
anakaoka at trinet-hi.com
Direct: 808.356.2901
Fax: 808.356.2919
Tri-net Solutions
518 Holokahana Lane Suite #200
Honolulu, HI 96817
http://www.trinet-hi.com
https://twitter.com/AlohaTone
Aloha Tone PBX https://www.youtube.com/watch?v=96YWPY9wCeU
Aloha Tone (HA) High Availability http://youtu.be/rJsr4k0RBH8

On 02.02.2017 at 16:30, Voip Jacob wrote:
The case that we're trying to protect against would be if both PBXs at both data centers were unreachable for our SIP provider (DNS issues, internal network routing issues, routing issues between SIP provider & datacenters, etc.). [...]
I can't help exactly with the call queues/groups thing, but here's what I did recently for my inbound infrastructure, where DIDs get routed to me from several carriers and routed from me to my customers, via SIP. I believe I have covered all possible outage scenarios with this setup:

There are 2 data centers, geographically diverse. I colocate servers + routers in each; let's call that a "site". Each site has two redundant (with VRRP) routers that speak BGP with the upstream routers. Per site, there are 2 physical servers with CentOS + KVM, and each server hosts 2 VMs:

VM 1) Asterisk for SIP/RTP + Galera Cluster for the database
VM 2) quagga for BGP + kamailio as SIP proxy for load balancing/SIP failover

A total of 8 VMs, to start with.

Network: I took a spare /24 IPv4 netblock I had lying around, and quagga running on the 4 quagga/kamailio VMs announces this prefix via BGP to the 4 internet-facing routers. Each quagga connects to both the active and standby router local to its site, so there are a total of 8 BGP sessions: 4 per site, 2 per router. Announcing this prefix at multiple sites at the same time, where each site uses different upstream providers, makes that IPv4 prefix "anycast'ed": it is visible in the global routing table via multiple paths, and the decision about which site IP packets for this prefix end up at is made by the BGP algorithm (and by the provider where the traffic originates). A single IP address out of that /24 is up on all 4 quagga VMs as an alias address. Yes, the same IP address, four times. You may think this would be broken and cause problems with the same IP up in the same VLAN, but the BGP algorithm on our internet-facing routers chooses one of the VMs and sends all traffic there; the other VM acts as standby. There is no LAN traffic to/from this particular IP, so it just works. Initially I messed around with "keepalived" and some other tools, but it didn't work out, and running fewer userland daemons (which can crash, too) is better. :) Now we tell the providers that we buy DIDs from: hey, route all the SIP packets for us to this particular IP only. A single IP is all they need and get from us! No more "Please add our new IP address ...".

Database: I use both Asterisk's dialplan and A2Billing (a "VoIP Softswitch Solution", open source and free of charge) to route DIDs to customers. Because of A2B and because of CDRs we need a MySQL database. This is what Galera Cluster is for: a multi-master, active-active replacement for MySQL. There are 4 Asterisk/database VMs, so there are 4 instances which synchronize with each other all the time, and it does not matter which instance you send your write requests to. I simply use the local node for read + write and let Galera take care of the internals. There is also a 9th VM in a country far, far away which only runs Galera arbiter; it does not store any MySQL data and simply acts as a decision-making component to prevent split-brain situations caused by the even node count. It's good that it's far away, so it notices when a whole site is down due to network issues.

SIP + media: kamailio running on the quagga VM is the entry point for all inbound SIP traffic. A simple configuration which basically just says: there are 2 Asterisk servers to distribute the calls to; check if they're both up; if one of them is down, send all traffic to the remaining server. If both are up, distribute evenly 50/50 so we get load balancing and all of our servers actually process calls instead of sitting idle until disaster comes. We let Asterisk handle all RTP and don't worry about an RTP proxy. Each Asterisk VM has a public IP address that is local to its site and unique. So I tell my customers: please allow inbound calls from these IPs:

5.5.5.5 (site 1, server 1, VM 1)
5.5.5.6 (site 1, server 2, VM 1)
90.90.90.90 (site 2, server 1, VM 1)
90.90.90.91 (site 2, server 2, VM 1)

Now, let's go through the possible disasters and see how this whole thing reacts:

- Data center lights up in a big ball of fire/upstreams go down/fiber cut etc.: site 1 is down; thanks to BGP anycast all traffic will instantly and without manual intervention go to site 2, and vice versa.
- Router dies: the remaining router takes over (BGP sessions to both quaggas on each, BGP sessions to upstreams on each, VRRP between them). Instant, no manual intervention.
- Physical server dies: its quagga VM goes down, that BGP session disappears, and the BGP session to the quagga VM on the remaining physical server takes priority on the router. (quagga + kamailio are active all the time on both VMs; the "standby" VM just sits idle until the other quagga disappears.)
- quagga VM down/reboot: BGP session disappears, the remaining VM gets priority.
- Asterisk crashes: kamailio detects this and sends all calls to the remaining Asterisk.

All MySQL data is always everywhere, always writeable, regardless of site blowup, server failure or VM crash. We don't worry about hard drive faults or filesystem redundancy (GlusterFS, ...). Keep it simple. If a server dies, we replace it. (We still use RAID, of course.)

I make all configuration changes on a single node. A simple script syncs the configuration with the other Asterisks and reloads them. Same for web access to A2B: a single node is designated for that, but you could easily make that redundant as well if web access is crucial, again thanks to BGP anycast.

The cool thing is that it scales for both load and redundancy count: just add new servers as you please at any site and add them to Galera, kamailio, even BGP if you wish. You could even add more sites in other cities, countries, continents.

If you have read until this point, you have possibly figured out that if kamailio crashes on the quagga VM that currently has priority, calls will go to a black hole. I was too lazy to set up kamailio failover... yet. :)

Also, since there are multiple carriers that deliver DIDs, spread over the world and using different upstreams, anycast really does its job and some traffic arrives at site 1, other traffic at site 2. By luck, it's currently close to a 50/50 distribution.

Since I took the couple of days to implement that, I sleep so well again. :)

Regards
Markus

PS: BGP anycast is awesome.
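Markus mentions "a simple script" that pushes configuration from the node where changes are made out to the other Asterisk boxes and reloads them. His actual script isn't shown; the following is only a rough sketch of what such a push-and-reload step could look like, with hypothetical hostnames, root SSH access, and the standard /etc/asterisk/ layout assumed.

```python
import subprocess

# Hypothetical peer list and config path; adjust to your own layout.
PEERS = [
    "ast1-site1.example.net", "ast2-site1.example.net",
    "ast1-site2.example.net", "ast2-site2.example.net",
]
CONF_DIR = "/etc/asterisk/"

def push_and_reload(peer):
    """Rsync the local Asterisk configuration to a peer over SSH,
    then ask the remote Asterisk to reload it."""
    subprocess.run(
        ["rsync", "-az", "--delete", CONF_DIR, f"root@{peer}:{CONF_DIR}"],
        check=True)
    subprocess.run(
        ["ssh", f"root@{peer}", "asterisk -rx 'core reload'"],
        check=True)

if __name__ == "__main__":
    for peer in PEERS:
        print(f"Syncing {peer} ...")
        push_and_reload(peer)
```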

Hi Markus,

I'm curious: how are you handling reinvite traffic (and/or changes in SDP) from different sources when using anycast?

I'll give you an example. I have a provider I use for outbound LD. This provider takes my traffic, LCRs it, and then steps out of the media path. Meaning, I send SIP to $SIPPROVIDER, they respond with SDP turning up media directly from, let's say, Level 3 to me, and I exchange RTP directly with Level 3.

Would there not be a potential issue here with a change in source of RTP? In that the call could originate from $SITE1, but Level 3, being closer to $SITE2, anycasts the inbound RTP traffic to a switch/site that has no active call for the incoming RTP?

Now, I've never seen this on inbound calls, but I see it all the time on outbound. Perhaps that's the make/break here? Thoughts?

Nick Olsen
Sr. Network Engineer
Florida High Speed Internet
(321) 205-1100 x106

Hi Nick,

On 06.02.2017 at 17:45, Nick Olsen wrote:
Hi Markus, I'm curious. How are you handling reinvite traffic (And/or changes in SDP) from different sources when using Anycast? [...] Now, I've never seen this on inbound calls. But see it all the time on outbound. Perhaps that's the make/break here? Thoughts?
I think that is it. I've also never seen it on inbound calls. But in my scenario, wouldn't it rather be *my* endpoint that would send the re-INVITE (for whatever reason)? Since I don't have a need for that, I just never worried about it. So far, It Just Works. :)

If it were a problem, I guess a solution would be to get kamailio to somehow step out of the way, probably by sending a 300 Redirect? Actually, I tried this at first, but never got it to work properly: https://www.kamailio.org/dokuwiki/doku.php/asterisk:load-balancing-and-ha

Or maybe there is even a module available for kamailio that synchronizes active SIP calls between multiple kamailio installations?!

The Asterisks on all sites don't have anything to do with anycast; they just have public IPs that are local to the corresponding site, so they are unique.

Regards
Markus

Markus, Thanks for the info!
I think that is it. I've also never seen it on inbound calls.
But in my scenario, wouldn't it be rather *my* endpoint that would send the RE-INVITE? (for whatever reason) Since I don't have a need for that, I just never worried about it. So far, It Just Works. :)
Honestly, I haven't done a packet capture on it, but I don't think it's technically a reinvite. In my scenario, I think it's just passing the SDP untouched, so the media IPs are intact and that causes the switch to establish RTP directly. (I also see the ULC's user agent; so much Sonus.) This can at times be a real pain when trying to capture the traffic on the wire, simply because you're playing Russian roulette with which IP you're actually going to be exchanging RTP with. But I digress. Frankly, for inbound I don't think you'll ever see this. You could probably do something similar (no modification of SDP) with Kamailio, I suspect, but I'm not sure what that would really do for you in your config.
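Nick's point is that when the SDP is passed through untouched, the far end learns where to send RTP from the SDP body itself (the c= and m= lines), not from any address in the SIP signalling path, which is why the RTP source can surprise you. A tiny illustration of pulling those fields out of an SDP body; the sample SDP below is made up.

```python
# The RTP destination comes from the SDP body, not from the SIP
# signalling addresses that carried it.
SAMPLE_SDP = """v=0
o=- 123456 123456 IN IP4 198.51.100.10
s=-
c=IN IP4 198.51.100.10
t=0 0
m=audio 16384 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
"""

def media_endpoint(sdp: str):
    """Return an (ip, port) pair for the audio stream, taken from the
    c= and m=audio lines (simplified; ignores media-level overrides
    and IPv6)."""
    ip, port = None, None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[-1]
        elif line.startswith("m=audio ") and port is None:
            port = int(line.split()[1])
    return ip, port

print(media_endpoint(SAMPLE_SDP))  # ('198.51.100.10', 16384)
```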
Or maybe there is even a module available for kamailio that synchronizes active SIP calls between multiple kamailio installations?!
The Asterisk'es on all sites don't have anything to do with anycast, they just have public IPs that are local to the corresponding site, so they are unique.
Yeah, I meant to say Kamailio in my original email. I know Asterisk complains when it gets RTP for a call it doesn't have; no idea if Kamailio is the same, but I imagine it would be. Anycast is great for UDP-based stuff as long as you can keep it all stitched together.

I'd also be curious to see if you run into a route-flap problem between a carrier and yourself at some point, where the route switches between your physical sites. However, you could quickly turn some BGP knobs and mitigate that.

Thanks for sharing your config and thoughts with the community; it's given me some lab ideas!

Hello,

On Tue, Feb 07, 2017 at 01:58:29PM -0500, Nick Olsen wrote:
Honestly, I haven't done a packet capture on it. But I don't think it's technically a reinvite.
The term "reinvite" refers to one kind of scenario and one kind only: an in-dialog SIP INVITE, as opposed to an initial SIP INVITE request. What we experience and user agents present as a "call" is a SIP "dialog" underneath. This is established by the initial INVITE transaction and confirmed by the end-to-end ACK. The RFC 3261 ? 12 rendition is rather nebulous, but conveys the idea that it's a piece of short-term state, akin to a dynamic "circuit" instantiated in a circuit-switched call: A key concept for a user agent is that of a dialog. A dialog represents a peer-to-peer SIP relationship between two user agents that persists for some time. The dialog facilitates sequencing of messages between the user agents and proper routing of requests between both of them. The dialog represents a context in which to interpret SIP messages. In-dialog requests are scoped to the current dialog, and are indicated as such by the presence of a To 'tag' attribute. This includes requests such as REFERs, BYEs (hangups), and in-dialog INVITEs (reinvites). Reinvites are a little special because they are considered "target refresh requests". They can update certain dialog parameters (e.g. the remote target URI, aka the Contact URI of either peer) as well as renegotiate media attributes with a new SDP offer-answer exchange. This is where the terminological confusion most often comes in. Reinvites don't intrinsically have anything to do with media, nor does the nature of a media renegotiation within a dialog inherently prescribe or prohibit "direct" or "peer-to-peer" media flows. Reinvites are used to, among other things, "hand off" media in this manner, but they can also be used to switch codecs, alter packetisation duration, switch calls to a different media gateway, etc. The term "reinvite" is not synonymous with, and has no inherent connection to, direct media exchange. Reinvites are just reinvites. They do lots of things.
In my scenario, I think it's just passing SDP untouched. So the media IP's are intact and causes the switch to establish RTP directly. (I also see the ULC's User agent (So much sonus)) This can at times be a real pain when trying to capture the traffic on the wire simply because you're playing Russian Roulette with which IP you're actually going to be exchanging RTP with.
Assuming you don't have this problem solved in some other fashion, consider sngrep: https://github.com/irontec/sngrep

When I discovered it in 2016, I realised my entire professional life prior to that point was just a giant waste.
But I digress. Frankly, For inbound I don't think you'll ever see this. You could probably do something similar (No modification of SDP) with Kamilio I suspect.
Kamailio is a SIP proxy, and does not modify SDP, by its very nature. Exceptions to that rule generally involve the various outboard RTP relays that can be used with it, such as rtpproxy and RTPEngine.
Or maybe there is even a module available for kamailio that synchronizes active SIP calls between multiple kamailio installations?!
There are numerous ways to accomplish that. Keep in mind that Kamailio's statefulness is really at the transaction level: that's what "stateful SIP proxy" means. Its 'dialog' module provides dialog awareness, but there isn't really much to synchronise.

Synchronising knowledge of "calls" among Kamailio installations is not an especially challenging technical problem, but nor does it provide the redundancy outcomes you may hope for. Kamailio will happily pass in-dialog requests and their replies statelessly anyway, based on fundamental SIP message properties. It doesn't check whether they correspond to a dialog it knows about. Synchronising in-flight _transactions_ is hard.
Yeah, I meant to say Kamailio in my original email. I know asterisk complains when it gets RTP for a call it doesn't have. No idea if Kamilio is the same. But I imagine it would.
Kamailio is a SIP proxy, and does not receive or handle RTP in any way.

Kamailio + anycast is a well-explored problem. It mostly works, except for cases where messages within in-flight _transactions_ (not dialogs!) go to a different Kamailio instance than the one that processed the transaction-initiating request.

-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 (direct) / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

On Tue, Feb 07, 2017 at 05:08:51PM -0500, Alex Balashov wrote:
Synchronising knowledge of "calls" among Kamailio installations is not an especially challenging technical problem, but nor does it provide the redundancy outcomes you may hope for. Kamailio will happily pass in-dialog requests and their replies statelessly anyway, based on fundamental SIP message properties. It doesn't check whether they correspond to a dialog it knows about.
I should add that RTPEngine can synchronise RTP flow state via Redis, and this is used in production today. So if the goal is fairly seamless RTP failover, that's a solved problem in FOSS land. FreeSWITCH has a 'recover' facility that does something similar for dialog + media flow state.

-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 (direct) / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
participants (7)
- abalashov@evaristesys.com
- anakaoka@trinet-hi.com
- ivan.kovacevic@startelecom.ca
- nick@flhsi.com
- pete@fiberphone.co.nz
- universe@truemetal.org
- voipjacob@gmail.com