
I tend to see a 503 as a symptom of a critical situation (per a CPU, CPS, or license threshold breach), and I would consider 503 spikes a decent canary for a SIP trunk coal mine. Others view 503s as business as usual, specifically in LCR arrangements, and don't alarm on or study them.

What's the general idea behind industry best practice? Does a 503 simply signify that another route should be taken, or is a 503 cause for a remedy?

Hello,

We frequently encounter carriers that use the 503 for anything they cannot classify otherwise. So, generally, unless it is OUR system, we treat 503 as a "Progress to another route".

In our system, we use our best efforts to classify in a more detailed way, using other codes where they can be used.

Hope this helps,
*Glenn @ VDO*

On Tue, Dec 13, 2016 at 06:04:43PM -0800, Glenn Geller (VDOPh) wrote:
> We frequently encounter carriers that use the 503 for anything they cannot classify otherwise.
> So, generally, unless it is OUR system, we treat 503 as a "Progress to another route".
My $.02 is that there is no industry "best practice", as widespread use of 503 is a "worst" (or "pessimal") practice.

As far as I can tell, 503 (and to a lesser extent, alarmingly, 603) has come to be a catch-all epithet for any sort of call completion failure. The short duration-orientated service providers are the worst offenders, mostly because--as someone pointed out--they aim to obscure any and all possible upstream vendors' failure cause codes with a blanket 503 in the interest of secrecy. However, even relatively respectable members of the food chain will send 503 rather generically in a variety of cases.

I would definitely treat it as an "advance to next route" cause. It is very seldom that a 503 is caused by anything from which it can reasonably follow that the destination cannot be reached through any other carrier, and in any case, that's not what 503 is for.

As a Class 4/LCR software platform vendor, we've tried to behave correctly and send nuanced SIP cause codes that speak as closely as possible to the genuine nature of the failure, sticking to ISUP mappings where possible and returning 503 only in cases where the call is not processed for business reasons (e.g. unprofitable routes) or some other miscellaneous reason.

What we have found is that many customers balk at this, and the protests from the short duration & call centre traffic constituencies have been especially howling. Apparently they deal with a lot of customer call centre equipment that expects a 503 in almost all cases and can't properly route-advance on most other codes. So, at least to some extent, as this bad behaviour has crystallised into established practice, the endpoint side has also evolved to expect opaque 503s for almost all kinds of failure.

Dumb, dumb, dumb. Alas, regressions to the stupidest possible common denominator are common in the technology world.
-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 (direct) / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
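
For readers who have not worked with the ISUP mappings mentioned above, a minimal Python sketch of what "sending a nuanced code" can look like follows. The table is only a small subset of the ISUP-cause-to-SIP mapping in RFC 3398. The function name and the choice of 503 as the fallback for unlisted causes are assumptions made for illustration, not a description of any vendor's product.

```python
# Subset of the Q.850/ISUP cause -> SIP status mapping from RFC 3398.
Q850_TO_SIP = {
    1: 404,   # unallocated number             -> Not Found
    17: 486,  # user busy                      -> Busy Here
    18: 408,  # no user responding             -> Request Timeout
    19: 480,  # no answer from the user        -> Temporarily Unavailable
    21: 403,  # call rejected                  -> Forbidden
    27: 502,  # destination out of order       -> Bad Gateway
    28: 484,  # address incomplete             -> Address Incomplete
    34: 503,  # no circuit available           -> Service Unavailable
    41: 503,  # temporary failure              -> Service Unavailable
    42: 503,  # switching equipment congestion -> Service Unavailable
}


def sip_status_for_cause(cause: int, fallback: int = 503) -> int:
    """Map a Q.850 cause to a SIP status.

    The 503 fallback for causes missing from the table is an assumption
    of this sketch (the "miscellaneous reason" bucket).
    """
    return Q850_TO_SIP.get(cause, fallback)
```

With a table like this, a busy subscriber comes back as 486 and an unallocated number as 404, and only genuine capacity or congestion failures (plus the unclassified remainder) surface as 503.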

+1 on this response. We watch our metrics closely so we can measure when something begins to move in the wrong direction. We ultimately had to stop paying attention to 503s because carriers interpret the RFC's intent for 5xx responses so casually. It seems like we need a new "Not Now I Have a Headache" response code. In the meantime, we route advance.

In my experience, the answer is yes to both. It depends who or what you are dealing with and the expectations for that particular service. Overall I'd say if you have backup carriers, first route advance, then decide if it's worth your time to seek a remedy.

There's also a difference between an immediate 503 and a 503 after only a few seconds of ring time. The latter is usually ticket-worthy since you wouldn't typically route advance after a ring.

Here are some examples of what I've seen:

A small/medium enterprise product, like a SIP trunk or hosted PBX. There's an expectation that calls to any valid number will connect or return busy, so a 503 would be worthy of a trouble ticket to determine the cause. This assumes the service provider can reliably send things like 404 Not Found and 486 Busy Here when appropriate.

A wholesale conversational SIP trunk might 503 anything that isn't a 200 OK to protect you from downstream carriers who return 404, 486, etc. when they shouldn't, and to keep you from seeing things like "Insufficient Balance" or "Payment Required" coming back from their LCR. You might open a ticket if they are your carrier of choice and it's worth your time, otherwise route advance.

Short duration/dialer products are expected to produce a lot of 503s, and a handful of failed calls isn't going to impress anyone. If your overall ASR through a particular carrier drops with no other explanation, then it could be ticket-worthy.
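
The rule of thumb in this message (route advance on an immediate 503 when a backup carrier exists, open a ticket on a 503 that arrives after ringing) can be sketched in Python as below. The `FailedAttempt` type, the `next_action` function, and the action labels are hypothetical names invented for this illustration. A real policy would also weigh the product type and carrier relationship described in the examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailedAttempt:
    status: int                    # final SIP status code, e.g. 503
    ring_seconds: Optional[float]  # seconds of ringing before failure; None if it never rang


def next_action(attempt: FailedAttempt, has_backup_carrier: bool) -> str:
    """Pick a follow-up for a failed call attempt (illustrative labels only)."""
    if attempt.status != 503:
        return "handle-by-code"   # 404, 486, etc. are outside this sketch
    if attempt.ring_seconds is not None:
        return "open-ticket"      # 503 after ringing: too late to route advance
    if has_backup_carrier:
        return "route-advance"    # immediate 503: try the next carrier first
    return "open-ticket"          # nowhere else to send the call
```

For example, `next_action(FailedAttempt(503, None), has_backup_carrier=True)` yields the route-advance label, while the same 503 after four seconds of ringing yields the ticket label.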

One of the issues with SIP in an ISUP world is that the SIP Status-Codes are clumsy and inarticulate for real-world telephony. Take a look at RFC 3398 Section 7.2.4.1:

https://tools.ietf.org/html/rfc3398#section-7.2.4.1

There you will see all sorts of things map to 404 and 503. Yes, there is a Reason header which can return Q.931 Cause Codes back to SIP, but most SIP routing equipment does not look that deep into a response when making the "what-to-do-next" decision.

Generally, I want "quality" providers to use 404 /only/ for ISDN Cause 1. For just about anything else /other than/ Cause 17 I would want a 503, and then I can route advance.

Calvin E. wrote about the difference between an immediate 503 and a 503 after only a few seconds of ring time. IMNSHO, after any sort of alerting indication (18x, with or without SDP) it is generally unforgivable to return anything /other than/ 200 OK (a Connected indication), unless the calling side sends a CANCEL, in which case the call is abandoned and the appropriate response is 487 Request Terminated. In a real-world environment, under what conditions would "Service" suddenly be "Unavailable" after the far end was allegedly Alerting?

Microsoft Lync (now Skype for Business) used to return premature "alerting" indications to mask outrageous PDD in their systems. That had the bizarre effect of causing strange things like "ring tone" (or audible ring for you AT&T types) /followed by/ busy signal.

There is another potential gotcha, especially with some of the bottom-tier service providers. In addition to the frequency of 503 responses from your providers, watch the /average/ and /peak/ interval between *100 Trying* from a provider and *503 Service Unavailable*. I have seen some providers take up to 20 seconds to decide that they just can't handle a call. If a provider can't handle a call, it shouldn't take very long for them to "figure that out."

My US$ 0.02.

John S. Robinson
Communichanic Consultants, Inc.
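
As a rough illustration of "looking that deep" into a response, the Python sketch below pulls the Q.850 cause out of a Reason header (the header format is defined in RFC 3326) and applies the preference stated in this message: stop on Cause 1 or Cause 17, route advance on anything else. The function names and return labels are hypothetical. Inferring the cause from a bare 404 or 486 when no Reason header is present is an assumption of this sketch.

```python
import re
from typing import Optional

# Matches the Q.850 value of a Reason header (RFC 3326), e.g.
#   Q.850;cause=34;text="No circuit available"
_Q850 = re.compile(r'Q\.850\s*;[^,]*?\bcause\s*=\s*(\d+)', re.IGNORECASE)


def q850_cause(reason_header: Optional[str]) -> Optional[int]:
    """Return the Q.850 cause from a Reason header value, or None."""
    if not reason_header:
        return None
    match = _Q850.search(reason_header)
    return int(match.group(1)) if match else None


def next_step(status: int, reason_header: Optional[str] = None) -> str:
    """Decide what to do with a failed call leg (illustrative labels only)."""
    cause = q850_cause(reason_header)
    if cause is None:
        # Assumption: with no usable Reason header, trust 404 and 486 as-is.
        cause = {404: 1, 486: 17}.get(status)
    if cause == 1:
        return "stop: unallocated number"
    if cause == 17:
        return "stop: user busy"
    return "route-advance"
```

One consequence of this design is that a 404 carrying `Q.850;cause=34` is still route-advanced, because the cause value, when present, overrides the status code.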

On Tue, Dec 13, 2016 at 09:14:13PM -0600, John S. Robinson wrote:
> There is another potential gotcha, especially with some of the bottom-tier service providers. In addition to frequency of 503 response from your providers, watch the /average/ and /peak/ interval from *100 TRYING* from a provider and 503 Service unavailable. I have seen some providers take up to 20 seconds to decide that they just can't handle a call. If a provider can't handle a call, it shouldn't take very long for them to "figure that out."
Aye, this kind of bad behaviour--sending a 100 Trying and then sitting on the call for a while--is unfortunately common from the bottom-feeders. In some cases it can indicate that a provider is being DoS'd and doesn't have the capacity to process the call, but often it's just poor service.

It seems to be especially common on garbage international routes. When they're not giving you a lump of FAS, they send 100 Trying and then give you tons of PDD and uncertainty as they fail through 80 upstream vendors on their side, only to ultimately tell you they can't complete the call. A lot of their upstreams are equally trashy and do the same thing, resulting in a rather worst-case scenario.

The only solution there is to use aggressive PDD-based failover. Don't let any vendor send you 100 Trying and then go without some sort of progress indication for more than ~3 sec, or ~5 sec at most. If that happens, CANCEL the call leg and move on.

This is something that came to be supported in our product at the request of many customers, even those (the majority) who do traditional wholesale SIP termination and/or retail or enterprise SIP trunking.

-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 (direct) / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
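
A minimal Python sketch of the two measurements discussed in this part of the thread follows: the PDD-based cancel decision, and the average and peak delay between 100 Trying and a final 503. It is written as pure functions over timings that a switch or CDR pipeline would supply. All names are hypothetical, and the 3-second default simply mirrors the "~3 sec" figure above.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

PROGRESS_TIMEOUT_S = 3.0  # "~3 sec" from the post; "~5 sec" is the stated upper bound


def should_cancel(seconds_since_trying: float,
                  got_progress: bool,
                  timeout_s: float = PROGRESS_TIMEOUT_S) -> bool:
    """True when a leg has sat on 100 Trying past the timeout with no progress."""
    return (not got_progress) and seconds_since_trying >= timeout_s


def trying_to_failure_stats(
        intervals_s: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Average and peak delay (seconds) between 100 Trying and the final 503."""
    samples = list(intervals_s)
    if not samples:
        return None  # no failed calls observed for this vendor
    return mean(samples), max(samples)
```

When `should_cancel` returns true, the caller would send CANCEL on that leg and try the next vendor. Tracking `trying_to_failure_stats` per vendor helps show which ones routinely take many seconds to give up on a call.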
participants (6)
- abalashov@evaristesys.com
- calvine@gmail.com
- ggeller@vdo-ph.com
- jsr@communichanic.com
- peeip989@gmail.com
- slocoach@gmail.com