[VoiceOps] Plivo Offline, Domain Expired; Out of Business?

April 24, 2017

When you know that there is a customer-affecting outage and you can (a) get
ahead of it or prevent it, (b) manage it, and (c) sometimes work around it,
I'd argue that the value to your customers is priceless.

The cost of NOT doing so is also priceless, or at least it is the market
value you lose between the time before the issue and the time either
(a) everyone forgets about the failure or (b) you reach $0.

I look at this as a nice-to-have for SLA/contract arguments, but primarily
as something we have in place to deliver a great customer experience by
anticipating and measuring failure proactively.

But indeed, we are arguing semantics, and pricelessness is difficult to
measure. ;-)

Beckman

On Sun, 23 Apr 2017, Alex Balashov wrote:
...
Yes, it is good to know when things are down. Where I see people getting
into trouble with self-monitoring is making SLA claims to their vendors, or
other financial or contract claims that require ironclad attribution of
blame.
Any piece of information like this is just a tool. It has epistemic
limits. It may be valuable, but it is definitely not priceless.
-- Alex
...
On Apr 23, 2017, at 6:42 PM, Peter Beckman <beckman at angryox.com> wrote:
Still priceless. If our application servers cannot access critical
resources from a carrier, regardless of the cause (network, application
outage on their end, domain went un-registered), I now know it isn't
working.
That still is priceless. Monitoring should tell you what, not why. Blackbox
monitoring (measure the experience customers experience) is wildly more
valuable than whitebox monitoring (a DB query took more than 10 seconds,
once).
Your metrics system should help you answer why, once you know that
something is wrong.
Beckman
Good reading:
https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcAp...
From a prior Google SRE
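As a rough illustration of the blackbox idea above (report that something is failing, not why), here is a minimal Python sketch. The names `ProbeResult` and `blackbox_check` and the `max_seconds` threshold are hypothetical and not from this thread; the probe is any callable that exercises the dependency the way the application would.

```python
import time
from typing import Callable, NamedTuple


class ProbeResult(NamedTuple):
    ok: bool         # did the dependency behave as a customer would need?
    seconds: float   # how long the probe took
    detail: str      # short note for the alert; the "why" comes from metrics


def blackbox_check(probe: Callable[[], object], max_seconds: float = 5.0) -> ProbeResult:
    """Run one end-to-end probe and report only whether it worked."""
    start = time.monotonic()
    try:
        probe()
    except Exception as exc:
        # Any failure counts (DNS, network, TLS, application error):
        # the check says "it isn't working", not whose fault it is.
        elapsed = time.monotonic() - start
        return ProbeResult(False, elapsed, f"{type(exc).__name__}: {exc}")
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        return ProbeResult(False, elapsed, "too slow")
    return ProbeResult(True, elapsed, "ok")
```

For an HTTP dependency, `probe` could be something like `lambda: urllib.request.urlopen("https://vendor.example/health", timeout=5)`, where the URL is only a placeholder.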
...
On Sun, 23 Apr 2017, Alex Balashov wrote:
With regard to the so-called priceless information, I would be careful.
Monitoring vendors oneself is good, but it can lead to the
misapprehension that the blame for an outage lies with the vendor where
it in fact does not, but is instead caused by intermediate routing or
whatever.
In my experience, if used with conceit, such monitoring can cause as many
problems as it solves. It certainly isn't priceless. That depends on the
specific situation.
...
On April 23, 2017 4:27:00 PM EDT, Alexander Lopez <alex.lopez at opsys.com> wrote:
The information you gather by monitoring your vendors is priceless. If
you are able to determine within a short period of time that your
vendor is the problem, you have just saved precious time in identifying
the problem. You can then reroute around the issue, instead of spending
time looking at your platform.
On Apr 23, 2017 3:10 PM, Peter Beckman <beckman at angryox.com> wrote:
Yep, we monitor our vendors. In some cases, better than they monitor
themselves. It's frustrating that they don't/can't/won't, but wanting them
to change doesn't help our customer experience. So we do it for them
proactively.
Too many outages on our vendor services have been caused by their
ineptitude, such as this Plivo outage. We decided as a result of this Plivo
outage that, while it was merely annoying to our Operations team and
didn't result in any customer-facing issues, we could have seen this coming
and maybe averted this disaster for everyone by adding two config lines in
our monitoring platform and a process to ensure notification to, followup
with and closure of the issue with Plivo.
The cost to our Operations team is small, the automated monitoring costs
nothing, but the impact of knowing that our vendors are having (or will
have) issues before they tell us or know themselves, AND before our
customers complain, improves our customer experience and operational
excellence despite our vendors' failings.
It also gives us a chance to write defensive code to handle the situations
where the vendor is not meeting their contractually obligated level of
service.
Beckman
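One possible shape for the "defensive code" and "reroute around the issue" ideas in this thread is a simple ordered failover. This is a sketch under assumptions, not anyone's production code: `call_with_failover` and `AllVendorsFailed` are hypothetical names, and each vendor is represented by a plain callable that raises on failure.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllVendorsFailed(Exception):
    """Raised when every vendor in the list failed."""


def call_with_failover(
    vendors: Sequence[Tuple[str, Callable[[], T]]]
) -> Tuple[str, T]:
    """Try each (name, call) pair in order; return the first success."""
    errors: List[str] = []
    for name, call in vendors:
        try:
            return name, call()
        except Exception as exc:
            # Record what failed so Operations can follow up with the
            # vendor, then move on to the next one.
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    raise AllVendorsFailed("; ".join(errors))
```

The returned vendor name makes it easy to count how often traffic left the primary route, which is the kind of number a follow-up conversation with the vendor would need.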
...
On Sun, 23 Apr 2017, Keln Taylor wrote:
Just to clarify, you are saying that you monitor the domain and SSL cert of
your vendors so you can notify them?
That's cool.
Sincerely,
Keln Taylor
870-204-2121
kelntaylor at gmail.com
On Sun, Apr 23, 2017 at 12:31 PM, Peter Beckman <beckman at angryox.com>
wrote:
...
We should all strive to NOT do that. We integrated a once a day check into
our Monitoring platform that starts warning Operations 30 days before the
domain expires, and actually pages people starting at 9am on Weekdays 7
days before if it hasn't been renewed. We had to tweak it for how our
registrar publishes that information, and we automated renewals so it
rarely goes off, but when it does we can get in front of it.
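The schedule described above (warn from 30 days out, page from 7 days out, weekdays from 9am) can be sketched as a small pure function. This is only an interpretation: the function name `expiry_alert_level`, the level strings, and the reading of "starting at 9am on Weekdays" as "Monday to Friday, 09:00 or later, local time" are assumptions. How the expiry date is obtained from the registrar is deliberately left out.

```python
from datetime import datetime, timedelta

WARN_WINDOW = timedelta(days=30)  # start warning Operations
PAGE_WINDOW = timedelta(days=7)   # start paging people


def expiry_alert_level(expires_at: datetime, now: datetime) -> str:
    """Return "ok", "warn", or "page" for a domain that expires at expires_at."""
    remaining = expires_at - now
    if remaining > WARN_WINDOW:
        return "ok"
    if remaining <= PAGE_WINDOW:
        # Assumption: paging is limited to weekdays at or after 09:00;
        # outside those hours the check stays at warning level.
        is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
        if is_weekday and now.hour >= 9:
            return "page"
    return "warn"
```

Run once a day by the monitoring platform, a function like this would stay at "ok" for an auto-renewed domain and only escalate when a renewal has silently failed.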
We have the same thing in place for our public and internal SSL/TLS
Certificates.
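For the certificate side, a minimal sketch using only the Python standard library might look like the following. The function names are hypothetical. One caveat worth stating: with the default verification settings used here, a certificate that has already expired makes the TLS handshake fail with an exception rather than returning a date, so a caller should treat that failure as an alarm in its own right.

```python
import socket
import ssl
from datetime import datetime, timezone


def parse_not_after(not_after: str) -> datetime:
    """Convert a certificate 'notAfter' string to an aware UTC datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days remaining before the certificate expires (negative if past)."""
    return (parse_not_after(not_after) - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port over TLS and return the peer cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A daily job could call `days_until_expiry(fetch_not_after(host), datetime.now(timezone.utc))` for each public, internal, and vendor hostname and feed the result into the same warn/page thresholds used for domains.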
If you are running a business on the web and don't automate monitoring of
critical infrastructure, you get outages like this. Heck, we have monitored
the domain and SSL certs of our critical-path dependent services/vendors
ever since another outage many years ago, after an SSL cert expired.
Plivo wasn't in our mix, as they aren't critical-path, but they are now,
and they are still in alarm.  Operations now will be automatically notified
when we can actually see Plivo again.
Beckman
On Sun, 23 Apr 2017, Gavin Henry wrote:
> On 23 April 2017 at 17:31, Alex Balashov <abalashov at evaristesys.com>
> wrote:
>> I've done it. Just got distracted, got to fiddling around and thinkin'
>> bout things, and before I know it, the domain's expired and I'm no longer
>> on the register of respectable party guests...
>
> Embarrassing. I'm sure I'll do it at some point. :-)
---------------------------------------------------------------------------
Peter Beckman                                                  Internet Guy
beckman at angryox.com                                 http://www.angryox.com/
---------------------------------------------------------------------------
_______________________________________________
VoiceOps mailing list
VoiceOps at voiceops.org
https://puck.nether.net/mailman/listinfo/voiceops
-- Alex
--
Principal, Evariste Systems LLC (www.evaristesys.com)
Sent from my Google Nexus.
