[VoiceOps] Adtran TA900 timing problems

Nov. 4, 2013

      On Mon, 4 Nov 2013, Jay Hennigan wrote:
...
On 11/4/13 9:41 AM, Matt Yaklin wrote:
...
I forgot to mention that the cisco T1s on the DS3 use line
timing in my setup. So line and line on both sides. The T1
and the Adtran.
Don't do that then.  :-)
Every T1 span must have exactly one source of clocking.
Yeah, I saw that as I was typing out my email but did not
change it. I just did now. Why was it set that way? I am unsure.
Probably because a mistake was made and then duplicated. Then
we probably followed the good old "if it ain't broke, don't fix it."

I am sure I will be punished with a few support calls over the
next day for touching it. ;-)

But... see below...
...
My recommendation is to configure the T3 controller on the 7206VXR to
internal clocking on all of its T1s, and clock the Adtrans from T1 0/1
(line).
If both sides are set to internal, you'll get clock slips as the
oscillators drift slightly with respect to each other.
If both sides are set to line, then there's no reference clock.  The
line will probably sync up initially as the free-running T1 will be
close enough to sync.  Over time the frequency will drift up or down
until one side can no longer sync.  In some cases they'll recover on
their own, and in others you need to shut/no shut or otherwise force a
resync.
So, don't do that.
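
To put a rough number on the internal/internal case: a T1 frame lasts
125 microseconds, so a controlled frame slip happens each time the two
free-running clocks accumulate one frame of relative phase error.  The
sketch below is only back-of-the-envelope arithmetic.  The function name
is made up for illustration, and the ppm figures are the commonly quoted
stratum 4 and stratum 3 free-run accuracy limits, assumed here as worst
cases rather than measured on any Adtran or Cisco gear.

```python
FRAME_SECONDS = 125e-6  # one T1 frame: 193 bits at 1.544 Mb/s


def seconds_between_slips(offset_a_ppm: float, offset_b_ppm: float) -> float:
    """Time for two free-running clocks to drift one full frame apart.

    Each argument is a clock's fractional frequency error in parts per
    million.  Returns infinity if both clocks run at exactly the same rate.
    """
    relative = abs(offset_a_ppm - offset_b_ppm) * 1e-6
    if relative == 0:
        return float("inf")
    return FRAME_SECONDS / relative


# Assumed worst case: two stratum 4 oscillators at opposite +/-32 ppm limits.
print(seconds_between_slips(+32, -32))    # roughly 2 seconds between slips
# Assumed worst case: two stratum 3 oscillators at opposite +/-4.6 ppm limits.
print(seconds_between_slips(+4.6, -4.6))  # roughly 14 seconds between slips
```

Real oscillators usually sit well inside those limits, so slips on a
misconfigured span tend to be much less frequent than this worst case.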
But things get even more complicated than that if you wish to be
a clocking purist. Here we are talking about timing over a DS3. To
some that is humorous.

This is a post from the Taqua mailing list from Dave Long. Pretty
interesting if you are talking about timing and DS3s! I obviously had
a mistake in my config but even when corrected it is still, technically,
wrong?

I would love to know what people's opinions are of the post below.
I bet Paul has read it a few times now.

------------------------------------------------------------------------------
david.long at taqua.com

Springtime for Switching

Ahhh, it's springtime!  In the switch supplier business, that means two
things:
1) More release upgrades (since the holiday season is well behind us, we 
get larger numbers of upgrade requests)
2) GR-303/PRI issues/outages due to network clocking problems.

Number 2 is somewhat related to number 1, but not always...  But, that's 
what I'm here to talk about...

When we deployed some of our first switches, we saw network clocking 
reference problems due to not getting a valid clock reference to the 
switch.   We changed our installation process to check for this early in 
the deployment process.  The T7000 has a lot of logic to constantly 
monitor and qualify multiple clock sources, and to switch to secondary and 
tertiary references, issue logs/events/alarms as a result.  So, we can 
spot if we don't have a valid timing reference coming to the switch.  We 
will typically recommend a BITS clock if we see issues.
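
The qualify-and-fall-back behavior described above can be sketched
roughly as follows.  This is not the T7000's actual logic, which the post
does not detail.  The ClockRef type, the reference names, and the 4.6 ppm
qualification threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClockRef:
    name: str             # hypothetical label, e.g. "BITS-A"
    priority: int         # 1 = primary, 2 = secondary, 3 = tertiary
    signal_present: bool  # False on loss of signal or loss of frame
    offset_ppm: float     # measured frequency offset vs. the local oscillator


def qualified(ref: ClockRef, max_offset_ppm: float = 4.6) -> bool:
    """A reference qualifies if it has signal and is within the offset limit."""
    return ref.signal_present and abs(ref.offset_ppm) <= max_offset_ppm


def select_reference(refs: List[ClockRef]) -> Optional[ClockRef]:
    """Return the highest-priority qualified reference, or None for holdover."""
    candidates = [r for r in refs if qualified(r)]
    return min(candidates, key=lambda r: r.priority, default=None)


refs = [
    ClockRef("BITS-A", 1, signal_present=False, offset_ppm=0.0),
    ClockRef("BITS-B", 2, signal_present=True, offset_ppm=0.2),
    ClockRef("DS1-span", 3, signal_present=True, offset_ppm=1.5),
]
chosen = select_reference(refs)
print(chosen.name if chosen else "holdover")  # BITS-B, since BITS-A has no signal
```

A real implementation would also raise a log, event or alarm on every
change of reference, which is the part that makes the problem visible.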

We now have 6 years of experience with the Broadband Interface Card (BIC). 
And with the BIC came a different deployment model.  Not only were the 
DS3s deployed to the carrier side of the switch, they were also deployed 
on the access side as well.  The Digital Loop Carriers (DLCs) are
typically in remote areas.  Their DS1s are muxed up with other DS1s by
M13s into a DS3, which is often carried over fiber/SONET transport to the
central office.  There the DS3 is demuxed to DS1s, recombined with other
DS1s, remuxed to DS3s and connected to the T7000 BIC card.  The DLCs will
then take their network clocking reference off of the DS1 that leads 
(ultimately) to the T7000.

So, what's the problem?  It's that a DS3 is not a valid clock transport.
That includes DS1s embedded within DS3s.  Now, this is typically the
time where the confusion sets in...  This is not a "Taqua rule" or a 
"T7000 limitation."  This is clearly laid out by Telcordia in the network 
clocking standards.  But, the Telcordia rule is not just some bureaucratic 
decision, it's basically a statement of engineering and physics.  The 
problem is that a DS3 is not a synchronous transport, it is asynchronous. 
When DS1s are packed into a DS3, transported and unpacked, the integrity 
of the DS1 is not the same on the output as it is on the input.  It is 
close, but from a timing perspective, the process adds jitter and wander. 
Throw in there Fiber/SONET transport, and bigger problems happen (pointer 
adjustment/justification in particular).  If I said "why not transport a 
clocking reference as part of RTP in a VoIP network?"  People would laugh. 
But, DS3s have some of the same characteristics (asynchronous transport). 
(2011 revision...  IEEE 1588 is a protocol that does work over IP networks
to transport a clock synchronously...  My point is still valid in that even
1588 does many things to adjust to jitter/wander and was designed to work
in an asynchronous network, where TDM clocking was not).
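
The asynchronous nature of the DS3 shows up in the standard line rates
themselves: the higher rates are not integer multiples of the DS1 rate.
The small sketch below (the function name is made up for illustration)
just does that arithmetic.  The leftover capacity carries framing,
overhead and stuff bits.  Each tributary is bit-stuffed up to the
higher rate on its own clock, and the far end has to smooth a gapped
bit stream back into a DS1, which is where the added jitter comes from.

```python
DS1_BPS = 1_544_000
DS2_BPS = 6_312_000
DS3_BPS = 44_736_000


def m13_overhead_bps():
    """Return (DS2 overhead, DS3 overhead, total non-DS1 capacity) in bit/s."""
    ds2_overhead = DS2_BPS - 4 * DS1_BPS   # four DS1s are muxed into a DS2
    ds3_overhead = DS3_BPS - 7 * DS2_BPS   # seven DS2s are muxed into a DS3
    total = DS3_BPS - 28 * DS1_BPS         # everything that is not DS1 bits
    return ds2_overhead, ds3_overhead, total


print(m13_overhead_bps())  # (136000, 552000, 1504000)
```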

Keep in mind that clock reference transport requires much more precision 
than voice traffic transport.   Jitter and wander in the voice path may be 
barely noticeable, but if two "ends" of the span are using this for 
clocking, the results can be unpredictable.  All bets are off when it 
comes to High-Level Data Link Control (HDLC) channels.  Being "a little 
bit off" may mean that HDLC links do not want to sync up at all.  For 
GR-303, these links are the TMC and the EOC.  All DLCs need a network
reference for clocking and that can't be an embedded DS1 from a DS3 
(unless your M13 mux injects a clock signal on the DS1, but these are 
rare).  For PRI, these links are D-channels.

You may be saying "but Dave, I've been clocking from embedded DS1s for 
years and I've never had a problem?"  That is the reason I'm writing this 
note.  A phenomenon called plesiochronous timing occurs in many cases.
This means that the clocking in a network can "coincidentally" or "almost"
be synchronized.  Your network can work PERFECTLY, for years, right until 
it does not work at all.  Then, it seems like nothing you do can get the 
two ends talking to each other again.  Eventually, after power cycling 
muxes, DLCs, switches, etc., finally the ends will talk to each other. 
The blame will typically lie with whatever the last box was that was power
cycled, kicked, or yelled at rudely.

The other item that conspires in this problem is that DLCs typically have 
little if any clocking qualification logic and circuitry.  So, even if 
they would detect slips or sags in the timing, many are hard pressed to 
report it.

Back at Tekelec, when I had responsibility for R&D on the T9000 and T7000, 
we had an 8 hour T9000 outage of this nature.  We had a team of a dozen 
engineers looking at the software, another dozen TAC engineers looking at 
all the provisioning.  There wasn't anything that seemed to help.  The 
RDTs were power cycled, the switch was rebooted.  Finally some of the 
muxes were rebooted, at the same time someone typed the "ls" (directory 
listing) command and everything synced up and worked like nothing 
happened.  It took weeks to convince this customer that the "ls" command 
was not the reason for the fix.  That customer did not believe that the 
network clocking issue was to blame.  Some months later, after we formed 
the T7000 business unit and I no longer had the T9000, I walked by TAC one 
day when they were fighting the same problem with the same customer.  I 
noticed an almost frantic typing of "ls" by the customer to no avail...

How can this "plesiochronous" thing work at all?  Well, if you have one
of those "atomic clock" alarm clocks or wristwatches, you have a good 
example of how network synchronization works.  Your clock or watch has its 
own quartz oscillator to keep time.  Every so often it looks at the atomic 
clock radio broadcast and resynchronizes.  If the atomic clock signal gets 
a little "loopy" between updates, no one knows.  Think of it as the 
network gets "into a rhythm" when it is plesiochronous.  Everything is
fine until something upsets that rhythm.

"So, what upsets this "rhythm" in my network," you ask?  A number of 
things...

  *   Anytime any piece of gear is power cycled or rebooted (which 
typically happens anytime one of these elements is upgraded).   If an 
upgrade was being performed, the thought is "something in the new software 
load broke my network."

  *   Another cause is anytime significant change happens to the amount of 
traffic over the DS3s.  In particular, putting more DS1s into the DS3 can 
cause disruption.  This changes the packing in the DS3 and will change the 
rhythm.

  *   And, the most hideous of them all is temperature fluctuations in a 
collocation.  Remember the quartz clock I was talking about?  Most/all 
DLCs use quartz oscillators of their own.  The problem is quartz 
oscillators are very susceptible to changing their frequency based on 
temperature.  And what happens in the spring?  Many locations go through a
heating-at-night, cooling-during-the-day HVAC cycle, where we see broad
temperature fluctuations in the office.  There is one particular story 
where timing slips occurred every time the CO tech used the door to go 
outside for a smoke break.

So, while I've talked mostly about DLCs, the same issues can be found with 
any signaling that uses an HDLC channel.  In particular, we've seen this 
with PRI and SS7.  But, mostly the problem is with GR-303.

So hopefully, if you've read this far, I've convinced you that this is a 
problem.  So, if you have this situation in your network, what can you do 
about it?

  *   If you are using fiber equipment, many units have special
capabilities for a separate clocking feed and transport.

  *   Another is to add BITS timing at each of your locations.  There are 
BITS vendors that have a variety of "wireless" references that work very 
well.  Anything from GPS based, to taking a CDMA clock signal over the 
air.  The only downside is that they can be expensive (a few thousand 
dollars) and if you don't have roof access for a GPS and don't have a CDMA 
signal, they won't work.

  *   "Rent" a valid clocking source from your collocation provider. 
Hopefully, there is a reasonable charge for this.

  *   We have heard of an inexpensive network regeneration device that 
takes a DS1 signal coming out of a mux and is able to "reformulate" the 
network reference before feeding into the DLC.  Maybe someone on the 
mailing list has heard of this or uses one and can report on it.

We have worked with a few "timing consultants" that can also analyze your
network.  If you think you have a need for this service, Larry Cooley
(larry.cooley at taqua.com) can refer you to a couple of different
shops.

"So Dave, I've read this far and I'm still awake!  Do you have any 
references that I can read to help me with this insomnia thing?"  Why yes, 
I do:

  *   Seriously, the one I like the best is "Engineering Networks for 
Synchronization, CCS7, and ISDN" by P.K. Bhatnagar, IEEE Press, 1997 (Our 
friend Whit Reeve is the Series Editor).  I refer to this one a lot.
  *   If you have access to Bellcore (Telcordia) Standards:
     *   "Clocks for the Synchronized Network: Common Generic Criteria." 
BELLCORE TA-NWT-001244, issue 2, Nov. 1992.
     *   "Digital network synchronization plan."  BELLCORE Technical 
Advisory TA-NWT-000436, issue 2, June 1993.
  *   American National Standard for Telecommunications.  Synchronization 
Interface Standard.  ANSI T1.101-1994.

Hopefully, this information has been useful.  If you have any questions, 
send me email directly or reply to this message.

Thanks,
Dave
------------------------------------------------------------------------
...
--
Jay Hennigan - CCIE #7880 - Network Engineering - jay at impulse.net
Impulse Internet Service  -  http://www.impulse.net/
Your local telephone and internet company - 805 884-6323 - WB6RDV

myaklin＠g4.net