Galaxy IV failure and AT&T Frame Relay outage

education, media, internet-policy, labor, libraries, telecommunications, rre, commerce, forwarded-content, auto-imported, rre-post
1998-05-22 · 7 min read

Source

Automatically imported from: http://commons.somewhere.com:80/rre/1998/Galaxy.IV.failure.and.AT.html

Content

This web service brought to you by Somewhere.Com, LLC.

Galaxy IV failure and AT&T Frame Relay outage

```
---

This message was forwarded through the Red Rock Eater News Service (RRE). Send any replies to the original author, listed in the From: field below. You are welcome to send the message along to others but please do not use the "redirect" command. For information on RRE, including instructions for (un)subscribing, send an empty message to rre-help@weber.ucsd.edu

---

Date: Wed, 03 Jun 1998 09:19:13 -0400
From: "David S. Isenberg"
Subject: DIVERSITY AND RELIABILITY -- SMART Letter #8

[...]

!@#$%^&()!@#$%^&()!@#$%^&()!@#$%^&()!@#$%^&()!@#$%^&()

---

SMART Letter #8 - May 22, 1998
For Friends and Enemies of the Stupid Network
Copyright 1998 by David S. Isenberg
This document may be redistributed provided that the 11 lines containing this notice accompany it.
isen@isen.com -- http://www.isen.com/ -- 1-888-isen-com
It takes SMART people to design a Stupid Network

---

!@#$%^&()!@#$%^&()!@#$%^&()!@#$%^&()!@#$%^&()!@#$%^&()

DIVERSITY AND RELIABILITY

Flash! Satellite Outage: " . . . not only were medical professionals, repair people and stock watchers left hung out to dry, but CBS, Reuters, NPR, UPI and other news organizations were left looking for backup. By all accounts, CBS had seamless backup, but NPR was in the middle of broadcasting 'All Things Considered' and had to switch to alternate satellites, ISDN, phone bridges and -- surprise -- RealAudio to get its feed to local stations." (Industry Standard Media Grok, May 21, 1998.)

Two recent network failures serve as "emblematic events" for the new telecommunications: (1) the PanAmSat Galaxy IV satellite outage, which struck at 6PM EDT on May 19, and (2) AT&T's Frame Relay Network shutdown of several weeks ago. Both drew attention to the pervasive, underlying dependence of civilization-as-we-know-it upon single specific communications links.

There are lessons here. "Learn From History, or . . . " kind of lessons. For example:

1. Systems that are carefully crafted and managed by the "reliability, reliability, reliability" crowd to be 24-by-7 and fault-free, aren't.
2. If you have not second-sourced your data transport yet, do it now.
3. The Year 2000 is coming, better make that triple-sourced.

Humanity has an amazing inability to plan. Not too many generations ago, when our relatives lived by hunting and gathering, the inability to plan for the next season meant death. Planners survived. The clueless died. But today, Homo sapiens eats at McDonald's - for the moment, planning and survival are not strongly linked.

Inability to plan made news during the big United Parcel Service strike of 1997. Anybody who cared to look could have seen the strike coming weeks before it occurred. When it hit, a school supply company made news. Its entire revenue stream hinged on a single end-of-summer shipment. UPS was its only shipper. Yet, in several poignant interviews with the owner, no happy-talk reporter ever asked, "Before the strike, did you ever think what would happen if it occurred during your critical week?"

PHYSICAL DIVERSITY

In the old days, before telecom competition, Ma Bell was into Physical Diversity. Physical Diversity meant that there were several alternate routes, each with different geography, different technology, and different physical infrastructure. For example, a phone call from New York to San Francisco, using the modern technologies of the 80s, might have traveled underground by cable, or hopped line-of-sight from microwave tower to microwave tower, or made one long leap to a satellite and another to Earth again.

Today, in the era of competition, telcos want to be "Low Cost Providers." Physical Diversity is an "unnecessary expense." They want to skinny down, remove redundancy, reduce inventory, refine processes - find the best solution and stick with it. They don't need no steenkin' diversity.

Fiber has become very reliable, and SONET makes it even more so. This raises the reliability coefficient, but does not make it 1.0. Too bad the whole system isn't just rings of passive glass. There are all kinds of hardware and software components to light the fiber, set up calls, parse headers, route data, relay frames, and monitor the system. There are systems for connecting customers, and systems for interconnecting with other networks. There are systems to check on the other systems.

The result is that a complex, linearly engineered system - designed to be 99.9999 percent reliable - ISN'T. It is still a chaotic, adaptive system, even though it wasn't designed to be.

ONUS ON THE CUSTOMER

The onus for Physical Diversity is on the customer. That's why, in the era of telecom competition, business-critical operations demand a couple of interexchange carriers, an office phone and a cell phone, a desktop and a laptop, a cable modem and a dial-up, and at least two Internet Service Providers (ISPs). It is a good thing that infrastructure is getting cheaper.

Physical Diversity, along as many dimensions as possible, is the most reliable route to reliability. Note that Galaxy IV's backup system failed too! Any time that parallel systems share components, the event that takes one system out is also likely to bring the second system down. If it's a software bug, and both primary and backup systems are running the same faulty code, too bad. If radio interference is the problem, and both systems use the same frequencies or modulations, sorry. If you rely on satellites and there are solar storms or meteor showers, look out. If you get primary and backup from the same company, and the company fails (or goes on strike, or . . .), remember you read it here first.

When the onus for Physical Diversity is on the customer, the customer needs alternatives. That's a problem when 90% of computers run one company's operating system, no matter how innovative that company might be. And it's a problem when a single telco controls local telephone service, no matter how big the telco's territory.

EXODUS TO THE PROMISED BANDS

Exodus Communications is a Stupid company - they are into over-provisioning and Physical Diversity. They call themselves an "Internet Data Center." Actually Exodus has about 8 Internet Data Centers around the world. They'll buy data feeds - DS-3, OC-3, or more - from any carrier that'll sell them. UUNet, GTE, Sprint - Exodus buys it all. Their customers are ISPs. An ISP gets a cage on the Exodus floor, data feeds to order, and a Chinese menu of add-on services.

In one Exodus customer configuration, the ISP has two redundant racks. Rack #1 gets a primary 100BaseT feed from, say, UUNet and a secondary, totally redundant feed from, say, Sprint. Rack #2 gets its primary from Sprint, and its secondary from UUNet.

Exodus maintains a 200% headroom policy. It attempts to have twice as much bandwidth as it needs in its busy hour. Its 200% and Physical Diversity policies extend to electric power and heating-cooling too. It has contracts with two different power companies, and it has a back-up generator on the roof and another in the basement. The rooftop generator has a different fuel tank than the basement one. There are four air conditioners, one in each corner of the data center. Each of the four electrical feeds supplies one AC. And so on.

A facilities-based telco can't do this. (Imagine AT&T advertising that its redundancy is due to secondary facilities by MCI!) Exodus can, because it buys facilities from all comers. Reliability emerges at a different point in the value space.

Exodus is an excellent example of what SMART Person Paul Saffo calls "disinterREmediation." Once upon a time, telcos mediated Physical Diversity for their customers, but competition and the resulting drive to lower costs made it prudent for them to stop. Customers can still buy Physical Diversity in the age of telecom competition, but they have to do it piecemeal . . . one from GTE, one from MCI, etc. Exodus REmediates by providing one-stop shopping for Physical Diversity. The whole process is called "disinterREmediation." Really.

YEAR 2000

Gentle Reader, if you are still asking what is the relationship of the Y2K Problem to The Stupid Network, I don't think I can help further. To unsubscribe, send me a brief message to that effect.

For the rest of us, let's take the lessons of Physical Diversity home. We could ask ourselves now what might happen if our communications, our food, our electricity, our heat, our transportation, our money, our employment are disrupted. Physical Diversity is part of the solution space. It can protect individuals and defined groups from potential technological failures. I wonder whether Exodus will rent cages for living spaces next year :-)

Physical Diversity offers much less protection against the kinds of sociological phenomena that could plausibly occur when physical systems are disrupted. I have no idea where this discussion will lead, but it is time to begin talking . . . .

David I

---

--------------------isen.com----------------------
David S. Isenberg        isen@isen.com
d/b/a isen.com           http://www.isen.com/
18 South Wickom Drive    888-isen-com (anytime)
Westfield NJ 07090 USA   908-875-0772 (direct line)
                         908-654-0772 (home)
--------------------isen.com----------------------
     -- Technology Analysis and Strategy --
        Rethinking the value of networks
        in an era of abundant infrastructure.
--------------------isen.com----------------------
```
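
The letter's claim that parallel systems sharing a component tend to go down together can be put in rough numbers. The following is a minimal sketch, not anything taken from the letter: the function names and the availability figures (0.999) are hypothetical, and it assumes the diverse feeds fail independently of one another, which real networks only approximate.

```python
# Illustrative sketch only: the availability figures below are assumptions,
# not measurements from the letter or from any carrier.

def parallel_availability(availabilities):
    """Availability of redundant paths that fail independently:
    the service is down only when every path is down at once."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def with_shared_component(availabilities, shared_availability):
    """Same redundant paths, but every path depends on one shared component
    (same code, same satellite, same company). If it fails, all paths fail."""
    return shared_availability * parallel_availability(availabilities)


if __name__ == "__main__":
    single = 0.999   # hypothetical availability of one feed
    shared = 0.999   # hypothetical availability of the shared component
    diverse = parallel_availability([single, single])
    coupled = with_shared_component([single, single], shared)
    print(f"one feed:                      {single:.6f}")
    print(f"two diverse feeds:             {diverse:.6f}")
    print(f"two feeds, shared component:   {coupled:.6f}")
```

Under these assumptions, two feeds behind a shared component can never be more available than the shared component itself, however many feeds are added. That is the arithmetic behind the advice to diversify along as many dimensions as possible.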
