When your broadband drops, or your local cell tower decides it wants a nap, it's frustrating. What if you're responsible not just for one broadband connection but for a copper, fiber and wireless network that stretches across the US? That's the responsibility of the AT&T Network Disaster Recovery (NDR) team, the group responsible for swooping in after natural disasters and significant network failures and bringing AT&T back online. SlashGear was invited to one of their quarterly disaster simulations, which recently took place in Houston, Texas, to see what's involved.
The core strategy of the NDR team is to have portable versions of installed hardware ready to drop in and quickly replace local offices that have been damaged or destroyed. More than $500m worth of telecoms hardware is stored in warehouses across the US, kept running and updated, so that it can be moved into place, flashed with the settings of a local station and, in effect, "become" that part of the AT&T network.
To effect that, AT&T have a range of network equipment, much of which is custom-designed or modified. Units range from Suburban trucks called ECVs (Emergency Communication Vehicles) packed with satellite phone equipment, to COLTs (Cell on Light Trucks) and finally command center trucks, with each aspect of a typical local AT&T station replicated in modular form. The target is to have the network reconnected in 168hrs from call-out; for 9/11, they made it in just 53hrs.
The first vehicles in are the Emergency Communication Vehicles, which use a satellite hook-up to quickly give voice and data connectivity. They're followed by the command center, which takes care of management, staffing and supplies.
On the first day, the NDR camp is more like a building site. Each of the trailers and vehicles is secured on the same groundplate, with its own foundation. On the second day (or the evening of the first day, if a 24hr work program is deemed necessary) cabling is laid; that can take anything from eight to 24hrs, depending on how many trailers are deployed. High capacity cabling has actually made this more straightforward, with a modern fiber cable replacing 72 individual coaxial links.
The team moves in expecting nothing in the way of services. The NDR camp has a single, large generator replicating a commercial power supply, but each trailer also has its own individual generator together with backup batteries. Similarly, recharging staff is handled by a dedicated service, with catering and sanitation RVs.
From then on, the units are brought in according to the demands of the situation. Short-term interventions are dealt with by smaller, mobile units that are housed in five warehouses across the US, managed from the central AT&T control center in New Jersey. AT&T showed us how they can use a high-frequency 8GHz radio antenna to bridge broken cabling or network trunks, or reconnect with an isolated cell tower; these saw heavy use following 9/11, as the World Trade Center buildings had bristled with cell tower antennas. Where little or no infrastructure is left to connect, COLTs are used; these basically bridge wired or cellphone connections to satellite. In the aftermath of Hurricane Ike, 32 COLTs were deployed.
AT&T tell us that their engineers are constantly working to modify the hardware on hand, and that the NDR system has been evolving since its inception in 1991. A recent example of this is the Combo-Unit, which marries a COLT and a small command center, together with an 850/1900MHz microcell. These are faster response units than many of the other trailers, and can establish AT&T base camp coverage using private SIMs with a roughly quarter-mile range. It's also possible to unbar the network and provide cellular service to emergency workers, such as fire and rescue teams, and in fact they've also loaned complete units to government agencies. Some larger trailers also have their own workshops, for constructing equipment on-the-fly.
Of increasing importance since 9/11 has been the AT&T NDR Hazmat team, a core group of 30 volunteer staff trained to work in hazardous environments. Four sets of hazmat equipment are in position around the US, with the team able to go in alongside on-site emergency personnel for reconnaissance, restoration and recovery. According to the hazmat team we spoke to, their role doesn't just mean Stay Puft Marshmallow Man suits, but can be as simple as respirators or the proper clothing. Often their role is to go into a damaged AT&T facility and video or otherwise record the impact of the disaster; that gives the engineers time to prepare exactly what replacement systems are needed. The team described one such incident in a quarry, where a train crash released a corrosive chemical - the hazmat team could go in and report on what might be fixed and what was damaged beyond repair, as well as on what might be dangerous to non-hazmat staff.
Everything is recorded and directed using a standardized incident management system, the same as used by the federal government, ensuring there's a common language between on-site emergency teams. AT&T use a web-based tracking system that can be accessed at the disaster zone but also by staff anywhere in the world, reporting to the company's global command center in New Jersey. The particular command center on site during the simulation was actually the same one used at 9/11 and following the impact of Hurricane Ike on Galveston, Texas.
AT&T also have a "Fly In Recovery" unit, which is basically everything a base-station might need packed into one air-freight sized container. The carrier is currently building similar technology for European, Middle Eastern and Asian markets.
While the technical aspect of the NDR team is incredibly impressive, the AT&T employees themselves deserve a mention. Far from being fast-strike projects, some NDR interventions can last long periods and be highly stressful; the AT&T team members we spoke to told us that 9/11 was probably the toughest job ever, both emotionally and because of the telecoms demands of Wall Street, in a deployment that lasted five whole weeks. Katrina was the second-longest deployment, at 22 days, with particularly difficult conditions. During the last hurricane season the wireless team were active for a full three months non-stop. After the NDR team moves out, the command center is often left in place for the local AT&T staff to use while their own facility gets repaired or replaced. Everything else - just as happens at the end of the regular simulations - is dismantled and returned to the various warehouses.
If being present for part of the simulation showed us anything, it's the sheer scale of the challenge presented not only by today's communications systems but also by five decades of legacy hardware. The AT&T NDR team are responsible not only for getting people's cellphones reconnected, but also for a voice and data backbone that goes down to lower-level services based on copper connections. Whole trailers are dedicated to aggregating copper to coax and coax to fiber; there were nine such trailers on-site for the simulation alone. That's before the sim-managers start loading in errors, glitches and artificial problems to test out the team's ingenuity and training. As one engineer told us, though, "better to face it here, than for the first time on-site after a disaster."
SlashGear would like to thank AT&T for inviting us to join them on their simulation, and particularly Kelly T. Morrion, the NDR technical specialist who accompanied us on the day.