![[University of Arkansas]](./pic/uabanner.gif)
![[Computing Services]](./pic/ua-comp.gif)
To facilitate recovery from a disaster that destroys all or part of the machine room in the Administrative Services Building, certain preparations have been made in advance. This document describes what has been done to pave the way for a quick and orderly restoration of the facilities that Computing Services operates.
The first and most obvious thing to do is to have a plan. This document is part of the overall plan that Computing Services will use in response to a disaster. The extent to which this plan can be effective, however, depends on disaster recovery plans by other departments and units within the University.
For instance, if the Administration Building were to be involved in the same disaster as the Administrative Services Building, the functions of the Business Manager's Office, or more particularly the Purchasing Office, could be severely affected. Without access to the appropriate procedures, documents, vendor lists, and approval processes, the Computing Services recovery process could be hampered by delays while Purchasing recovers.
Every other business unit within the University should develop a plan for how it will conduct business, whether the disaster occurs in its own building or at Computing Services and removes its access to data for a period of time. Those business units need a means to function while the computers and networks are down, and they need a plan to synchronize the data that is restored on the central computers with the current state of affairs. For example, if the Payroll Office is able to produce a payroll while the central computers are down, that payroll data will have to be re-entered into the central computers when they return to service. Having a means of tracking all expenditures such as payroll while the central computers are down is extremely important.
If a central facility operated by Computing Services is destroyed in a disaster, repair or rebuilding of that facility may take an extended period of time. In the interim it will be necessary to restore computer and network services at an alternate site.
The University has a number of options for alternate sites, each having a varying degree of up-front costs.
One of the most critical issues involved in the recovery process is the availability of qualified staff to oversee and carry out the tasks involved. This is often where disaster partnerships can have their greatest benefit. Through cooperative agreement, if one partner loses key personnel in the disaster, the other partner can provide skilled workers to carry out recovery and restoration tasks until the disabled partner can hire replacements for its staff. Of course, to be completely fair to all parties involved, the disabled partner should fully compensate the assisting partners for use of their workers unless there has been prior agreement not to do so.
Northwest Arkansas has some fairly large mainframe installations that would likely help if needed. (Potential organizations to contact are WalMart, Tyson Foods, JBHunt, and IBM.) Also, the University has a reciprocal agreement with the U of A for Medical Sciences in Little Rock to provide assistance to the other party in the event of a disaster. UAMS may be able to provide computing facilities for short-term, critical applications. The University can also seek assistance from the State Department of Computer Services in Little Rock.
The use of reciprocal disaster agreements of this nature may work well as a low-cost alternative to hiring a disaster recovery company or building a hot site. And they can be used in conjunction with other arrangements, such as the use of a cold recovery site described below. The primary drawback to these agreements is that they usually have no provision for providing computer and network access for anything other than predefined critical applications. So users will be without facilities for a period of time until systems can be returned to operation.
The University of Arkansas has chosen to use the cold site approach for this disaster recovery plan. The necessary agreements are in place for Computing Services to utilize space in the Bell Engineering Building (BELL 108 suite) as its Cold Site. It has adequate space to house the hardware, with some office space available for operating and technical personnel. It has good connectivity to the campus fiber optic network. And a certain amount of preparation has been made for electrical and cooling capacity to support mainframes and network equipment.
More detail on the preparations that have already been made, and on the work still needed to renovate the space so that it is ready to receive the computer equipment, is available in Section DRPCS001: Recovery at the Cold Site.
This plan contains a complete inventory of the components of each of the computer and network systems and their software that must be restored after a disaster. The inevitable changes that occur in the systems over time require that the plan be periodically updated to reflect the most current configuration. Where possible, agreements have been made with vendors to supply replacements on an emergency basis. To avoid problems and delays in the recovery, every attempt should be made to replicate the current system configuration. However, there will likely be cases where components are not available or the delivery timeframe is unacceptably long. The Recovery Management Team will have the expertise and resources to work through these problems as they are recognized. Although some changes to the procedures documented in the plan may then be required, using different models of equipment or equipment from a different vendor may be a suitable way to expedite the recovery process.
New hardware can be purchased. New buildings can be built. New employees can be hired. But the data that was stored on the old equipment cannot be bought at any price. It must be restored from a copy that was not affected by the disaster. There are a number of options available to help ensure that such a copy of the data survives a disaster at the primary facility.
While this option does not guarantee the up-to-the-second updates available with the remote dual copy disk option, it does provide a means for conveniently taking backups and storing them off-site at any time of the day or night. Another huge advantage is that backups can be made from mainframes, file servers, distributed (unix-based) systems, and personal computers. Although such a system is expensive, it is not prohibitively so.
This option has some drawbacks. First, there is a period of exposure from the time that a backup is made to the time it can be physically moved off-site. A disaster striking at the wrong time may result in the loss of all data changes that have occurred since the last off-site backup. There is also the time, expense, and energy of having to transport the tapes. And there is the risk that tapes can be physically damaged or lost in transit.
Some organizations contract with disaster recovery companies to store their backup tapes in hardened storage facilities. These can be in old salt mines or deep within a mountain cavern. While this certainly provides for more secure data storage, considerable expense is undertaken for regular transportation of the data to the storage facility. Quick access to the data can also be an issue if the storage facility is a long distance away from your recovery facility.
The University has opted to take periodic backups of its primary mainframe systems, databases, file servers, and unix systems and to store those backups in two locations elsewhere on campus. The primary storage location is in Bell Engineering Room 108M, which is adjacent to the Cold Site recovery suite. The second location is in the Business Administration Building Room 107. The tape vaults at the Administrative Services Building are the final storage location, where the oldest generation of system and application backup tapes is kept.
In general, backups for each subsystem are cycled through the three sites. Backups are initially taken to BELL 108M in the Computing Services morning delivery run. These are the first generation backups. Existing tapes at BELL are relocated to BADM 107. These are the second generation backups. Existing tapes at BADM are relocated back to the Administrative Services Building for storage in the tape vaults in the machine room. These are the third generation tapes. They are retained until the next set of backups is made, and then released to scratch status. Then the cycle starts all over again.
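The three-generation cycle described above can be illustrated with a small model. This is only a sketch: the site names come from the text, but the `Rotation` class, its fields, and the backup set labels are hypothetical and do not describe any actual Computing Services tooling.

```python
from dataclasses import dataclass, field

# Site names follow the text; the data model itself is an assumption.
BELL = "BELL 108M"        # first generation
BADM = "BADM 107"         # second generation
ADSB = "ADSB tape vault"  # third generation


@dataclass
class Rotation:
    # Backup set currently held at each site (None while a site is empty).
    holdings: dict = field(
        default_factory=lambda: {BELL: None, BADM: None, ADSB: None}
    )
    # Sets that have been released to scratch status, oldest first.
    scratch: list = field(default_factory=list)

    def cycle(self, new_set: str) -> None:
        """Deliver a new backup set and move each older set one site along."""
        oldest = self.holdings[ADSB]
        if oldest is not None:
            # The third generation is released once a new set is made.
            self.scratch.append(oldest)
        self.holdings[ADSB] = self.holdings[BADM]  # second -> third generation
        self.holdings[BADM] = self.holdings[BELL]  # first -> second generation
        self.holdings[BELL] = new_set              # new set is first generation
```

Calling `cycle` four times with hypothetical labels `"set-1"` through `"set-4"` leaves the newest set at BELL 108M, the two before it at BADM 107 and the ADSB tape vault, and the oldest in scratch status. At no point after the third delivery is any site without a backup set.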
The actual backup and cycling procedures vary somewhat depending on the computer platform. Details of these procedures are contained in the following document:
DRPBK001: Backup Procedures
To ensure that an up-to-date copy of this plan is available when a disaster occurs, procedures have been established to store a copy of the plan with other important recovery information at the Cold Site backup tape storage area. Two Lock Boxes have been purchased to hold these materials. The contents of both lock boxes are identical. One resides at BELL 108M; the other resides in the tape vault just off the machine room in the Administrative Services Building.
When changes to the contents of the lock boxes are necessary, the box at the Administrative Services Building is updated first, then it is taken over to BELL and swapped with the box stored there. That box is returned to ADSB, updated, and replaced in the tape vault. This ensures that at least one copy of the plan is always available at the recovery site.
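The order of the steps matters: because the ADSB box is updated before the swap, BELL never goes without a box, and the box it receives is already current. The sketch below models one update cycle under that reading. The `LockBox` class, the `swap_update` function, and the integer plan version are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LockBox:
    label: str         # hypothetical identifier for the physical box
    plan_version: int  # hypothetical version number of the contents


def swap_update(adsb_box: LockBox, bell_box: LockBox, new_version: int):
    """Run one update cycle and return (box now at ADSB, box now at BELL)."""
    # Step 1: update the box kept at the Administrative Services Building.
    adsb_box.plan_version = new_version
    # Step 2: carry it to BELL and swap it with the box stored there.
    adsb_box, bell_box = bell_box, adsb_box
    # Step 3: update the returned box and put it back in the ADSB tape vault.
    adsb_box.plan_version = new_version
    return adsb_box, bell_box
```

After one cycle the two boxes have traded places and both hold the new contents, so the next update starts from the same state with the labels reversed.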
The lock boxes are to remain locked at all times. Keys to the boxes are kept by several key people within the department, including
In a disaster situation when entry into a lock box is needed but the key is not available, you can physically break the lock with bolt cutters.
The contents of the lock boxes are described in the following document:
DRPDR016: Disaster Lock Boxes Contents