[University of Arkansas][Computing Services]

Disaster Recovery Plan
Disaster Recovery Teams

Last update: Thursday, 10-Oct-2002 11:46:54 CDT

To function in an efficient manner and to allow independent tasks to proceed simultaneously, the recovery process will be handled by teams. This plan calls for eight teams that work together, each assigned a specific portion of the recovery.

The eight Disaster Recovery Teams are:

  1. Recovery Management Team
  2. Damage Assessment Team
  3. Facility Recovery Team
  4. Network Recovery Team
  5. Platform Recovery Team
  6. Applications Recovery Team
  7. Computer Operations Team
  8. Administrative Support Team

The Recovery Management Team oversees the whole recovery process. The other seven teams are represented in the Recovery Management Team. The Recovery Manager leads the Recovery Management Team. The Manager has the final authority on decisions that must be made during the recovery. The Recovery Manager is responsible for appointing the other members of the Recovery Management Team. Each member of the Recovery Management Team will have the responsibility for appointing the other members of the respective team(s).

Selecting Personnel for the Recovery Management Team

The selection of the members of the Recovery Management Team is very important. Since it is almost impossible to document exactly what each of the individual recovery teams will be required to do (each disaster will have its own special set of circumstances, many of which will be completely unanticipated), each member of the Recovery Management Team must be capable of stepping in with the technical and management skills to make the on-the-spot decisions necessary to complete the task at hand.

The discussion that follows identifies those skills that are needed by members of the Recovery Management Team. If these positions are filled with qualified individuals, then the odds for a timely and successful recovery are very high.

Recovery Manager
This individual needs to be a skilled manager/administrator who is accustomed to dealing with pressure situations. He should have a broad knowledge of the hardware and software in use at the site. He should be a "problem solver," as many problems will arise that have not been anticipated in advance. He must be able to delegate responsibility to others. He must also have signature authority to expend funds as a part of the disaster recovery process. The current Director of Computing Services is the first choice for the Recovery Manager.

Facilities Coordinator
This individual needs some of the same skills as the Recovery Manager. However, he also needs to be familiar with the process of getting construction work scheduled and completed on time. He should be able to understand and oversee the setup of the electrical, environmental, and communications requirements of a data center.

Technical Coordinator
This individual needs to be highly skilled in a number of areas. He must have a strong background in the setup and interfacing of as many of the platforms in use as possible. He needs to be able to communicate easily with vendor technical representatives and engineers concerning installation options, performance issues, problem resolution, and a myriad of other things. He must also be able to schedule and manage people.

Administrative Coordinator
This individual needs to be skilled in the business operations of the University and the State of Arkansas. He should be well acquainted with the day-to-day operations of a University department. He should also be a "people person" who can deal with employees and their families during hard times. This person must also be familiar with State purchasing procedures and contracts.

Network Coordinator
This individual needs to be skilled in the area of network design and maintenance. He should be trained in diagnosing and correcting network outages and in connecting and debugging new additions to an existing network.

Applications Coordinator
First choice for this individual would be someone from the existing application support group. The person should have exposure to a cross section of the currently used applications. The most critical areas are Payroll, Accounting, and Student Records. If no one from the current staff is available, the most important technical skills are: a knowledge of VSAM and ADABAS utilities and storage techniques, CICS application experience, and experience testing and debugging applications developed for them. The person will need to use available tools to ascertain the status of files and data base objects and be prepared to restore later versions from backups if required. He will also need to interface with users to verify that applications are functioning as expected or analyze and develop solutions to problems that arise.

Computer Operations Coordinator
This individual needs to be skilled in the day-to-day operations of the mainframe systems and software, and needs the knowledge and skills to recreate (or implement new) production schedules for application systems. This person will also be responsible for setting up a limited help desk function that will provide information to callers on the status and availability of systems, how to access systems that are in a temporary setting, and any new procedures that users need for submitting their production applications for processing.

The following table contains a sample list of the people currently employed who could fill the positions on the Recovery Management Team. Alternates are listed, but there are other qualified individuals who could step in should any of these persons not be available.

Sample Recovery Management Team Roster
Position                         Primary           Alternates

Recovery Manager                 Robert Zimmerman  David Merrifield

Facilities Coordinator           Leo Yanda

Technical Coordinator            David Merrifield  Dan Martin

Administrative Coordinator       Tina Whatley      Ron Neyman
                                                   Randy Putt

Network Coordinator              Craig Brown       Stephen Hamlin
                                                   Terry Davis

Applications Coordinator         Ron Neyman        Randy Putt
                                                   Allen Fields

Computer Operations Coordinator  Chuck Dwyer       David Merrifield
                                                   El Orwig

Disaster Recovery Team Responsibilities

As the recovery process gets underway, it is imperative that each of the recovery teams remain in close communication and strive to work together to complete the recovery as expeditiously as possible. The following section provides a brief description of the responsibilities for each team.

Recovery Management Team

The Recovery Management Team is responsible for the coordination of the entire project. It is composed of seven skilled people:

  1. Recovery Manager
  2. Facilities Coordinator
  3. Technical Coordinator
  4. Administrative Coordinator
  5. Network Coordinator
  6. Applications Coordinator
  7. Computer Operations Coordinator

The Recovery Manager is the leader of the Recovery Management Team and has the final authority regarding decisions during the recovery process. Each of the remaining individuals will be the leader of a specialized team that will address a portion of the recovery tasks. As the recovery process gets underway, there will likely be areas of overlap between teams and close communication will be required. The Recovery Management Team will have regular meetings scheduled to provide for communication between team coordinators.

Each coordinator should schedule a meeting for members of his team well in advance of their first planned activities. A first-meeting agenda might include:

  1. Reviewing the current status of the recovery operation

  2. Emphasizing what the team's responsibilities are

  3. Making sure that members are aware of any changes to the original recovery plan

  4. Assigning tasks to individual team members

  5. Setting up time and location for future team meetings

Damage Assessment Team

The Damage Assessment Team will be led by the Technical Coordinator. He will be responsible for selecting the other team members. Likely choices would be member(s) from Physical Plant, Operations, Network Services, Campus Telephone Services, and Technical Services. This team will not be responsible for a detailed damage assessment for insurance purposes. The primary thrust for this team is to do two things:

  1. Provide information for the Recovery Management Team to be able to make the choice of the recovery site.

  2. Provide an assessment of the salvageability of major hardware components.

Based on this assessment the Recovery Management Team can begin the process of acquiring replacement equipment for the recovery.

Facility Recovery Team

The Facility Recovery Team will be led by the Facilities Coordinator. He will be responsible for selecting the other team members. Likely choices would be member(s) from Operations, Network Services, Physical Plant, Cold Site Building Representative, and Technical Services.

This team will be responsible for the details of preparing the recovery site to accommodate the hardware, supplies, and personnel necessary for recovery. Detailed layouts and instructions for the Cold Site preparation are included in the recovery plan.

This team will also be responsible for oversight of the activities for the repair and/or rebuilding of the primary site (the Administrative Services Building). It is anticipated that the major responsibility for this will lie within Physical Plant and contractors. However, this team must oversee these operations to ensure that the facility is repaired to properly support the operation of mainframe and networking equipment per the original design of the primary site.

Network Recovery Team

The Network Recovery Team will be led by the Network Coordinator. He will be responsible for selecting the other team members. Likely choices would be member(s) from Network Services, Technical Services, User Services, and Physical Plant. It may also be helpful to have the building and/or network manager for the Cold Site building be a part of this team should it be necessary to use the Cold Site.

This team will be responsible for overseeing the restoration of the campus network and all network connections necessary at the recovery site. It is entirely possible in certain disaster situations that the Network Recovery Team may be the only team convened as a result of a campus disaster. For instance, should a fire occur at the Band Building and destroy fiber optic connections and network equipment, this team will be charged with the recovery of operations out of that building or in another building on campus as quickly as possible.

Because there is such a high degree of reliance on the campus network for instruction, research, and administrative purposes, very high emphasis must be placed on restoring the network as quickly as possible.

Platform Recovery Team

The Platform Recovery Team will be led by the Technical Coordinator. He will be responsible for selecting the other members of the team, each of whom will be the leader in charge of restoring one or more of the computer platforms described in this plan.

Each team member may recruit others to assist in the technical and detailed work of the recovery. Team members are responsible for communicating needs and status information to other recovery teams and for coordinating restoration operations between parties working on different computer platforms.

Each platform recovery group will follow this general plan of action:

  1. Review damage assessment.
  2. Determine which hardware, software, and supplies will be needed to start the restoration of a particular system.
  3. Communicate list of components to be purchased and their specifications to the Administrative Support Team.
  4. Review the recovery steps documented in this plan and make any changes necessary to fit the situations present at the moment.
  5. When hardware begins to arrive, work with vendor representatives to install the equipment.
  6. When all components are assembled, begin the steps to restore the operating system(s) and other data from the off-site backup tapes.
  7. Attempt to recreate the status of all systems up to the point of the disaster if possible. If not, the system is handed off to the Applications Recovery Team.

Applications Recovery Team

The Applications Recovery Team will be led by the Applications Coordinator. He will be responsible for selecting the other team members. This team will be responsible for conducting activities leading up to the approval and acceptance of application systems for production use. In general, this team's activities will begin after the Platform Recovery Team has completed work on the target platform. Some of the team members may in fact be from the Platform Recovery Team.

Some of the anticipated tasks include:

  1. Analysis of need for additional recovery activities such as data base restores or individual file restores

  2. Developing programs/procedures to address specific problems

  3. Interfacing with application users to test applications

Computer Operations Team

The Computer Operations Team will be led by the Computer Operations Coordinator. He will be responsible for selecting the other team members. This team will provide three major functions:

  1. Man the Help Desk to provide phone assistance and status information to end-users.

  2. Provide operator staffing for the computer systems at the Cold Site.

  3. Provide Production/Control function for establishing production job schedules after systems and applications are restored.

Administrative Support Team

The Administrative Support Team will be led by the Administrative Coordinator. He will be responsible for selecting the other team members. This team will provide administrative support to the other recovery teams as well as support to employees and their families. One of the most important functions that this team can provide is to take on the burden of administrative details so that the engineers and technicians who are responsible for systems recovery can concentrate on their recovery work.

One member of this team should be designated as Family Contact. This person will be available throughout the recovery process to provide assistance to employee family members.

One member of this team should be a designated representative of the University's Purchasing Office. This person will be the liaison to the Business Manager's Office for the purpose of expediting all emergency purchases and ensuring that proper University and State regulations for purchasing in an emergency are followed. The Purchasing Office has its own Disaster Contingency Plan that it will implement to aid departments needing to restore or rebuild facilities in the event of a disaster.

Some of the anticipated team tasks include:

  1. Provide support for executing acquisition paperwork.

  2. Assist with the detailed damage assessment and insurance procedures.

  3. Determine the status of staff working at the time of the disaster.

  4. Provide counseling services for staff or family members having emotional problems resulting from the disaster.

  5. Assist the individual Team Coordinators in locating potential team members.

  6. Coordinate food and sleeping arrangements of recovery staff as necessary.

  7. Provide support to track time and expenses related to the disaster.

  8. Provide delivery and transportation services to the Cold Site or other locations as required.

  9. Provide public relations support (this function may be provided by University Relations).

  10. Assist in contracting with outside parties for work to be done in the recovery process (such as the installation of equipment, or consulting assistance for the installation or recovery of software systems).

Copyright © 1997 University of Arkansas
All rights reserved