What is in a Business Continuity Plan
Business Continuity Planning
What is Business Continuity Planning?
Business Continuity Planning (BCP) is the process of creating preventive and recovery systems to deal with potential cyber threats to an organization or to ensure process continuity in the wake of a cyberattack. BCP's secondary goal is to ensure operational continuity before and during execution of disaster recovery.
The planning entails asset and personnel protection, thus ensuring a quick recovery of operations in the event of a disaster. In a nutshell, the basic business continuity requirement is to keep essential functions up and running during a disaster and to recover with as little downtime as possible. A business continuity plan considers various unpredictable events, such as natural disasters, fires, disease outbreaks, cyberattacks, and other external threats.
Importance of Business Continuity
At a time when downtime is unacceptable for any organization, business continuity is critical to address client management, retention, and operational security. Causes of downtime in a system are a dime a dozen, but cyberattacks and extreme weather conditions are two of the most prevalent issues that can disrupt a business in a very short time. That's why it's essential to have a business continuity plan (BCP) in place that can get your business operations up and running in the least possible time.
The premise of BCP is to empower an organization to keep crucial functions running during downtime. This, in turn, helps the organization respond quickly to an interruption, while creating resilient operational protocols. A robust business continuity plan helps save money, time, and reputation/brand image. Eventually, this helps in mitigating financial risks.
Steps in a Business Continuity Plan (BCP)
1. Conduct Business Impact Analysis & Risk Assessment
2. Develop Recovery Strategies
3. Solution Implementation
4. Testing & Acceptance
5. Routine Maintenance
Role of Risk Assessment & Business Impact Analysis (BIA)
Planning is the key to recovering from a business interruption and enables you to maintain focus during the aftermath of an outage. Companies can prepare for the possibility of adverse events that interrupt their operations by developing a business impact analysis (BIA) and conducting a risk assessment (RA).
Before a business continuity plan (BCP) is created, an organization must conduct a detailed risk assessment in order to identify the areas of exposure and all possible threats that could potentially cause a business interruption.
Types of threats that should be considered include natural, man-made, technological, loss of utilities, and pandemic outbreaks. All possible threats should be analyzed to determine the likelihood of their occurrence and the level of impact to the organization. Consideration should also be given to what mitigation steps have been taken to lessen the likelihood of occurrence and/or impact.
Threats that result in high-risk ratings should be reviewed with management to determine the need for additional mitigation strategies to lessen the possibility of the threat causing a business outage.
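The likelihood-and-impact rating described above can be sketched in a few lines of Python. This is only an illustration: the 1 to 5 scales, the multiplicative score, the threshold of 15 for a "high" rating, and the example threats are all assumptions, not values taken from this article.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Simple multiplicative rating; other schemes are equally valid.
        return self.likelihood * self.impact


HIGH_RISK_THRESHOLD = 15  # hypothetical cut-off for management review


def high_risk(threats: list[Threat]) -> list[Threat]:
    """Return threats at or above the threshold, highest score first."""
    flagged = [t for t in threats if t.score >= HIGH_RISK_THRESHOLD]
    return sorted(flagged, key=lambda t: t.score, reverse=True)


# Illustrative data only.
threats = [
    Threat("Flood at primary site", likelihood=2, impact=5),
    Threat("Ransomware", likelihood=4, impact=5),
    Threat("Loss of utilities", likelihood=3, impact=5),
]

for t in high_risk(threats):
    print(f"{t.name}: {t.score}")
```

Threats returned by `high_risk` would be the ones taken to management for a decision on additional mitigation.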
Risk Assessment Process
The risk assessment process consists of the following steps:
- Risk Identification: It is the process of determining risks that could potentially prevent the program, enterprise, or investment from achieving its objectives. It includes documenting and communicating the concern.
- Risk Analysis: Risk Analysis involves examining how project outcomes and objectives might change due to the impact of the risk event. Once the risks are identified, they are analysed to identify the qualitative and quantitative impact of the risk on the project so that appropriate steps can be taken to mitigate them.
- Risk Evaluation: Risk Evaluation is the process used to compare the estimated risk against the given risk criteria to determine the significance of the risk.
Performing Risk Evaluation
A risk evaluation can be performed in five simple steps.
- Identify and Prioritize Assets: Consider all the different types of data, software applications, servers and other assets that are managed. Determine which of these is the most sensitive or would be the most damaging to the company if compromised.
- Locate Assets: Find and list the source of those assets. Be it desktop office computers, mobile devices, internal servers, or anything else, you'll want to trace each asset back to its source.
- Classify Assets: Categorize each asset as public information, sensitive internal information, non-sensitive internal information, compartmentalized internal information, or regulated information.
- Threat Modeling Exercise: Identify and rate all the threats faced by your top-rated assets. Microsoft's STRIDE method is a popular one.
- Data Finalization & Planning: Once you have your evaluation, it's time to start tackling those risks, beginning with the most critical.
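As a rough sketch of the threat modeling step, the snippet below rates assets against the six STRIDE categories and orders them by exposure. The asset names, the 0 to 3 rating scale, and the idea of summing ratings are assumptions made for illustration; STRIDE itself only names the categories.

```python
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


# Hypothetical ratings (0 = not applicable, 3 = severe) for two example assets.
ratings = {
    "customer database": {Stride.INFORMATION_DISCLOSURE: 3, Stride.TAMPERING: 2},
    "public website": {Stride.DENIAL_OF_SERVICE: 2, Stride.SPOOFING: 1},
}


def worst_threat(asset: str) -> tuple[Stride, int]:
    """Return the highest-rated STRIDE category for an asset."""
    category = max(ratings[asset], key=ratings[asset].get)
    return category, ratings[asset][category]


def assets_by_exposure() -> list[str]:
    """Order assets by the sum of their ratings, most exposed first."""
    return sorted(ratings, key=lambda a: sum(ratings[a].values()), reverse=True)


print(assets_by_exposure())
print(worst_threat("customer database"))
```

The ordering from `assets_by_exposure` gives a starting point for tackling the most critical risks first.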
Business Impact Analysis
When developing and managing an effective Business Continuity Management System (BCMS), the backbone of its correct implementation is the business impact analysis (BIA) stage.
Methodological steps for developing a BIA:
1. Define the boundaries of the BIA
The starting point prior to the development of the BIA is the identification of the scope of the BCMS within the organization. Strategically, top management should have identified the scope, considering the products and services of the organization. Several key criteria can be used to decide which products and services need to be protected to assure continuity, including: (a) market pressure, (b) specific company sites, and (c) the profitability of products and services. Once the scope has been established, it is recommended that its boundaries are outlined and precisely defined in terms of the activity with which they begin and the one with which they end.
2. Identify activities that support the scope
An activity is considered a process or set of processes undertaken by an organization (or on its behalf) that produces or supports one or more products or services. When the scope of the BCMS is delimited, the organization should identify all the activities involved in the scope that directly contribute to the generation of its products and services. A good tool that helps in this step is a flowchart.
3. Assess financial and operational impacts
The third step is to assess the financial and operational impacts that would affect the organization in the event of a disruption of the activities identified in the preceding step. The financial impact assessment is performed before carrying out the operational impact assessment.
4. Identify critical activities
This step identifies the activities that have to be performed in order to deliver the key products and services, which enable an organization to meet its most important and time-sensitive objectives. The financial and operational impact rankings assigned in step three provide a basis for identifying critical activities.
5. Assess MTPDs and prioritize critical activities
The maximum tolerable period of disruption (MTPD) is the duration after which the viability of the organization will be irrevocably threatened if product and service delivery cannot be resumed. The estimates of MTPD can be based on either financial or operational impacts. The personnel responsible for assessing the financial and operational impacts are asked the following question: "What is the maximum period of time that can be tolerated for this process based on the financial and operational impact levels?" Suppose a disruption causes a financial loss of US $25,000 per day, and the cumulative loss becomes unacceptable once it exceeds US $50,000. The MTPD is then two days, because the losses will exceed US $50,000 if the disruption continues any longer. This example assumes that the operational impacts are insignificant relative to the financial losses.
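The arithmetic in the example above can be written out as a small helper. It assumes losses accumulate at a constant daily rate, which is a simplification; the function name is illustrative.

```python
def mtpd_days(daily_loss: float, max_tolerable_loss: float) -> float:
    """Days until cumulative loss reaches the tolerable limit.

    Assumes a constant loss per day (a simplification).
    """
    if daily_loss <= 0:
        raise ValueError("daily_loss must be positive")
    return max_tolerable_loss / daily_loss


# Figures from the example: US $25,000 per day, US $50,000 limit.
print(mtpd_days(25_000, 50_000))  # 2.0
```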
6. Estimate the resources that each critical activity will require for resumption
In this step, the organization needs to estimate the resources required for resumption at the level of each critical activity. Previously, the firm should have identified the minimum level at which each critical activity needs to be performed upon resumption. The sources that a business can use to determine the minimum acceptable levels of performance are the contractual agreements and service level agreements for the key products and services involved in the scope. The minimum resources needed for each activity can be classified as: (a) critical IT systems and applications and (b) critical non-IT resources. The second category can be subdivided into: "physical areas," "human competences," "equipment," and "documents."
7. Determine RTOs for critical activities
"The recovery time objective (RTO) is the target time set for resumption of product, service or activity delivery after an incident" (Fullick, 2013). The RTO, which is the length of time between a disruptive event and the recovery of resources, indicates the time available to recover disrupted resources. The MTPD value expresses the maximum limit for the RTO value. The exercise of business continuity management arrangements enables the organization to validate its RTOs and, therefore, to take corrective actions to reduce them. Cross-functional teams involved with the critical activities have the task to make the estimates of the RTOs.
8. Identify all dependencies relevant to critical activities
In this step, the organization has to "consider all dependencies relevant to the critical activities, including suppliers and outsource partners" (Alexander, 2009). The critical activities that have been considered usually have some vital inputs that are provided by other company processes or by external suppliers or outsource partners. The internal processes that supply important inputs to critical activities also have to be considered as critical activities. In the case of external suppliers and outsource partners, contractual agreements requiring them to set up and manage a BCMS should be in place. It is important to bear in mind that every company is only as resilient as the weakest link in its supply chain.
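One way to picture this step is as a walk over a dependency graph: anything a critical activity depends on, directly or indirectly, is pulled into the critical set. The sketch below assumes the dependencies have already been captured as a simple mapping; the activity names are made up for illustration.

```python
from collections import deque


def expand_critical(critical: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Return the critical activities plus everything they depend on."""
    result = set(critical)
    queue = deque(critical)
    while queue:
        activity = queue.popleft()
        for supplier in depends_on.get(activity, set()):
            if supplier not in result:
                result.add(supplier)
                queue.append(supplier)
    return result


# Hypothetical dependency map: activity -> inputs it relies on.
depends_on = {
    "order fulfilment": {"inventory lookup", "courier (external)"},
    "inventory lookup": {"warehouse database"},
}

print(sorted(expand_critical({"order fulfilment"}, depends_on)))
```

External entries that appear in the result, such as the courier here, are the ones for which contractual continuity requirements would be checked.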
9. Recovery point objectives for critical activities
The Recovery Point Objective (RPO) is the maximum amount of data loss, expressed as a length of time, that can be tolerated because of a business disruption. It is measured as the time between the last data backup and the disruptive event. In the BIA process, RPO is determined for each application by asking the critical activity owners the following question: "What is the tolerance, in terms of length of time, to loss of data that may occur between any two backup periods?" The response to this question indicates the values of RPO.
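Taking the measurement described above (time between the last backup and the disruptive event), a backup schedule can be checked against an RPO as follows. This is a minimal sketch: it assumes backups run at a fixed interval, so the worst case is a disruption just before the next backup. The function names and example times are illustrative.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disruption: datetime) -> timedelta:
    """Time span of data lost: from the last backup to the disruptive event."""
    return disruption - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case loss equals the backup interval, so it must not exceed the RPO."""
    return backup_interval <= rpo


# Example: nightly backup at 02:00, disruption at 11:30 the same day.
lost = data_loss_window(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 11, 30))
print(lost)                                                  # 9:30:00
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))    # False
```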
Challenges in Business Continuity Management and How to Overcome Them
1. Some business unit heads may not completely understand what a process is.
The best way to address this is to look at the organizational procedures or website of the particular business unit and go to the BIA interview or meeting with a prior understanding of the high-level processes. This will make the discussion process-focused.
2. Impacts are not captured properly and priorities are misjudged.
Often the first BIA does not go to plan and it may be a good idea to repeat the BIA at least once soon after the first attempt. After the first round there will be a much better understanding of the processes, applications, and other resources, as well as their interdependencies. This will enable a focused discussion in the second round.
3. Interdependency may not be captured in the first round of BIA.
The best way to address this is to talk about "what happens next" and "what happens before."
4. Common applications and online systems such as the intranet, file storage, and email can be easily missed in a BIA where ownership is not defined.
This needs to be captured by asking intelligent questions or with discussion within the IT section.
5. In general, finding impact on revenue or profit is difficult unless it is a retail sales process.
Having financial details and budgetary information and analysing them prior to a BIA discussion will be useful to help the business unit estimate the financial impacts, especially on revenue or profit.
6. Interpolation and extrapolation often cause difficulties.
Simplification is often necessary in this area. For example, consider impacts for one day or three days, depending on the overall organizational criticality. If overall impacts are high, select a shorter duration such as one day. If impacts are low, choose three or five days. It is easier for the business unit to assess what the impacts would be if their unit shuts down for two days than it is for the business unit to estimate the impact as time changes. After assessing the impact for a particular duration, then interpolation and extrapolation could be done using mathematical formulae or otherwise. However, such interpolation and extrapolation (linear or non-linear) needs to be realistic and validated by the business unit.
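As a hedged illustration of the interpolation and extrapolation mentioned above, the helper below scales a single assessed impact to other durations. The power-law form and the `exponent` parameter are assumptions chosen for simplicity (an exponent of 1 gives linear scaling); any real curve would need to be validated by the business unit.

```python
def scale_impact(assessed_impact: float, assessed_days: float,
                 target_days: float, exponent: float = 1.0) -> float:
    """Estimate impact at target_days from one assessed data point.

    exponent=1.0 is linear; exponent>1.0 models impact that accelerates
    over time. Both are illustrative assumptions.
    """
    if assessed_days <= 0:
        raise ValueError("assessed_days must be positive")
    return assessed_impact * (target_days / assessed_days) ** exponent


# Business unit assessed US $40,000 of impact for a two-day shutdown.
print(scale_impact(40_000, 2, 1))              # linear interpolation to one day
print(scale_impact(40_000, 2, 4, exponent=2))  # non-linear extrapolation to four days
```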
7. Key-person dependency is sometimes used by staff as leverage to protect their job security.
The BIA should identify this risk, but addressing it may well be challenging in small functional areas.
8. Single points of failure should be identified in the BIA using the resources that are required for a particular process.
The identification of single points of failure can be difficult and asking questions such as "What do you use for this?" or "What are your dependencies?" could be helpful. This should be followed by a risk assessment to further understand single points of failure and their hidden components.
9. The final review of the business impact analysis should have a good distribution of the priority of processes.
Although there is no specific limit defined, and the levels of criticality could vary, it is ideal to have four to five levels of criticality, with the top priorities not exceeding 25%. Anything more than that will not result in the processes being effectively restored (or exercised) as the focus will be lost.
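The guideline above (four to five levels, top priority no more than 25%) lends itself to a quick check at the end of a BIA review. The sketch assumes each process has been given an integer criticality level with 1 as the highest; that encoding and the example data are assumptions.

```python
from collections import Counter

MAX_TOP_SHARE = 0.25  # from the guideline above


def top_priority_share(levels: list[int]) -> float:
    """Fraction of processes rated at level 1 (assumed to be the highest)."""
    if not levels:
        raise ValueError("no processes to review")
    return Counter(levels)[1] / len(levels)


def distribution_ok(levels: list[int]) -> bool:
    """True if there are four or five levels and level 1 is within the limit."""
    return len(set(levels)) in (4, 5) and top_priority_share(levels) <= MAX_TOP_SHARE


# Hypothetical criticality levels for ten processes.
levels = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
print(top_priority_share(levels), distribution_ok(levels))
```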
10. It is important not to spoil the relationship with the business units as the BIA is only the first step in the business continuity process.
The support of business units is essential in a successful business continuity implementation. In some cases, if an agreement cannot be reached about prioritization, some tactics need to be used.
Disaster Recovery
Disaster recovery (DR) is an organization's method of regaining access and functionality to its IT infrastructure after events like a natural disaster, cyberattack, or business disruptions. A variety of DR methods are often part of a Disaster Recovery Plan (DRP).
IT Disaster Recovery Plan
A DRP contains both responsive and preventative elements and is a key part of a company's Business Continuity Planning (BCP). A DRP is a formal document created by an organization that contains detailed instructions on how to respond to unplanned incidents like natural disasters, power outages, cyberattacks, and any other disruptive event. The plan contains strategies for minimizing the consequences of a disaster so that the organization can continue to operate, or quickly resume key operations.
A DRP is more focused than a business continuity plan and doesn't necessarily cover all contingencies for business processes, assets, human resources, and business partners.
Disaster recovery encompasses the procedures, policies, or processes that prepare an organization's vital IT infrastructure to effectively recover from natural or human-induced disasters, and ensure business continuity. It must contain scripts (instructions) which can be implemented by anyone.
Why is an IT Disaster Recovery Plan important?
Key reasons why a business would need a detailed and tested DRP include:
What are the types of Disaster Recovery?
Virtualization Disaster Recovery
Virtualization provides flexibility in disaster recovery. Servers are virtualized independent from the underlying hardware. Therefore, an organization does not need the same physical servers at the primary site as at its secondary disaster recovery site.
Network Disaster Recovery
A network disaster recovery plan identifies specific issues or threats related to an organization's network operations as a result of network provider problems or disasters caused by nature or human activities.
Cloud-based Disaster Recovery
Cloud disaster recovery enables the backup and recovery of remote machines on a cloud-based platform. Cloud disaster recovery is primarily an infrastructure as a service (IaaS) solution that backs up designated system data on a remote offsite cloud server.
Data Center Disaster Recovery
Data center disaster recovery is the organizational planning to resume business operations following an unexpected event that may damage or destroy data, software, and hardware systems.
To meet an organization's RTO and RPO objectives, data center operators face numerous challenges. A key challenge is data synchronization, and it depends on frequency of replication. The most common replication methods are:
Synchronous Replication
In a synchronous replication, the receiving system acknowledges every single change received from the sending system. Adopting this method requires maintenance of a "hot" backup site, and it is most effective in combination with "hot" failover solutions and Global Server Load Balancing (GSLB) solutions.
Semi-Synchronous Replication
The receiving system sends an acknowledgement only after a series of changes have been received. This method of synchronization parallels the "warm" failover approach and may be the right choice for services that, in the event of a disaster, can tolerate some loss of data and a reasonable amount of downtime.
Asynchronous Replication
This method replicates data faster but offers weaker protection against data loss, as the sending system simply continues to send data without waiting for any response. Parallel to the "cold" failover approach, this method is best suited for static resources or scenarios in which data loss is acceptable.
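The difference between the three methods comes down to how often the receiver acknowledges changes. The toy model below is not a replication implementation; it only counts how many sent changes the sender has not seen confirmed at a given moment. The `ack_every` parameter and the batch size of 4 are assumptions for illustration.

```python
from typing import Optional


def unacknowledged_changes(changes_sent: int, ack_every: Optional[int]) -> int:
    """Count changes the sender has not yet seen acknowledged.

    ack_every=1    models synchronous replication (every change acknowledged)
    ack_every=N    models semi-synchronous replication (one ack per N changes)
    ack_every=None models asynchronous replication (no acknowledgements)
    """
    if ack_every is None:
        return changes_sent
    if ack_every < 1:
        raise ValueError("ack_every must be at least 1")
    return changes_sent % ack_every


for label, ack_every in [("synchronous", 1), ("semi-synchronous", 4), ("asynchronous", None)]:
    print(label, unacknowledged_changes(10, ack_every))
```

The larger the unconfirmed count, the more data may be at risk if the primary site fails at that moment, which is why the less frequent acknowledgement schemes pair with workloads that tolerate some data loss.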
Disaster Recovery Sites
One of the key elements in any DRP is the selection of a secondary site for data storage to help prevent data loss in the event of cyberattacks or a natural disaster. There are three major types of disaster recovery sites that can be used: cold sites, warm sites, and hot sites. Understanding the differences between these can help select the one that best suits company needs and mission-critical business operations.
Cold Backup Strategy: A cold site is a backup facility with little or no hardware equipment installed. It is an office space with basic utilities such as power, cooling, and communication systems. A cold site is the most cost-effective option among the three disaster recovery sites.
However, because a cold site doesn't have any pre-installed equipment, it takes a lot of time to set it up properly and fully resume business operations. In case of a disaster, an organization would require help from IT personnel to migrate the necessary servers and make them functional in order to take on the workload of the primary site.
Warm Backup Strategy: A warm site is seen as the middle ground between the cold site and the hot site. A warm site is a backup facility that has network connectivity and the necessary hardware equipment already pre-installed. However, a warm site cannot perform on the same level as a production center because it is not equipped in the same way.
Therefore, a warm site has less operational capacity than the primary site. Moreover, data synchronization between the primary and the secondary site is performed daily or weekly, which may result in minor data loss. A warm site is ideal for organizations that operate with less critical data and can tolerate a brief period of downtime. This type of DR site is the second most expensive option.
Hot Backup Strategy: A hot site is a backup facility that represents a mirrored copy of the primary production center. A hot site is equipped with all the required hardware, software, and network connectivity, which allows you to perform near real-time backup or replication of critical data. This way, the production workload can be failed over to the DR site within minutes or hours, thus ensuring minimal downtime and little to no data loss.
A hot site is expected to be always online and running without disruption in order to ensure data synchronization between the sites. A hot site is the most expensive option among the three. It is also important to ensure that this type of DR site is located far enough from the production center. This way, you decrease the likelihood of the hot site being affected by the same disaster as the primary site.
Key Steps in a Disaster Recovery Plan
The objective of a DRP is to ensure that an organization can respond to a disaster or any other emergency that affects information systems, and thus minimize the effect on business operations. IBM has created a template for producing a basic DRP. The following are the suggested steps as found in the DR template. Once you have prepared the information, it is recommended that you store the document in a safe, accessible location off site.
Step 1: Major goals
The first step is to broadly outline the major goals of a disaster recovery plan.
Step 2: Personnel
Record your data processing personnel. Include a copy of the organization chart with your plan.
Step 3: Application profile
List applications and assess whether they are critical and whether they are a fixed asset.
Step 4: Inventory profile
List the manufacturer, model, serial number, cost, and whether each item is owned or leased.
Step 5: Information services backup procedures
Include information such as: "Journal receivers are changed at ________ and at ________." And: "Changed objects in the following libraries and directories are saved at ____."
Step 6: Disaster recovery procedures
For any DRP, these three elements should be addressed:
- Emergency response procedures to document the appropriate emergency response to a fire, natural disaster, or any other activity, in order to protect lives and limit damage.
- Backup operations procedures to ensure that essential data processing operational tasks can be conducted after the disruption.
- Recovery actions procedures to facilitate the rapid restoration of a data processing system following a disaster.
Step 7: DRP for mobile site
The plan should include a mobile site setup plan, a communication disaster plan (including the wiring diagrams) and an electrical service diagram.
Step 8: DRP for hot site
An alternate hot site plan should provide for an alternate (backup) site. The alternate site has a backup system for temporary use while the home site is being re-established.
Step 9: Restoring the entire system
To get your system back to the way it was before the disaster, use the procedures on recovering after a complete system loss in Systems management: Backup and recovery.
Step 10: Rebuilding process
The management team must assess the damage and begin the reconstruction of a new data center.
Step 11: Testing the disaster recovery and cyber recovery plan
In successful contingency planning, it is important to test and evaluate the DRP regularly. Data processing operations are volatile in nature, resulting in frequent changes to equipment, programs, and documentation. These changes make it critical to treat the plan as a living document.
Step 12: Disaster site rebuilding
This step should include a floor plan of the data center, the current hardware needs and possible alternatives, as well as the data center square footage, power requirements, and security requirements.
Step 13: Record of plan changes
Keep your DRP current. Keep records of changes to your configuration, your applications, and your backup schedules and procedures.
Business Continuity Plan Testing
Business Continuity Plan (BCP) testing will help you to:
Testing Scenarios
Data Loss/Breach: One of the most prevalent workplace disasters today. The cause of data loss or a breach could vary:
- Ransomware and cyberattacks
- Unintentionally erased files or folders
- Server/drive crash
- Data center outage
Data is mission-critical for any company, and losing it can have many serious consequences, like significantly impacting sales and logistics applications.
The goal is to regain access to data as soon as possible. Restoring a backup is the solution.
However, who's responsible for that? What's the communication plan in this case? What are the priorities? Who needs to be contacted right away? Are there any vendors involved?
These and other questions will be answered during a test.
Data Recovery: In this scenario, you want to make sure your business continuity and disaster recovery (BCDR) systems work like clockwork. To do that, run a test that involves losing a large amount of data, then attempt to recover it. Some of the elements you have to evaluate include your RTO and whether your team met its objectives. Was there any damage to the files during recovery? If your backup was stored in the cloud, did you encounter any issues?
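Two of the questions above (was the RTO met, and were files damaged during recovery) can be checked mechanically. The sketch below times a restore and compares SHA-256 checksums before and after. The `restore` callable is a stand-in for whatever your backup tooling actually does, and the five-second RTO is an arbitrary value for the demonstration.

```python
import hashlib
import time
from typing import Callable


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_recovery_test(original: bytes, restore: Callable[[], bytes],
                      rto_seconds: float) -> dict:
    """Time a restore and verify the restored data matches the original."""
    expected = sha256(original)
    start = time.monotonic()
    restored = restore()  # hypothetical hook into the real restore procedure
    elapsed = time.monotonic() - start
    return {
        "within_rto": elapsed <= rto_seconds,
        "intact": sha256(restored) == expected,
        "elapsed_seconds": elapsed,
    }


backup = b"ledger entries"
good = run_recovery_test(backup, lambda: backup, rto_seconds=5.0)
damaged = run_recovery_test(backup, lambda: backup[:-1], rto_seconds=5.0)
print(good["within_rto"], good["intact"], damaged["intact"])
```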
Power Outage: Let's imagine there was a power outage due to a recent storm. The utility company reported that the power wouldn't be back up for a few days. What do you do? First off, your incident response team must coordinate among themselves and communicate with the rest of the organization.
How will you notify your workforce about the incident? Who is expected to return to the office, and who is able to work remotely? Which departments are affected the most and thus need immediate relief (e.g., accounting, logistics)?
Do you have a backup power generator? If yes, does anyone on the team have the skills to use it? Do you have an arranged office or mobile recovery location?
Answers to these questions must be covered in your BCP. Running a test will confirm that everyone is on the same page.
Network Outage: A power outage inevitably leads to a network outage. However, network outages can happen with electricity still being on, and they could last indefinitely. In such scenarios, many businesses rely on a work from home strategy that isn't reliable for an extended period. When working from home, many employees have various distractions that affect their productivity. So, during your test, verify the following points:
- Does everyone have access to their work systems?
- Is everyone aware of the security measures required while working remotely (VPN, safe network connection, etc.)?
- What is the plan for network restoration?
Answers to these questions also have to be laid out in your BCP.
Physical Disruption: Fire drills are among the most critical company-wide drills and have to be completed annually. Your area may already have a local code compliance requirement, but if not, it's vital to conduct a fire drill regardless.
Like a fire drill, you can test responses to other situations like natural disasters (e.g., earthquakes, tornadoes, storms) or other critical situations (active shooter, bomb threat, etc.). These exercises will help familiarize everyone with emergency procedures and safety steps required.
Business Continuity Plan Maintenance
Technology evolves and an organization's IT landscape also changes with time, so the plan needs to be updated, too. Thus, key personnel involved in business continuity planning need to review the plan and discuss any areas that must be modified.
Typically, the business continuity professional is responsible for ensuring that the plan is kept current. A business continuity plan needs to be reviewed and updated on a timely basis for overall plan coverage and incident management procedures, making sure the plan addresses any new risks as changes to the company and its operations take place.
At least twice a year, updates should be identified and applied in the BCP, and the revision history should be updated to record the changes. Revisions should be included in the master copy of the plan.
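A simple reminder check for the twice-yearly update could look like the following. Interpreting "twice a year" as a maximum gap of 183 days between revisions is an assumption, as are the names used here.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=183)  # assumed reading of "twice a year"


def review_overdue(last_revision: date, today: date) -> bool:
    """True if more than the review interval has passed since the last revision."""
    return today - last_revision > REVIEW_INTERVAL


# Revision history entries would normally come from the plan's master copy.
print(review_overdue(date(2024, 1, 1), date(2024, 6, 1)))  # False
print(review_overdue(date(2024, 1, 1), date(2024, 8, 1)))  # True
```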
Source: https://www.eccouncil.org/business-continuity-planning/