Business continuity planning
Business continuity may be defined as "the capability of an organization to continue the delivery of products or services at pre-defined acceptable levels following a disruptive incident",[1] and business continuity planning[2][3] (or business continuity and resiliency planning) is the process of creating systems of prevention and recovery to deal with potential threats to a company.[4] In addition to prevention, the goal is to enable ongoing operations before and during execution of disaster recovery.[5] Business continuity is the intended outcome of proper execution of both business continuity planning and disaster recovery.
Several business continuity standards have been published by various standards bodies to assist in checklisting ongoing planning tasks.[6]
Business continuity requires a top-down approach to identify an organisation's minimum requirements to ensure its viability as an entity. An organization's resistance to failure is "the ability ... to withstand changes in its environment and still function".[7] Often called resilience, resistance to failure is a capability that enables organizations either to endure environmental changes without having to permanently adapt, or, when forced to adapt, to adopt a new way of working that better suits the new environmental conditions.[7]
Overview
Any event that could negatively impact operations should be included in the plan, such as supply chain interruption or loss of or damage to critical infrastructure (major machinery or computing/network resources). As such, BCP is a subset of risk management.[8] In the U.S., government entities refer to the process as continuity of operations planning (COOP).[9] A business continuity plan[10] outlines a range of disaster scenarios and the steps the business will take in any particular scenario to return to regular trade. BCPs are written ahead of time and can also include precautions to be put in place. Usually created with the input of key staff as well as stakeholders, a BCP is a set of contingencies to minimize potential harm to businesses during adverse scenarios.[11]
Resilience
A 2005 analysis of how disruptions can adversely affect the operations of corporations and how investments in resilience can give a competitive advantage over entities not prepared for various contingencies[12] extended then-common business continuity planning practices. Business organizations such as the Council on Competitiveness embraced this resilience goal.[13]
Adapting to change in an apparently slower, more evolutionary manner - sometimes over many years or decades - has been described as being more resilient,[14] and the term "strategic resilience" is now used to go beyond resisting a one-time crisis, but rather continuously anticipating and adjusting, "before the case for change becomes desperately obvious".
This approach is sometimes summarized as: preparedness,[15] protection, response and recovery.[16]
Resilience Theory can be related to the field of Public Relations. Resilience is a communicative process that is constructed by citizens, families, media system, organizations and governments through everyday talk and mediated conversation.[17]
The theory is based on the work of Patrice M. Buzzanell, a professor at the Brian Lamb School of Communication at Purdue University. In her 2010 article, "Resilience: Talking, Resisting, and Imagining New Normalcies Into Being",[18] Buzzanell discussed the ability of organizations to thrive after a crisis through building resistance. Buzzanell notes that there are five different processes that individuals use when trying to maintain resilience: crafting normalcy, affirming identity anchors, maintaining and using communication networks, putting alternative logics to work, and downplaying negative feelings while foregrounding positive emotions.
While resilience theory and crisis communication theory share similarities, they are not the same. Crisis communication theory is based on the reputation of the company, whereas resilience theory is based on the company's process of recovery. There are five main components of resilience: crafting normalcy, affirming identity anchors, maintaining and using communication networks, putting alternative logics to work, and downplaying negative feelings while foregrounding positive emotions.[19] Each of these processes can be applicable to businesses in times of crisis, making resilience an important factor for companies to focus on while training.
There are three main groups that are affected by a crisis: micro (individual), meso (group or organization) and macro (national or interorganizational). There are also two main types of resilience: proactive and post resilience. Proactive resilience is preparing for a crisis and creating a solid foundation for the company, dealing with issues at hand before they cause a possible shift in the work environment. Post resilience includes continuing to maintain communication, checking in with employees and accepting changes after an incident has happened.[20] Resilience can be applied to any organization. In New Zealand, the Canterbury University Resilient Organisations programme developed an assessment tool for benchmarking the resilience of organisations.[21] It covers 11 categories, each having 5 to 7 questions. A Resilience Ratio summarizes this evaluation.[22]
Continuity
Plans and procedures are used in business continuity planning to ensure that the critical organizational operations required to keep an organization running continue to operate during events when key dependencies of operations are disrupted. Continuity does not need to apply to every activity which the organization undertakes. For example, under ISO 22301:2019, organizations are required to define their business continuity objectives, the minimum levels of product and service operations which will be considered acceptable and the maximum tolerable period of disruption (MTPD) which can be allowed.[23]
A major cost in planning for this is the preparation of audit compliance management documents; automation tools are available to reduce the time and cost associated with manually producing this information.
Inventory
Planners must have information about:
- Equipment
- People (roles and responsibilities)
- Suppliers and Partners
- Technology (IT systems, communication)[24]
- Locations, including other offices and backup/work area recovery (WAR) sites
- Documents and documentation, including which ones have off-site backup copies:[10]
  - Business documents
  - Procedure documentation
Analysis
The analysis phase consists of:
- Impact analysis
- Threat and risks analysis
- Impact scenarios
Quantification of loss ratios must also include "dollars to defend a lawsuit."[25] It has been estimated that a dollar spent in loss prevention can prevent "seven dollars of disaster-related economic loss."[26]
Business impact analysis (BIA)
A Business Impact Analysis (BIA) is a process used to identify and evaluate the effects of disruptions on an organization's operations, and to determine recovery priorities and strategies appropriate to the organizational needs.
The main objectives of a BIA are to:
1. Identify critical activities and dependencies (people, processes, vendors, technology & facilities).
2. Assess the impact of disruptions on these activities (financial, operational, reputational, legal).
3. Determine recovery time objectives (RTO) and recovery point objectives (RPO).
4. Support the development of business continuity strategies and plans.
5. Inform risk assessment and mitigation efforts within the BCMS framework.[27]
For each function, two values are assigned:
- Recovery point objective (RPO) – the maximum acceptable amount of recent data, measured as a period of time, that may be lost and not recovered. For example, is it acceptable for the company to lose 2 days of data?[28] The recovery point objective must ensure that the maximum tolerable data loss for each activity is not exceeded.
- Recovery time objective (RTO) – the acceptable amount of time to restore the function
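As a rough illustration of how these two values can be used, the sketch below records an RPO and an RTO for one function and checks them against a backup interval and an estimated restore time. The class `FunctionTargets`, the helper names and the sample figures are hypothetical, and the sketch assumes that the worst-case data loss equals the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FunctionTargets:
    """Recovery targets assigned to one business function (illustrative)."""
    name: str
    rpo: timedelta  # maximum tolerable data loss, expressed as time
    rto: timedelta  # acceptable time to restore the function


def meets_rpo(targets: FunctionTargets, backup_interval: timedelta) -> bool:
    # Assumption: in the worst case everything since the last backup is lost,
    # so the backup interval must not exceed the RPO.
    return backup_interval <= targets.rpo


def meets_rto(targets: FunctionTargets, estimated_restore: timedelta) -> bool:
    # The estimated restore time must fit inside the RTO.
    return estimated_restore <= targets.rto


# Hypothetical example: payroll tolerates 2 days of data loss, 8 hours of outage.
payroll = FunctionTargets("payroll", rpo=timedelta(days=2), rto=timedelta(hours=8))
print(meets_rpo(payroll, backup_interval=timedelta(hours=24)))
print(meets_rto(payroll, estimated_restore=timedelta(hours=12)))
```

With these sample figures a daily backup satisfies the two-day RPO, while a twelve-hour restore estimate would not satisfy the eight-hour RTO.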
Maximum RTO
Maximum time constraints for how long an enterprise's key products or services can be unavailable or undeliverable before stakeholders perceive unacceptable consequences have been named as:
- Maximum tolerable period of disruption (MTPoD)
- Maximum tolerable downtime (MTD)
- Maximum tolerable outage (MTO)
- Maximum acceptable outage (MAO)[29][30]
According to ISO 22301 the terms maximum acceptable outage and maximum tolerable period of disruption mean the same thing and are defined using exactly the same words.[31] Some standards use the term maximum downtime limit.[32]
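One common reading of these limits is that the recovery time objective chosen for a function should not exceed its maximum tolerable period of disruption. The sketch below applies that reading as a simple consistency check; the function name `rto_violations`, the dictionary layout and the sample values are hypothetical.

```python
from datetime import timedelta


def rto_violations(plan: dict[str, tuple[timedelta, timedelta]]) -> list[str]:
    """Return the names of functions whose RTO exceeds their MTPD.

    `plan` maps a function name to (rto, mtpd); the layout is illustrative.
    """
    return sorted(name for name, (rto, mtpd) in plan.items() if rto > mtpd)


# Hypothetical sample plan.
plan = {
    "order processing": (timedelta(hours=4), timedelta(hours=24)),
    "customer support": (timedelta(hours=48), timedelta(hours=36)),
}
print(rto_violations(plan))  # ['customer support']
```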
Consistency
When more than one system crashes, recovery plans must balance the need for data consistency with other objectives, such as RTO and RPO.[33] This goal is called the Recovery Consistency Objective (RCO). It applies data consistency objectives to define a measurement for the consistency of distributed business data within interlinked systems after a disaster incident. Similar terms used in this context are "Recovery Consistency Characteristics" (RCC) and "Recovery Object Granularity" (ROG).[34]
While RTO and RPO are absolute per-system values, RCO is expressed as a percentage that measures the deviation between actual and targeted state of business data across systems for process groups or individual business processes.
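The following formula calculates RCO with "n" representing the number of business processes and "entities" representing an abstract value for business data:
$$\text{RCO} = 1 - \frac{(\text{number of inconsistent entities})_n}{(\text{number of entities})_n}$$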
100% RCO means that post recovery, no business data deviation occurs.[35]
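A minimal sketch of this calculation follows, assuming RCO is one minus the ratio of inconsistent entities to all entities, aggregated over the business processes under consideration. The function name `recovery_consistency`, the tuple layout and the aggregation across processes are assumptions made for illustration.

```python
def recovery_consistency(processes: list[tuple[int, int]]) -> float:
    """Return RCO as a percentage.

    Each tuple is (inconsistent_entities, total_entities) for one business
    process; both the layout and the aggregation are assumptions.
    """
    inconsistent = sum(bad for bad, _ in processes)
    total = sum(count for _, count in processes)
    if total == 0:
        raise ValueError("at least one entity is required")
    return 100.0 * (1 - inconsistent / total)


# Two hypothetical processes with no deviation after recovery.
print(recovery_consistency([(0, 50), (0, 150)]))  # 100.0
```

Under this assumption, 20 inconsistent entities out of 200 would give an RCO of about 90%.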
Risk assessment (RA)
The purpose of the risk assessment phase is to identify risks that could lead to disruptions and to assess their likelihood and potential impact. The main actions of the risk assessment are to:
1. Identify internal and external threats (see the common threats listed below).
2. Analyze vulnerabilities and potential consequences (e.g., not having a generator during a power outage).
3. Assess each risk by determining the likelihood of occurrence and the severity of its impact.
4. Prioritize risks for treatment and mitigation.[36]
Common threats include:
- Epidemic/pandemic
- Earthquake
- Fire
- Flood
- Cyber attack
- Sabotage (insider or external threat)
- Hurricane or other major storm
- Power outage
- Water outage (supply interruption, contamination)
- Telecomms outage
- IT outage
- Terrorism/Piracy
- War/civil disorder
- Theft (insider or external threat, vital information or material)
- Random failure of mission-critical systems
- Single point dependency
- Supplier failure
- Data corruption
- Misconfiguration
- Network outage
The above areas can cascade: Responders can stumble. Supplies may become depleted. During the 2002–2003 SARS outbreak, some organizations compartmentalized and rotated teams to match the incubation period of the disease. They also banned in-person contact during both business and non-business hours. This increased resiliency against the threat.
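The assessment and prioritization steps of the risk assessment can be sketched as a small risk register. The example below assumes a 1 to 5 scale for both likelihood and severity and uses their product as the priority score; this is a common convention rather than something prescribed by the sources cited here, and the class, function and sample ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    threat: str
    likelihood: int  # 1 (rare) to 5 (almost certain); scale is an assumption
    severity: int    # 1 (minor) to 5 (critical); scale is an assumption

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by severity.
        return self.likelihood * self.severity


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Order risks for treatment, highest score first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical ratings for three threats from the list above.
register = [
    Risk("Power outage", likelihood=4, severity=3),
    Risk("Earthquake", likelihood=1, severity=5),
    Risk("Cyber attack", likelihood=3, severity=5),
]
for risk in prioritize(register):
    print(f"{risk.score:>2}  {risk.threat}")
```

With these sample ratings the cyber attack (score 15) is treated first, followed by the power outage (12) and the earthquake (5).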
Impact scenarios
Impact scenarios are identified and documented:
- need for medical supplies[37]
- need for transportation options[38]
- civilian impact of nuclear disasters[39]
- need for business and data processing supplies[40]
These should reflect the widest possible damage.
Tiers of preparedness
SHARE's seven tiers of disaster recovery,[41] released in 1992, were updated in 2012 by IBM as an eight-tier model:[42]
- Tier 0 – No off-site data • Businesses with a Tier 0 Disaster Recovery solution have no Disaster Recovery Plan. There is no saved information, no documentation, no backup hardware, and no contingency plan. Typical recovery time: The length of recovery time in this instance is unpredictable. In fact, it may not be possible to recover at all.
- Tier 1 – Data backup with no Hot Site • Businesses that use Tier 1 Disaster Recovery solutions back up their data at an off-site facility. Depending on how often backups are made, they are prepared to accept several days to weeks of data loss, but their backups are secure off-site. However, this Tier lacks the systems on which to restore data. This approach is known as the Pickup Truck Access Method (PTAM).
- Tier 2 – Data backup with Hot Site • Tier 2 Disaster Recovery solutions make regular backups on tape. This is combined with an off-site facility and infrastructure (known as a hot site) in which to restore systems from those tapes in the event of a disaster. This tier solution will still result in the need to recreate several hours to days' worth of data, but its recovery time is more predictable. Examples include: PTAM with Hot Site available, IBM Tivoli Storage Manager.
- Tier 3 – Electronic vaulting • Tier 3 solutions utilize components of Tier 2. Additionally, some mission-critical data is electronically vaulted. This electronically vaulted data is typically more current than that which is shipped via PTAM. As a result, there is less data recreation or loss after a disaster occurs.
- Tier 4 – Point-in-time copies • Tier 4 solutions are used by businesses that require both greater data currency and faster recovery than users of lower tiers. Rather than relying largely on shipping tape, as is common in the lower tiers, Tier 4 solutions begin to incorporate more disk-based solutions. Several hours of data loss is still possible, but such point-in-time (PIT) copies are easier to make at greater frequency than copies replicated through tape-based solutions.
- Tier 5 – Transaction integrity • Tier 5 solutions are used by businesses with a requirement for consistency of data between production and recovery data centers. There is little to no data loss in such solutions; however, the presence of this functionality is entirely dependent on the application in use.
- Tier 6 – Zero or little data loss • Tier 6 Disaster Recovery solutions maintain the highest levels of data currency. They are used by businesses with little or no tolerance for data loss and who need to restore data to applications rapidly. These solutions have no dependence on the applications to provide data consistency.
- Tier 7 – Highly automated, business-integrated solution • Tier 7 solutions include all the major components being used for a Tier 6 solution with the additional integration of automation. This allows a Tier 7 solution to ensure consistency of data beyond that provided by Tier 6 solutions. Additionally, recovery of the applications is automated, allowing for restoration of systems and applications much faster and more reliably than would be possible through manual Disaster Recovery procedures.
Solution design
Two main requirements from the impact analysis stage are:
- For IT: the minimum application and data requirements and the time in which they must be available.
- Outside IT: preservation of hard copy (such as contracts). A process plan must consider skilled staff and embedded technology.
This phase overlaps with disaster recovery planning.
The solution phase determines:
- Crisis management command structure
- Telecommunication architecture between primary and secondary work sites
- Data replication methodology between primary and secondary work sites
- Backup site with applications, data and work space
Standards
ISO standards
There are many standards that are available to support business continuity planning and management.[43][44] The International Organization for Standardization (ISO), for example, has developed a series of standards on business continuity management systems[45] under the responsibility of technical committee ISO/TC 292:
- ISO 22300:2021 Security and resilience – Vocabulary (Replaces ISO 22300:2018 Security and resilience - Vocabulary and ISO 22300:2012 Security and resilience - Vocabulary.)[46]
- ISO 22301:2019 Security and resilience – Business continuity management systems – Requirements (Replaces ISO 22301:2012.)[47]
- ISO 22313:2020 Security and resilience – Business continuity management systems – Guidance on the use of ISO 22301 (Replaces ISO 22313:2012 Security and resilience - Business continuity management systems - Guidance on the use of ISO 22301.)[48]
- ISO/TS 22317:2021 Security and resilience – Business continuity management systems – Guidelines for business impact analysis (Replaces ISO/TS 22317:2015 Societal security – Business continuity management systems – Guidelines for business impact analysis.)[49]
- ISO/TS 22318:2021 Security and resilience – Business continuity management systems – Guidelines for supply chain continuity (Replaces ISO/TS 22318:2015 Societal security — Business continuity management systems — Guidelines for supply chain continuity.)[50]
- ISO/TS 22330:2018 Security and resilience – Business continuity management systems – Guidelines for people aspects on business continuity (Current as of 2022.)[51]
- ISO/TS 22331:2018 Security and resilience – Business continuity management systems – Guidelines for business continuity strategy (Current as of 2022.)[52]
- ISO/TS 22332:2021 Security and resilience – Business continuity management systems – Guidelines for developing business continuity plans and procedures (Current as of 2022.)[53]
- ISO/IEC/TS 17021-6:2014 Conformity assessment – Requirements for bodies providing audit and certification of management systems – Part 6: Competence requirements for auditing and certification of business continuity management systems.[54]
- ISO/IEC 24762:2008 Information technology — Security techniques — Guidelines for information and communications technology disaster recovery services (withdrawn)[55]
- ISO/IEC 27001:2022 Information security, cybersecurity and privacy protection — Information security management systems — Requirements. (Replaces ISO/IEC 27001:2013 Information technology — Security techniques — Information security management systems — Requirements.)[56]
- ISO/IEC 27002:2022 Information security, cybersecurity and privacy protection — Information security controls. (Replaces ISO/IEC 27002:2013 Information technology — Security techniques — Code of practice for information security controls.)[57]
- ISO/IEC 27031:2011 Information technology – Security techniques – Guidelines for information and communication technology readiness for business continuity.[58]
- ISO/PAS 22399:2007 Societal security - Guideline for incident preparedness and operational continuity management (withdrawn)[59]
- IWA 5:2006 Emergency Preparedness (withdrawn)[60]
British standards
The British Standards Institution (BSI Group) released a series of standards which have since been withdrawn and replaced by the ISO standards above.
- BS 7799-1:1995 - peripherally addressed information security procedures. (withdrawn)[61]
- BS 25999-1:2006 - Business continuity management Part 1: Code of practice (superseded, withdrawn)[62]
- BS 25999-2:2007 Business Continuity Management Part 2: Specification (superseded, withdrawn)[63]
- BS 25777:2008 - Information and communications technology continuity management. Code of practice. (withdrawn)[64]
Within the UK, BS 25999-2:2007 and BS 25999-1:2006 were being used for business continuity management across all organizations, industries and sectors. These documents give a practical plan to deal with most eventualities—from extreme weather conditions to terrorism, IT system failure, and staff sickness.[65]
In 2004, following crises in the preceding years, the UK government passed the Civil Contingencies Act 2004: businesses must have continuity planning measures to survive and continue to thrive whilst keeping the impact of an incident as small as possible. The Act is separated into two parts: Part 1, civil protection, covering roles and responsibilities for local responders; and Part 2, emergency powers.[66] In the United Kingdom, resilience is implemented locally by the Local Resilience Forum.[67]
Australian standards
- HB 292–2006, "A practitioners guide to business continuity management"[68]
- HB 293–2006, "Executive guide to business continuity management"[69]
United States
- NFPA 1600 Standard on Disaster/Emergency Management and Business Continuity Programs (2010). National Fire Protection Association. (superseded).[70]
- NFPA 1600, Standard on Continuity, Emergency, and Crisis Management (2019, current standard), National Fire Protection Association.[71]
- Continuity of Operations (COOP) and National Continuity Policy Implementation Plan (NCPIP), United States Federal Government[72][73][74]
- Business Continuity Planning Suite, DHS National Protection and Programs Directorate and FEMA.[75][76][77][72]
- ASIS SPC.1-2009, Organizational Resilience: Security, Preparedness, and Continuity Management Systems - Requirements with Guidance for Use, American National Standards Institute[78]
Implementation and testing
The implementation phase involves policy changes, material acquisitions, staffing and testing.
Testing and organizational acceptance
The 2008 book Exercising for Excellence, published by the British Standards Institution, identified three types of exercises that can be employed when testing business continuity plans.
- Tabletop exercises - a small number of people concentrate on a specific aspect of a BCP. Another form involves a single representative from each of several teams.
- Medium exercises - Several departments, teams or disciplines concentrate on multiple BCP aspects; the scope can range from a few teams from one building to multiple teams operating across dispersed locations. Pre-scripted "surprises" are added.
- Complex exercises - All aspects of a medium exercise remain, but for maximum realism no-notice activation, actual evacuation and actual invocation of a disaster recovery site are added.
While start and stop times are pre-agreed, the actual duration might be unknown if events are allowed to run their course.
Maintenance
The biannual or annual maintenance cycle of a BCP manual[79] is broken down into three periodic activities.
- Confirmation of information in the manual, roll out to staff for awareness and specific training for critical individuals.
- Testing and verification of technical solutions established for recovery operations.
- Testing and verification of organization recovery procedures.
Issues found during the testing phase often must be reintroduced to the analysis phase.
Information and targets
The BCP manual must evolve with the organization, and maintain information about who has to know what:
- A series of checklists
- Job descriptions, skillsets needed, training requirements
- Documentation and document management
- Definitions of terminology to facilitate timely communication during disaster recovery[80]
- Distribution lists (staff, important clients, vendors/suppliers)
- Information about communication and transportation infrastructure (roads, bridges)[81]
Technical
Specialized technical resources must be maintained. Checks include:
- Virus definition distribution
- Application security and service patch distribution
- Hardware operability
- Application operability
- Data verification
- Data application
Testing and verification of recovery procedures
Software and work process changes must be documented and validated, including verification that documented work process recovery tasks and supporting disaster recovery infrastructure allow staff to recover within the predetermined recovery time objective.[82]
References
- ^ BCI Good Practice Guidelines 2013, quoted in Mid Sussex District Council, Business Continuity Policy Statement Archived 2022-01-20 at the Wayback Machine, published April 2018, accessed 19 February 2021
- ^ "How to Build an Effective and Organized Business Continuity Plan". Forbes. June 26, 2015.
- ^ "Surviving a Disaster" (PDF). American Bar.org (American Bar Association). 2011. Archived (PDF) from the original on 2022-10-09.
- ^ Elliot, D.; Swartz, E.; Herbane, B. (1999) Just waiting for the next big bang: business continuity planning in the UK finance sector. Journal of Applied Management Studies, Vol. 8, No, pp. 43–60. Here: p. 48.
- ^ Alan Berman (March 9, 2015). "Constructing a Successful Business Continuity Plan". Business Insurance Magazine. Archived from the original on August 4, 2024. Retrieved February 4, 2019.
- ^ "Business Continuity Plan". United States Department of Homeland Security. Archived from the original on 7 December 2018. Retrieved 4 October 2018.
- ^ a b Ian McCarthy; Mark Collard; Michael Johnson (2017). "Adaptive organizational resilience: an evolutionary perspective". Current Opinion in Environmental Sustainability. 28: 33–40. Bibcode:2017COES...28...33M. doi:10.1016/j.cosust.2017.07.005.
- ^ Intrieri, Charles (10 September 2013). "Business Continuity Planning". Flevy. Retrieved 29 September 2013.
- ^ "Continuity Resources and Technical Assistance | FEMA.gov". www.fema.gov.
- ^ a b "A Guide to the preparation of a Business Continuity Plan" (PDF). Archived from the original (PDF) on 2019-02-09. Retrieved 2019-02-08.
- ^ "Business Continuity Planning (BCP) for Businesses of all Sizes". 19 April 2017. Archived from the original on 24 April 2017. Retrieved 28 April 2017.
- ^ Yossi Sheffi (October 2005). The Resilient Enterprise: Overcoming Vulnerability for Competitive Enterprise. MIT Press.
- ^ "Transform. The Resilient Economy". Archived from the original on 2013-10-22. Retrieved 2019-02-04.
- ^ "Newsday | Long Island's & NYC's News Source | Newsday". Newsday.
- ^ Tiffany Braun; Benjamin Martz (2007). "Business Continuity Preparedness and the Mindfulness State of Mind". AMCIS 2007 Proceedings. S2CID 7698286.
"An estimated 80 percent of companies without a well-conceived and tested business continuity plan, go out of business within two years of a major disaster" (Santangelo 2004)
- ^ "Annex A.17: Information Security Aspects of Business Continuity Management". ISMS.online. November 2021.
- ^ "Communication and resilience: concluding thoughts and key issues for future research". www.researchgate.net.
- ^ Buzzanell, Patrice M. (2010). "Resilience: Talking, Resisting, and Imagining New Normalcies Into Being". Journal of Communication. 60 (1): 1–14. doi:10.1111/j.1460-2466.2009.01469.x. ISSN 1460-2466.
- ^ Buzzanell, Patrice M. (March 2010). "Resilience: Talking, Resisting, and Imagining New Normalcies Into Being". Journal of Communication. 60 (1): 1–14. doi:10.1111/j.1460-2466.2009.01469.x. ISSN 0021-9916.
- ^ Buzzanell, Patrice M. (2018-01-02). "Organizing resilience as adaptive-transformational tensions". Journal of Applied Communication Research. 46 (1): 14–18. doi:10.1080/00909882.2018.1426711. ISSN 0090-9882. S2CID 149004681.
- ^ "Resilient Organisations". March 22, 2011.
- ^ "Resilience Diagnostic". November 28, 2017. Archived from the original on March 11, 2024. Retrieved February 4, 2019.
- ^ ISO, ISO 22301 Business Continuity Management: Your implementation guide, published, accessed 20 February 2021
- ^ https://www.iso.org/obp/ui/en/#iso:std:iso:22313:ed-2:v1:en
- ^ "Emergency Planning" (PDF). Archived (PDF) from the original on 2022-10-09.
- ^ Helen Clark (August 15, 2012). "Can your Organization survive a natural disaster?" (PDF). RI.gov. Archived (PDF) from the original on 2022-10-09.
- ^ "Iso/Ts 22317:2021".
- ^ May, Richard. "Finding RPO and RTO". Archived from the original on 2016-03-03.
- ^ "Maximum Acceptable Outage (Definition)". riskythinking.com. Albion Research Ltd. Retrieved 4 October 2018.
- ^ "BIA Instructions, BUSINESS CONTINUITY MANAGEMENT - WORKSHOP" (PDF). driecentral.org. Disaster Recovery Information Exchange (DRIE) Central. Archived (PDF) from the original on 2022-10-09. Retrieved 4 October 2018.
- ^ "Plain English ISO 22301 2012 Business Continuity Definitions". praxiom.com. Praxiom Research Group LTD. Archived from the original on 13 May 2020. Retrieved 4 October 2018.
- ^ "Baseline Cyber Security Controls" (PDF). Ministry of Interior - National Cyber Security Center. 2022. p. 12.
- ^ "The Rise and Rise of the Recovery Consistency Objective". 2016-03-22. Archived from the original on 2020-09-26. Retrieved September 9, 2019.
- ^ "How to evaluate a recovery management solution." West World Productions, 2006
- ^ Josh Krischer; Donna Scott; Roberta J. Witty. "Six Myths About Business Continuity Management and Disaster Recovery" (PDF). Gartner Research. Archived (PDF) from the original on 2022-10-09.
- ^ https://dri.ca/docs/ISO_DIS_22301_%28E%29.pdf
- ^ "Medical supply location and distribution in disasters". doi:10.1016/j.ijpe.2009.10.004.
- ^ "transportation planning in disaster recovery". SCHOLAR.google.com. Archived from the original on 2022-10-09.
- ^ "PLANNING SCENARIOS Executive Summaries" (PDF). Archived (PDF) from the original on 2022-10-09.
- ^ Chloe Demrovsky (December 22, 2017). "Holding It All Together". Manufacturing Business Technology.
- ^ developed by SHARE's Technical Steering Committee, working with IBM
- ^ Ellis Holman (March 13, 2012). "A Business Continuity Solution Selection Methodology" (PDF). IBM Corp. Archived (PDF) from the original on 2022-10-09.
- ^ Tierney, Kathleen (21 November 2012). "Disaster Governance: Social, Political, and Economic Dimensions". Annual Review of Environment and Resources. 37 (1): 341–363. doi:10.1146/annurev-environ-020911-095618. ISSN 1543-5938. S2CID 154422711.
- ^ Partridge, Kevin G.; Young, Lisa R. (2011). CERT® Resilience Management Model (RMM) v1.1: Code of Practice Crosswalk Commercial Version 1.1 (PDF). Pittsburgh, PA: Carnegie Mellon University. Retrieved 5 January 2023.
- ^ "ISO - ISO/TC 292 - Security and resilience". International Organization for Standardization.
- ^ "ISO 22300:2018". ISO. 12 July 2019.
- ^ "ISO 22301:2019". ISO. 5 June 2023.
- ^ "ISO 22313:2020". ISO.
- ^ "Iso/Ts 22317:2021".
- ^ "Iso/Ts 22318:2021".
- ^ "ISO/TS 22330:2018". ISO. 12 July 2019.
- ^ "ISO/TS 22331:2018". ISO.
- ^ "Iso/Ts 22332:2021".
- ^ "ISO/IEC TS 17021-6:2014". ISO.
- ^ "ISO/IEC 24762:2008". ISO. 6 March 2008. Retrieved 5 January 2023.
- ^ "ISO/IEC 27001:2022". ISO. Retrieved 5 January 2023.
- ^ "ISO/IEC 27002:2022". ISO. Retrieved 5 January 2023.
- ^ "ISO/IEC 27031:2011". ISO. 5 September 2016. Retrieved 5 January 2023.
- ^ "ISO/PAS 22399:2007". ISO. 18 June 2012. Retrieved 5 January 2023.
- ^ "IWA 5:2006". ISO. Retrieved 5 January 2023.
- ^ "BS 7799-1:1995 Information security management - Code of practice for information security management systems". BSI Group. Retrieved 5 January 2023.
- ^ "BS 25999-1:2006 Business continuity management - Code of practice". BSI Group. Retrieved 5 January 2023.
- ^ "BS 25999-2:2007 (USA Edition) Business continuity management - Specification". BSI Group. Retrieved 5 January 2023.
- ^ "BS 25777:2008 (Paperback) Information and communications technology continuity management. Code of practice". BSI Group. Retrieved 5 January 2023.
- ^ British Standards Institution (2006). Business continuity management-Part 1: Code of practice :London
- ^ Cabinet Office. (2004). overview of the Act. In: Civil Contingencies Secretariat Civil Contingencies Act 2004: a short. London: Civil Contingencies Secretariat
- ^ "July 2013 (V2) The role of Local Resilience Forums: A reference document" (PDF). Cabinet Office. Retrieved 5 January 2023.
- ^ "HB 292—2006 Executive Guide to Business Continuity Management" (PDF). Standards Australia. Archived from the original (PDF) on 19 October 2023. Retrieved 5 January 2023.
- ^ "HB 293—2006 Executive Guide to Business Continuity Management" (PDF). Standards Australia. Retrieved 5 January 2023.
- ^ NFPA 1600, Standard on Disaster/Emergency Management and Business Continuity Programs (PDF) (2010 ed.). Quincy, MA: National Fire Protection Association. 2010. ISBN 978-161665005-6. Archived from the original (PDF) on 2023-01-04. Retrieved 2023-01-04.
- ^ "A Comprehensive Overview of the NFPA 1600 Standard". AlertMedia. 29 January 2019. Retrieved 4 January 2023.
- ^ a b "Business Continuity Plan | Ready.gov". www.ready.gov. Retrieved 5 January 2023.
- ^ "NATIONAL CONTINUITY POLICY IMPLEMENTATION PLAN Homeland Security Council August 2007" (PDF). FEMA. Archived from the original (PDF) on December 22, 2014. Retrieved 5 January 2023.
- ^ "Continuity Resources and Technical Assistance | FEMA.gov". FEMA. Retrieved 5 January 2023.
- ^ "Continuity of operations: An overview" (PDF). FEMA. Retrieved 5 January 2023.
- ^ "Business | Ready.gov". www.ready.gov. Retrieved 5 January 2023.
- ^ "Business Continuity Planning Suite | Ready.gov". www.ready.gov. Retrieved 5 January 2023.
- ^ ASIS SPC.1-2009 Organizational Resilience: Security, Preparedness, and Continuity Management Systems - Requirements with Guidance for Use (PDF). American National Standards Institute. 2009. ISBN 978-1-887056-92-2.
- ^ "Business Continuity Plan Template".
- ^ "Glossary | DRI International". drii.org.
- ^ "Disaster Recovery Plan Checklist" (PDF). CMS.gov. Archived (PDF) from the original on 2022-10-09.
- ^ Othman. "Validation of a Disaster Management Metamodel (DMM)". SCHOLAR.google.com.
Further reading
- James C. Barnes (2001-06-08). A Guide to Business Continuity Planning. Wiley. ISBN 978-0471530152.
- Kenneth L Fulmer (2004-10-04). Business Continuity Planning, A Step-by-Step Guide. Rothstein. ISBN 978-1931332217.
- Richard Kepenach. Business Continuity Plan Design, 8 Steps for Getting Started Designing a Plan.
- Judy Bell (October 1991). Disaster Survival Planning: A Practical Guide for Businesses. Disaster Survival Planning, Incorporated. ISBN 978-0963058003.
- Dimattia, S. (November 15, 2001). "Planning for Continuity". Library Journal. 126 (19): 32–34.
- Andrew Zolli; Ann Marie Healy (2013). Resilience: Why Things Bounce Back. Simon & Schuster. ISBN 978-1451683813.
- International Glossary for Resilience, DRI International.
External links
- The tiers of Disaster Recovery and TSM. Charlotte Brooks, Matthew Bedernjak, Igor Juran, and John Merryman. In, Disaster Recovery Strategies with Tivoli Storage Management. Chapter 2. Pages 21–36. Red Books Series. IBM. Tivoli Software. 2002.
- SteelStore Cloud Storage Gateway: Disaster Recovery Best Practices Guide. Riverbed Technology, Inc. October 2011.
- Disaster Recovery Levels. Robert Kern and Victor Peltz. IBM Systems Magazine. November 2003.
- Business Continuity: The 5-tiers of Disaster Recovery. Archived 2018-09-26 at the Wayback Machine Recovery Specialties. 2007.
- Continuous Operations: The Seven Tiers of Disaster Recovery. Mary Hall. The Storage Community (IBM). 18 July 2011. Retrieved 26 March 2013.
- Maximum Tolerable Period of Disruption (MTPOD)
- Maximum Tolerable Period of Disruption (MTPOD): BSI committee response
- Wayback Machine
- Janco Associates
- Business Continuity Plan
- Department of Homeland Security Emergency Plan Guidelines
- CIDRAP/SHRM Pandemic HR Guide Toolkit Archived 2013-05-18 at the Wayback Machine
- Adapt and respond to risks with a business continuity plan (BCP)
Business continuity planning
Introduction
Definition and Scope
Business continuity planning (BCP) is a strategic process designed to ensure that an organization's critical business functions can continue operating during and after a disruption, such as natural disasters, cyber incidents, or supply chain failures. According to the National Institute of Standards and Technology (NIST), a BCP consists of documented procedures that outline how mission-essential processes will be sustained, focusing on the overall resilience of business operations rather than isolated technical elements.[7] Similarly, the Business Continuity Institute (BCI) defines business continuity as the capability to deliver products and services at predefined levels within acceptable timeframes following an incident, as aligned with ISO 22301 standards.[2] This process integrates risk identification, impact assessment, and recovery strategies to maintain organizational viability.
The scope of BCP extends across prevention, response, recovery, and resumption phases, encompassing all critical business processes and supporting resources enterprise-wide. It addresses potential threats by developing frameworks to protect against disruptions and enable swift restoration to normal or near-normal operations, including coordination with external stakeholders like suppliers and regulators.[8] Unlike narrower IT-focused plans, BCP's breadth ensures holistic coverage of human, physical, and informational assets, prioritizing the continuity of value-creating activities.[6]
Key objectives of BCP include minimizing operational downtime, safeguarding physical and intellectual assets, and protecting the safety of employees and stakeholders during crises. By proactively identifying vulnerabilities and establishing recovery priorities, organizations can reduce financial losses and reputational damage, while enhancing overall resilience to meet legal and contractual obligations.[8] For instance, effective BCP aims to limit the impact of disruptions to tolerable levels, ensuring compliance with regulations in sectors like finance and healthcare.[6]
BCP is distinct from related disciplines such as disaster recovery (DR), which primarily concentrates on restoring IT systems and data after a failure, whereas BCP addresses broader business processes and operational continuity.[7] It also differs from crisis management, which handles immediate tactical responses to acute events like public relations issues, while BCP emphasizes sustained operations and long-term recovery planning.[2] This differentiation allows organizations to layer these approaches for comprehensive risk mitigation.
Historical Evolution
Business continuity planning (BCP) originated in the 1970s amid Cold War-era concerns over potential disruptions to critical infrastructure, particularly in government and financial sectors where contingency planning emphasized protecting electronic data processing systems from technological failures.[9] Early practices focused on reactive disaster recovery for mainframe computers, such as backups and standby sites, driven by the adoption of IBM 360/370 systems and regulations like the U.S. Foreign Corrupt Practices Act of 1977, which mandated record protection.[9] This period marked the shift from ad hoc crisis responses to structured IT-focused continuity efforts in organizations heavily reliant on centralized information processing.[10]
The 1980s and 1990s saw accelerated growth in BCP due to high-profile disruptions, including the 1987 stock market crash, which exposed vulnerabilities in financial operations, and the Y2K millennium bug fears that prompted widespread testing and formalization of plans across industries.[11] Events like the 1988 Illinois Bell fire underscored third-party risks, leading to compliance-driven frameworks such as the U.S. Office of the Comptroller of the Currency's BC-177 policy in 1983, while the 1990 London Stock Exchange bombing highlighted needs beyond IT recovery.[9] By the late 1990s, BCP evolved into organization-wide strategies integrating business processes, moving from isolated disaster recovery to value-oriented approaches that considered stakeholder impacts and regulatory demands.[10]
The September 11, 2001, attacks dramatically accelerated BCP adoption, emphasizing holistic risk management and enterprise resilience in response to large-scale, multi-hazard events affecting physical infrastructure, personnel, and markets.[12] In the financial sector, this led to requirements for geographic diversity in operations, split-site models for real-time continuity, and coordinated testing with regulators, as outlined in 2002 interagency guidelines from the Federal Reserve and others.[12] Post-2001 regulations and standards, such as BS 25999 in 2006, further institutionalized proactive planning across sectors. This evolution culminated in the international standard ISO 22301, published in 2012, which provided a comprehensive framework for business continuity management systems (BCMS) and was later revised in 2019.[9][13][4]
In the 2010s and 2020s, BCP frameworks incorporated emerging threats like cyberattacks, pandemics, and supply chain vulnerabilities, with the 2020 COVID-19 outbreak revealing gaps in workforce health, remote operations, and global logistics, prompting updates such as enhanced digital tools and agility-focused actions in 50 leading companies.[14] Cyber threats drove integrations with cybersecurity standards, including NIST guidelines for contingency planning that address event recovery from digital disruptions.[15] Supply chain resilience became a priority, with 64% of supply chain executives anticipating acceleration of digital transformation due to the pandemic, as per a 2020 survey.[16]
From 2023 onward, frameworks continued to evolve, with DRI International's updated Professional Practices in 2023 focusing on integrated resilience, BCI reports in 2023 and 2025 underscoring strategic expansion and climate integration, and regulatory shifts like the 2024 JAS-ANZ updates requiring climate risk assessment in BCP. These adaptations address emerging challenges including AI, geopolitics, and environmental disruptions as of 2025.[17][18][19][20] Overall, BCP has evolved from reactive, technology-centric measures to proactive, resilience-based strategies that anticipate and adapt to interconnected risks.[9]
Core Concepts
Resilience and Continuity
Organizational resilience refers to an organization's capacity to anticipate, respond to, absorb, and recover from disruptions while preserving its fundamental purpose, values, and integrity.[21] This capability is achieved through adaptive strategies, robust systems, and a resilient culture that enable navigation of adversity, such as natural disasters or economic shifts.[21] According to ISO 22316:2017, it encompasses the ability to absorb and adapt to change to deliver on objectives, survive, and prosper amid uncertainties.[22]
Business continuity, in contrast, is the capability of an organization to continue delivering products and services within acceptable time frames at a predefined capacity during a disruption.[2] It focuses on maintaining essential functions and providing uninterrupted critical services and support while preserving organizational viability before, during, and after events that disrupt normal operations.[23] This ensures that key business processes remain operational at an acceptable level, minimizing the impact of crises on stakeholders and value creation.[2]
Resilience and continuity are interrelated, with resilience serving as a broader foundation that enables continuity through adaptive capacities such as redundancy and flexibility.[22] Business continuity acts as a key component of organizational resilience, providing the operational mechanisms to sustain functions during disruptions, while resilience enhances continuity by fostering proactive adaptation and recovery.[22] For instance, redundant systems, like backup power supplies or duplicated data centers, build resilience by preventing single points of failure and allowing seamless operation during outages.[4] Similarly, alternate sites (facilities equipped to serve as temporary operational hubs when the primary location is inaccessible) support continuity by enabling the relocation of essential functions with minimal interruption.[24]
A critical metric for measuring continuity is the Recovery Time Objective (RTO), defined as the maximum acceptable length of time that can elapse before the lack of a business function severely impacts the organization.[25] In the context of business continuity planning, RTO specifies the targeted duration for restoring systems or processes after a disruption, ensuring alignment with predefined tolerable downtime levels.[26] For example, an RTO of four hours for a financial transaction system means the system must be restored within four hours to avoid compromising mission-critical operations.[26] This objective guides the design of recovery strategies, prioritizing resources based on the potential business impact of extended downtime.[25]
Key Terminology
Business continuity planning (BCP) relies on a standardized set of terms to ensure precise communication and alignment across organizational functions. These terms, often derived from international standards like ISO 22301, help delineate the boundaries of disruption tolerance and recovery strategies.[4]
Maximum Acceptable Outage (MAO) refers to the maximum duration an organization can tolerate a disruption to a critical process or system before it jeopardizes mission objectives or viability. This metric, also known as the Maximum Tolerable Period of Disruption (MTPD) in ISO 22301, sets the upper limit for downtime, guiding the prioritization of recovery efforts.[27][28]
Recovery Point Objective (RPO) defines the maximum acceptable amount of data loss measured in time, representing the point to which data must be restored after a disruption to resume operations without excessive impact. In IT-heavy contexts, RPO determines backup frequency; for instance, an RPO of four hours means no more than four hours of data can be lost. This term is consistently applied across industries, from finance to manufacturing, to quantify data tolerance in BCP frameworks.[29][4]
Recovery Time Objective (RTO), often paired with RPO, specifies the targeted duration to restore a process or system to operational status following an interruption. Like RPO, RTO maintains uniformity in BCP terminology across sectors, enabling comparable recovery benchmarks; for example, e-commerce firms might target an RTO of one hour to minimize revenue loss.[29][28]
Single Point of Failure (SPOF) describes a component, process, or resource whose failure would halt an entire system or operation, undermining overall resilience. Identifying SPOFs during planning is crucial, as their elimination through redundancy supports continuity in diverse environments like supply chains or data centers.
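The timing metrics defined above (MAO, RPO, RTO) can be checked against each other mechanically. The following is a minimal sketch, not a prescribed method: the `RecoveryTargets` record and the `check_targets` function are hypothetical names, and the two consistency rules (the RTO should not exceed the MAO, and backups should run at least as often as the RPO) are assumptions that follow from the definitions rather than requirements quoted from a standard.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Hypothetical record of the timing targets for one business process."""
    process: str
    mao: timedelta              # Maximum Acceptable Outage (MTPD in ISO 22301)
    rto: timedelta              # Recovery Time Objective
    rpo: timedelta              # Recovery Point Objective
    backup_interval: timedelta  # how often backups actually run


def check_targets(t: RecoveryTargets) -> list[str]:
    """Return human-readable problems; an empty list means the targets agree."""
    problems = []
    # Assumed rule: the recovery target must fit inside the tolerable outage.
    if t.rto > t.mao:
        problems.append(f"{t.process}: RTO exceeds MAO")
    # Assumed rule: data written since the last backup is what can be lost,
    # so the backup interval must not be longer than the RPO.
    if t.backup_interval > t.rpo:
        problems.append(f"{t.process}: backup interval exceeds RPO")
    return problems


payments = RecoveryTargets(
    process="payments",
    mao=timedelta(hours=8),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=4),
    backup_interval=timedelta(hours=6),
)
print(check_targets(payments))  # ['payments: backup interval exceeds RPO']
```

With an RPO of four hours, a six-hour backup interval could lose up to six hours of data, so the sketch flags it; shortening the interval to four hours or less clears the finding.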
Business Impact Analysis (BIA) evaluates the potential effects of disruptions on business functions, quantifying financial, operational, and reputational losses to prioritize recovery. In contrast, Risk Assessment (RA) identifies and evaluates threats and vulnerabilities that could cause those disruptions, focusing on likelihood and mitigation rather than impact severity. This distinction ensures BIA informs resource allocation while RA drives preventive controls.[30][31]
Vital records encompass essential documents, data, and information required to sustain legal, financial, and operational continuity during and after a disruption, such as contracts, employee records, or intellectual property. These records must be protected through duplication and secure storage to enable rapid resumption of critical activities.[32][33]
A crisis communication plan outlines predefined protocols for disseminating accurate information to stakeholders during a disruption, including message templates, spokesperson roles, and channels to manage internal and external perceptions. Integrated into broader BCP, it ensures coordinated responses that maintain trust and operational stability.[34][35]
Planning Phases
Asset Inventory
Asset inventory is a foundational step in business continuity planning (BCP), involving the systematic cataloging of an organization's resources to understand what must be protected and recovered during disruptions. This process ensures that all elements essential to operations are documented, providing a comprehensive baseline for subsequent planning activities. According to the Business Continuity Institute's Good Practice Guidelines, Edition 7.0 (2023), asset inventory focuses on compiling details about resources that support critical functions, distinguishing between physical and non-physical items to avoid oversight of key dependencies. This aligns with Professional Practice 2 (Understanding the Organisation), which integrates asset identification into broader organizational analysis.[36]
The identification of assets begins with a thorough review of organizational components, encompassing both tangible and intangible categories. Tangible assets include physical infrastructure such as facilities, IT hardware like servers and workstations, and equipment necessary for operations. Intangible assets cover non-physical elements, including data repositories, business processes, intellectual property, and human resources like skilled personnel. The Federal Deposit Insurance Corporation (FDIC) emphasizes developing comprehensive inventories of hardware, software, communications systems, data files, and vital records to capture these elements accurately. This step often involves cross-departmental interviews, physical audits, and documentation reviews to ensure completeness, with the Cybersecurity and Infrastructure Security Agency (CISA) recommending physical inspections and logical surveys for operational technology environments.[37][38]
Once identified, assets are categorized by criticality to prioritize protection efforts, typically using a tiered system of high, medium, and low impact based on their role in supporting business functions. High-impact assets are those whose loss would severely impair core operations, such as primary data centers or key supply chain partners, while medium and low categories include supportive or redundant items. The BCI guidelines advocate assessing criticality through metrics like the maximum tolerable period of disruption, which helps in ranking assets without delving into detailed impact quantification. Dependencies are integrated into this categorization, documenting interrelations such as reliance on external suppliers or interconnected IT systems, to reveal potential single points of failure. For instance, inventorying supply chain vulnerabilities might highlight a critical vendor's facilities as a high-impact asset due to its influence on production continuity.[36]
Tools for managing asset inventories range from basic spreadsheets for small-scale efforts to specialized asset management software that automates tracking and updates. CISA highlights the use of centralized databases with security controls to store attributes like location, manufacturer, and protocols, facilitating ongoing maintenance. The FDIC suggests uniform inventory templates to ensure consistency across departments, including details on outsourced relationships and backup requirements. These tools enable the inclusion of dynamic elements, such as evolving supplier chains, ensuring the inventory remains current through regular reviews and life cycle management processes.[38][37]
The importance of a robust asset inventory lies in its role as essential input for business impact analysis (BIA) and risk assessment, providing the detailed resource map needed to evaluate potential disruptions. By establishing this foundation, organizations can identify vulnerabilities early, such as over-reliance on a single supplier in the supply chain, and allocate resources effectively for continuity strategies.
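To make the inventory structure concrete, the sketch below models assets with a criticality tier and recorded dependencies, then flags dependencies of high-impact assets that have no backup. It is an illustration only: the `Asset` fields, the `has_backup` flag, and the asset names are hypothetical, and the check looks only at direct dependencies, not at chains of dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical inventory entry; field names are illustrative."""
    name: str
    kind: str          # "tangible" or "intangible"
    criticality: str   # "high", "medium", or "low"
    depends_on: list[str] = field(default_factory=list)
    has_backup: bool = False  # True if a redundant alternative exists


def single_points_of_failure(inventory: list[Asset]) -> list[str]:
    """Names of unbacked assets that a high-criticality asset directly depends on."""
    by_name = {asset.name: asset for asset in inventory}
    flagged = set()
    for asset in inventory:
        if asset.criticality != "high":
            continue
        for dep_name in asset.depends_on:
            dep = by_name.get(dep_name)
            # Dependencies missing from the inventory are skipped here; a real
            # review would want to report them as gaps.
            if dep is not None and not dep.has_backup:
                flagged.add(dep.name)
    return sorted(flagged)


inventory = [
    Asset("order-processing", "intangible", "high",
          depends_on=["primary-datacenter", "vendor-a"]),
    Asset("primary-datacenter", "tangible", "high", has_backup=True),
    Asset("vendor-a", "tangible", "medium"),
    Asset("newsletter", "intangible", "low", depends_on=["mail-server"]),
    Asset("mail-server", "tangible", "low"),
]
print(single_points_of_failure(inventory))  # ['vendor-a']
```

Here the single supplier `vendor-a` is flagged even though it was rated medium on its own, because a high-impact process relies on it and no alternative is recorded.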
The BCI notes that this inventory directly informs the design of recovery options and enhances overall resilience; without it, BCP efforts risk incomplete coverage.[36]
Business Impact Analysis
Business impact analysis (BIA) is a systematic process used in business continuity planning to identify and evaluate the potential effects of disruptions on critical business functions and processes. It focuses on determining the operational, financial, and non-financial consequences of interruptions, such as revenue loss from halted sales or reputational damage from prolonged service outages, to prioritize recovery efforts. By quantifying these impacts, organizations can establish priorities that align recovery strategies with overall business objectives.[39][40]
The BIA process begins with gathering data on critical functions, often building on the asset inventory to map dependencies. This involves conducting interviews with process owners, managers, and stakeholders, as well as distributing surveys or questionnaires to assess the importance of each business process to organizational missions. Key steps include validating mission-critical processes, such as payroll processing or customer order fulfillment, and evaluating their resource requirements, including personnel, equipment, and facilities. Processes are then prioritized based on the severity of potential impacts, using criteria like downtime tolerance to rank them from high to low criticality.[39][40]
Impacts are quantified by assessing both tangible financial losses, such as increased expenses or lost revenue (e.g., daily sales figures multiplied by outage duration), and intangible effects like customer dissatisfaction or regulatory non-compliance penalties. For instance, a disruption to a core manufacturing process might result in moderate financial impact estimated at $500,000 over 24 hours, alongside severe reputational harm from delayed deliveries.
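The "daily sales multiplied by outage duration" estimate can be sketched in a few lines. This is a simplified illustration under stated assumptions: revenue is assumed to accrue evenly over 24 hours, intangible effects are ignored, and the function names and the revenue figures are hypothetical.

```python
def estimated_revenue_loss(daily_revenue: float, outage_hours: float) -> float:
    """Revenue lost during an outage, assuming revenue accrues evenly over a day."""
    if daily_revenue < 0 or outage_hours < 0:
        raise ValueError("inputs must be non-negative")
    return daily_revenue * (outage_hours / 24.0)


def rank_by_loss(daily_revenue_by_process: dict[str, float],
                 outage_hours: float) -> list[tuple[str, float]]:
    """Order processes from largest to smallest estimated loss for one outage length."""
    losses = {
        name: estimated_revenue_loss(revenue, outage_hours)
        for name, revenue in daily_revenue_by_process.items()
    }
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


# Illustrative figures only: a process earning 500,000 per day loses
# 500,000 in a 24-hour outage and half of that in a 12-hour outage.
print(estimated_revenue_loss(500_000, 24))  # 500000.0
print(estimated_revenue_loss(500_000, 12))  # 250000.0
print(rank_by_loss({"payroll": 20_000, "manufacturing": 500_000}, 24))
# [('manufacturing', 500000.0), ('payroll', 20000.0)]
```

A ranking like this covers only the tangible side of the assessment; reputational or regulatory effects would still need a separate qualitative rating.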
This analysis ensures consistency with organizational goals by cross-referencing impacts against strategic priorities, such as maintaining market share or complying with service-level agreements, to avoid over- or under-prioritizing functions.[39]
Key outputs of the BIA include the recovery time objective (RTO) and recovery point objective (RPO), which guide recovery strategy design. The RTO represents the maximum acceptable amount of time a business process can be disrupted before causing unacceptable impacts, calculated as the duration from the onset of disruption to full operational recovery (e.g., 48 hours for a vital financial reporting function). The RPO defines the maximum tolerable period of data loss, measured backward from the time of disruption to the most recent point of data recovery, such as the last backup (e.g., up to 12 hours of lost data). These metrics are derived directly from impact assessments and must be realistic given available resources.[39]
Risk Assessment
Risk assessment is a critical component of business continuity planning (BCP), involving the systematic identification, analysis, and evaluation of potential threats that could interrupt organizational operations.[4] This process helps organizations understand vulnerabilities and determine the necessary resources for maintaining continuity during disruptions. According to ISO 22301:2019/Amd 1:2024, the international standard for business continuity management systems (which includes updates for climate action changes), the risk assessment must be conducted regularly to align with the organization's context and objectives, incorporating climate-related risks such as extreme weather events into threat evaluation.[4][5]
Risk identification techniques commonly employed in BCP include brainstorming sessions, SWOT analysis, and threat modeling. Brainstorming involves collaborative workshops where stakeholders generate ideas on potential disruptions, fostering diverse perspectives to uncover hidden vulnerabilities.[36] SWOT analysis evaluates internal strengths and weaknesses alongside external opportunities and threats, providing a structured framework to pinpoint risks such as supply chain dependencies.[36] Threat modeling, often used in information security contexts, maps out specific attack vectors or failure points, such as natural disasters (e.g., floods or earthquakes), cyber attacks (e.g., ransomware), and human errors (e.g., operator mistakes leading to system failures). These methods ensure a comprehensive catalog of threats, including both internal factors like equipment malfunctions and external ones like power outages or sabotage.
Once identified, risks are evaluated using a likelihood versus impact matrix, which categorizes threats based on their probability of occurrence and potential severity. Qualitative scales typically rate likelihood as low (unlikely), medium (possible), or high (likely), while impact is assessed as low (minimal disruption), medium (moderate operational effects), or high (severe business interruption).[36] For more precision, semi-quantitative scoring assigns numerical values, such as 1-5 for likelihood and 1-5 for impact, allowing for a visual heat map where high-likelihood, high-impact risks appear in the upper-right quadrant.[36] This evaluation draws on data from business impact analysis to quantify consequences like financial loss or downtime.
Prioritization follows evaluation through a risk scoring formula, commonly defined as Risk Score = Likelihood × Impact, which ranks threats to focus resources on the most critical ones.[36] For instance, a cyber attack with high likelihood (score of 4) and high impact (score of 5) yields a risk score of 20, placing it above a low-likelihood natural disaster (score of 1 × 3 = 3).[36] This approach, aligned with ISO 22301:2019/Amd 1:2024, enables organizations to allocate efforts efficiently without overlooking lower-scoring risks that could compound over time.[4][5]
Basic mitigation measures identified during risk assessment include preventive controls such as insurance to transfer financial risks from high-impact events like natural disasters.[4] Other foundational controls involve redundancy in critical systems or access restrictions to reduce human error vulnerabilities, serving as initial steps before full strategy development.
Strategy Development
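Strategy development works from the prioritized list that risk assessment produces. As a minimal sketch of the Risk Score = Likelihood × Impact ranking described under Risk Assessment, the code below scores and sorts a small risk register. The `Risk` class and `prioritize` function are hypothetical names, and the 1-5 scales follow the semi-quantitative example in the text rather than any mandated scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """Hypothetical risk register entry using 1-5 semi-quantitative scales."""
    name: str
    likelihood: int  # 1 (least likely) to 5 (most likely)
    impact: int      # 1 (least severe) to 5 (most severe)

    def __post_init__(self):
        # Reject values outside the assumed 1-5 scale.
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Risk Score = Likelihood x Impact."""
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


register = [
    Risk("natural disaster", likelihood=1, impact=3),
    Risk("cyber attack", likelihood=4, impact=5),
]
for risk in prioritize(register):
    print(risk.name, risk.score)
# cyber attack 20
# natural disaster 3
```

The two entries reproduce the worked example from the text (20 versus 3). A plain product treats a 1 × 5 risk the same as a 5 × 1 risk, which is one reason low-scoring but high-impact risks still deserve review.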
Impact Scenarios
Impact scenarios in business continuity planning (BCP) refer to hypothetical disruptions used to evaluate the potential effects on organizational operations and test the robustness of continuity assumptions. These scenarios are derived from outputs of the risk assessment phase, where threats are identified and prioritized based on their likelihood and severity.[40]
Disruption scenarios are categorized into internal, external, and cascading types to encompass a broad range of potential threats. Internal scenarios involve disruptions originating within the organization, such as IT system failures or power outages that halt critical processes like data processing. External scenarios arise from outside factors, including natural disasters like floods or pandemics that can overwhelm infrastructure and workforce availability. Cascading scenarios represent chain reactions where an initial disruption triggers secondary effects, for example, a supply chain interruption compounded by a cyberattack, amplifying downtime across multiple functions.[41][42]
The development of impact scenarios focuses on both worst-case and most-likely events to ensure comprehensive coverage, drawing directly from risk assessment findings to prioritize those with high potential impact on essential operations. Organizations simulate these scenarios through modeling or exercises to assess effects on critical functions, such as revenue loss, regulatory non-compliance, or reputational damage. A prominent real-world example is the 2020 COVID-19 pandemic, which served as a global external scenario forcing rapid shifts to remote work and exposing vulnerabilities in supply chains and employee health protocols for many businesses.[43][44]
By analyzing these scenarios, BCP teams identify gaps in current capabilities, such as inadequate remote access tools or unaddressed interdependencies, thereby informing targeted enhancements to continuity strategies without prescribing specific solutions. This process ensures that plans are resilient to a variety of disruptions, enhancing overall organizational preparedness.[45]
Preparedness Tiers
Business continuity preparedness tiers provide a framework for organizations to assess and structure their recovery capabilities based on potential disruptions identified through impact scenarios. These tiers, adapted from standard seven-tier disaster recovery models, range from basic reactive measures to advanced proactive strategies, enabling tailored approaches to minimize downtime and maintain operations. The model emphasizes escalating levels of redundancy, automation, and planning sophistication.[46]
Tier 1: Basic Reactive Recovery focuses on fundamental data protection through off-site backups without dedicated recovery infrastructure. Organizations at this level rely on manual restoration processes, such as tape or cloud backups, which can take days or weeks to implement following a disruption. This tier suits low-risk environments where extended recovery times are tolerable, but it exposes businesses to significant data loss and operational interruptions.[47]
Tier 2: Planned Continuity with Alternates incorporates predefined alternate sites or resources, such as hot sites, alongside regular backups to enable more predictable recovery within hours to a day. This level involves coordinated planning for failover to secondary locations, reducing manual intervention and improving reliability over Tier 1. It balances cost and preparedness for organizations facing moderate disruption risks.[47]
Tier 3: Electronic Vaulting employs electronic vaulting to automatically transfer backup data to a secure off-site location, such as a remote data center or cloud, using near-real-time or regular interval backups. This tier achieves faster recovery times, typically within 24 hours, and reduces manual effort compared to lower tiers through integrated automation and monitoring. It is essential for operations requiring improved reliability without full real-time synchronization.[48]
Selection of a preparedness tier is influenced by organizational size, industry-specific regulations, and overall risk exposure. Smaller organizations with limited resources often default to Tier 1, as it requires minimal investment while providing essential safeguards against total failure. In contrast, regulated sectors like finance demand higher-tier compliance (e.g., beyond Tier 3) to meet mandates for rapid recovery and data integrity, as outlined by bodies such as FINRA, which require business continuity plans scaled to operational complexity.[49][50]
Illustrative examples highlight tier applicability: A small retail business might adopt Tier 1, using periodic off-site backups to restore operations after events like floods, accepting potential short-term closures. Hospitals, however, typically implement advanced tiers with automated systems for real-time failover in electronic health records and critical equipment, ensuring uninterrupted patient care during outages as emphasized in healthcare continuity guidelines.[51][50]
Organizations advance through preparedness tiers progressively by leveraging maturity models that guide incremental enhancements. Starting from ad-hoc responses, businesses conduct gap analyses, invest in technology upgrades, and foster a resilience culture through training and audits, potentially moving from Tier 1 to higher levels over several years as resources and threats evolve. This staged progression aligns with frameworks like the Business Continuity Maturity Model, promoting sustained improvement in readiness.[52]
Solution Design
Solution design in business continuity planning involves developing specific strategies and technical solutions to mitigate risks identified through prior assessments, ensuring organizational operations can resume within defined tolerances. These designs prioritize resilience by selecting measures that align with business priorities, such as minimizing downtime and financial loss. Key to this phase is balancing cost, feasibility, and effectiveness to create robust recovery mechanisms.[53]
Business continuity strategies are typically categorized into three types: preventive, detective, and corrective. Preventive strategies aim to avoid disruptions before they occur, such as implementing regular data backups and redundant systems to prevent data loss from failures.[54] Detective strategies focus on identifying incidents in progress, through tools like real-time monitoring systems that alert to anomalies in network traffic or system performance.[54] Corrective strategies address recovery after an event, including detailed procedures for restoring operations, such as failover to backup servers.[54]
Core design elements include establishing alternate sites, securing vendor contracts, and allocating resources efficiently. Alternate sites provide off-premises facilities for relocation during disruptions, classified as cold sites (basic infrastructure requiring full setup, suitable for non-critical functions with longer recovery times), warm sites (pre-configured hardware and partial data, enabling moderate recovery speed at balanced costs), and hot sites (fully mirrored environments with real-time data synchronization for near-instant failover, ideal for high-priority operations but expensive to maintain).[55] Vendor contracts must incorporate business continuity clauses, specifying service level agreements for recovery times and mutual support during incidents to ensure third-party dependencies do not amplify disruptions.[56] Resource allocation involves assigning personnel, budgets, and technology based on criticality, such as dedicating skilled IT teams to high-impact systems while optimizing costs for lower-priority areas.[53]
These solutions integrate directly with business impact analysis (BIA) and recovery time objectives (RTO) to ensure viability; for instance, a BIA identifies critical processes, and corresponding RTOs (such as four hours for core financial systems) dictate the selection of hot sites or automated recovery tools to meet those targets without excess expenditure.[53] In modern contexts post-2020, cloud-based resilience has become integral, offering scalable alternate sites with automatic replication and geo-redundancy to achieve sub-hour RTOs, as seen in hybrid models combining on-premises and cloud infrastructure for enhanced flexibility during events like pandemics.[57] Additionally, AI-driven threat detection enhances detective strategies by analyzing patterns in real-time data to predict and flag potential disruptions, such as supply chain anomalies, improving proactive response in dynamic environments.[58]
Standards and Regulations
International Standards
ISO 22301:2019 specifies requirements for establishing, implementing, maintaining, and continually improving a business continuity management system (BCMS) within organizations of any size or sector.[4] This standard outlines a structured framework that includes planning for disruptions, defining business continuity objectives, and ensuring the capability to continue delivering products or services at acceptable predefined levels during and after such events.[4] It emphasizes leadership commitment, risk assessment, and performance evaluation to build organizational resilience.[59]
Complementing ISO 22301, ISO 22313:2020 provides practical guidance for applying the BCMS requirements, covering key processes such as business impact analysis (BIA), risk assessment, business continuity strategy development, and testing of continuity arrangements. The guidance supports organizations in conducting BIA to identify critical functions and potential impacts, as well as in designing and exercising plans to verify effectiveness. It promotes a holistic approach to integrating business continuity into overall management systems.
Adoption of ISO 22301 enhances interoperability among supply chain partners by standardizing continuity practices, while enabling independent audits and third-party certification for verifiable compliance.[4] As of the ISO Survey 2022, 3,200 valid certificates had been issued worldwide.[60] The 2019 edition of ISO 22301 and the 2020 edition of ISO 22313 placed greater emphasis on risks such as supply chain vulnerabilities and cyber incidents, drawing on experience gained before 2019. Amendment 1 to ISO 22301, which addresses climate action changes, was published in February 2024.[5]
National and Regional Standards
In the United Kingdom, the British Standards Institution developed BS 25999 as a foundational national standard for business continuity management (BCM), with BS 25999-1:2006 providing a code of practice and BS 25999-2:2007 specifying requirements for implementing a BCM system to ensure organizational resilience against disruptions.[61] This standard emphasized a management systems approach, including risk assessment, business impact analysis, and recovery strategies, and served as a direct predecessor to the international ISO 22301, with which UK practice aligned after BS 25999 was withdrawn in 2012.[62]
In Australia and New Zealand, AS/NZS 5050:2020 addresses managing disruption-related risk to achieve improved business continuity by applying the principles and processes from AS/NZS ISO 31000 to identify, analyze, and mitigate threats that could interrupt operations.[63] Complementing this, HB 221:2004 served as a handbook outlining a comprehensive framework for BCM, including core processes such as strategy development, plan implementation, and testing, though it has been withdrawn and its guidance integrated into broader risk management practices.[64]
In the United States, the National Institute of Standards and Technology (NIST) provides NIST SP 800-34 Revision 1 as a key guideline for federal information systems, offering detailed instructions on contingency planning to support IT continuity, including development of plans for incidents like natural disasters or cyberattacks affecting government operations.[15] For the financial sector, the Federal Financial Institutions Examination Council (FFIEC) issues the Business Continuity Management booklet within its IT Examination Handbook, which directs financial institutions to establish governance, risk assessments, and recovery strategies tailored to sector-specific threats, such as cyber incidents or infrastructure failures, to maintain critical services.[65]
Across the European Union, the Network and Information Systems (NIS) Directive, particularly its update as NIS2 (Directive (EU) 2022/2555), imposes requirements on operators of essential services in critical infrastructure sectors (such as energy, transport, and digital services) to implement risk-management measures that include business continuity planning for ensuring service resilience against cybersecurity threats and other disruptions.[66] Enforcement is handled at the member-state level, with authorities empowered to issue fines for non-compliance; for essential entities, penalties can reach up to €10 million or 2% of total global annual turnover, whichever is higher, while important entities face up to €7 million or 1.4%.[67]
Implementation
Plan Development
Plan development transforms the outputs of business impact analysis, risk assessment, and strategy development into a structured, actionable document that guides an organization's response to disruptions. This process involves defining clear objectives, outlining recovery strategies, and ensuring the plan is comprehensive yet practical for implementation. According to ISO 22301:2019, the business continuity plan (BCP) must be documented as part of the business continuity management system (BCMS) to enable systematic preparation, response, and recovery from disruptive incidents.[4] The development follows a structured approach, starting with drafting key sections and incorporating input from cross-functional teams to align with organizational priorities.
A core component of the BCP is the executive summary, which provides a high-level overview of the plan's purpose, scope, and objectives, including essential mission processes, restoration priorities, and contact information. This summary ensures senior leadership can quickly grasp the plan's intent and authorize activation if needed. NIST SP 800-34 Revision 1 emphasizes that the executive summary should outline contingency planning for federal information systems, focusing on recovery strategies and three operational phases: activation/notification, recovery, and reconstitution.[39] It serves as the entry point for stakeholders, summarizing risks and mitigation measures without delving into procedural details.
Roles and responsibilities form another essential component, often documented using a RACI matrix (Responsible, Accountable, Consulted, Informed) to clarify accountability and prevent overlaps during crises. The RACI matrix assigns specific duties, such as the information system contingency plan (ISCP) coordinator overseeing recovery progress and the recovery team executing procedures, ensuring coordinated efforts.
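As a rough illustration of how such a matrix can be represented and checked, the following Python sketch encodes a small RACI table. The task names, the role assignments, and the rule that each task has exactly one Accountable role and at least one Responsible role are illustrative assumptions (the rule is a common RACI convention, not a requirement quoted from NIST SP 800-34 or ISO 22301).

```python
# Hypothetical RACI matrix: task -> {role: RACI code}.
# Task names and assignments are illustrative, not taken from any standard.
RACI = {
    "Activate plan": {"CIO": "A", "ISCP coordinator": "R", "Recovery team": "I"},
    "Assess damage": {"CIO": "I", "ISCP coordinator": "A", "Recovery team": "R"},
    "Restore systems": {"CIO": "I", "ISCP coordinator": "A", "Recovery team": "R"},
}

VALID_CODES = {"R", "A", "C", "I"}


def find_raci_problems(matrix):
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    for task, assignments in matrix.items():
        codes = list(assignments.values())
        unknown = sorted(set(codes) - VALID_CODES)
        if unknown:
            problems.append(f"{task}: unknown codes {unknown}")
        # Assumed convention: exactly one Accountable role per task.
        if codes.count("A") != 1:
            problems.append(f"{task}: expected exactly one Accountable role")
        # Assumed convention: at least one Responsible role per task.
        if "R" not in codes:
            problems.append(f"{task}: no Responsible role assigned")
    return problems


def roles_with(matrix, task, code):
    """List the roles holding a given RACI code for one task."""
    return [role for role, c in matrix[task].items() if c == code]


print(find_raci_problems(RACI))  # an empty list means no problems were found
print(roles_with(RACI, "Activate plan", "A"))
```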
In business continuity contexts, this tool helps define who activates the plan (typically senior management like the CIO), who performs recovery tasks, and who must be informed, reducing confusion under pressure.[39] DRI International's Professional Practices for Business Continuity Management recommend integrating RACI into plan development to align roles with recovery time objectives.[68]
Procedures for plan activation detail the triggers and steps to initiate the BCP, such as outages exceeding the recovery time objective (RTO), facility damage, or assessed disruption severity based on system criticality. Activation begins with notification via call trees or escalation chains, followed by damage assessment and resource mobilization. NIST guidelines specify that activation criteria should consider outage duration and impact, with the management team leading the response to sustain operations.[39] These procedures are derived from prior solution designs, ensuring alignment with predefined recovery strategies.
Documentation supports the plan's usability through visual aids like flowcharts, contact lists, and escalation protocols. Flowcharts illustrate activation sequences, such as notification hierarchies and recovery workflows, making complex processes accessible. Contact lists include personnel details (work, home, cellular, and email) for key roles, while escalation protocols outline steps for reporting delays, resource needs, or status updates to leadership. NIST SP 800-34 requires these elements in appendices, including sample call trees and equipment inventories, to facilitate rapid execution.[39] Comprehensive documentation ensures the plan remains a living reference, updated as needed.
Integration with IT disaster recovery (DR) and emergency response plans is critical for holistic resilience, coordinating system relocation to alternate sites (e.g., hot, warm, or cold) and leveraging offsite backups. The BCP incorporates DR procedures for technology recovery while focusing on business operations, using business impact analysis findings to prioritize actions. NIST SP 800-34 stresses this linkage through controls like CP-6 (alternate storage) and CP-7 (alternate processing), ensuring seamless transitions during disruptions.[39] Emergency response elements, such as initial incident handling, feed into the BCP for sustained continuity.
Legal aspects, particularly compliance with data protection laws like the GDPR, require the BCP to address personal data security during disruptions. Plans must include regular backups of sensitive data, stored off-site, with recovery processes tested to prevent breaches or loss. The UK's Information Commissioner's Office (ICO) mandates that BCPs identify critical records, ensure staff awareness of recovery procedures, and incorporate risk-based measures to maintain data availability and integrity under Article 32 of the GDPR.[69] Infringements of the Article 32 security obligations can attract fines of up to €10 million or 2% of global annual turnover, whichever is higher, and the most serious GDPR infringements up to €20 million or 4%, underscoring the need for explicit data protection protocols in plan development.
Training and Organizational Acceptance
Effective training programs are essential for equipping personnel with the knowledge and skills required to execute business continuity plans (BCPs), as mandated by international standards such as ISO 22301, which requires organizations to determine necessary competence for those affecting the business continuity management system (BCMS) and retain appropriate documented information. These programs typically include workshops that cover BCP fundamentals, policy, and roles; simulations to practice response scenarios; and role-specific drills tailored to functions like executive decision-making or IT recovery operations.[70] For instance, executives may focus on strategic oversight and resource allocation during disruptions, while IT staff emphasize technical recovery procedures, ensuring competence through evaluation and ongoing development.[71]
Organizational acceptance of BCP relies on strategies that foster commitment across all levels, beginning with leadership endorsement to demonstrate priority and allocate resources effectively.[72] Communication campaigns, such as regular newsletters, intranet updates, and town halls, raise awareness of BCP importance and individual contributions, often integrated into broader BCMS awareness efforts as outlined in ISO 22301 Clause 7.3. Metrics for engagement include participation rates in training sessions and feedback surveys to gauge understanding, helping to measure and improve adoption.[73]
Challenges in achieving acceptance often stem from resistance due to perceived irrelevance or resource demands, with 61% of organizations citing lack of engagement as a primary obstacle according to industry benchmarks.[74] Post-9/11 implementations highlighted these issues in federal agencies, where uneven organizational buy-in and limited training for non-essential operations led to coordination gaps, despite leadership actions like the U.S. Office of Personnel Management's (OPM) promotion of telework and emergency preparedness.[75] Overcoming resistance involves addressing concerns through targeted education, involving employees in plan development, and using real-world case studies to illustrate benefits, thereby building a culture of resilience.[74]
To verify familiarity, organizations often require employee acknowledgments, such as signed confirmations or attestations following training, confirming understanding of their BCP roles and responsibilities.[76] This practice, aligned with BCI Good Practice Guidelines, ensures accountability and supports audit readiness under standards like ISO 22301, with records maintained as evidence of competence and awareness.[73]
Testing and Maintenance
Testing Procedures
Testing procedures are essential for validating the effectiveness of a business continuity plan (BCP), ensuring that organizations can respond to disruptions while meeting recovery objectives. These procedures involve structured exercises that simulate potential incidents, allowing teams to practice responses, identify gaps, and refine strategies without risking actual operations. According to ISO 22301, organizations must establish an exercise program to test business continuity procedures at planned intervals or following significant changes, with results used to evaluate and improve the plan.[77]
Common testing types include tabletop exercises, walkthroughs, full-scale simulations, and component tests, each escalating in complexity to assess different aspects of the BCP. Tabletop exercises involve facilitated discussions where participants review a hypothetical scenario, such as a cyberattack, to evaluate decision-making and coordination without executing actions; this method is ideal for initial validation and building team awareness.[78] Walkthroughs entail step-by-step reviews of procedures by relevant teams, often focusing on specific processes like data backup restoration to confirm procedural clarity and resource availability.[37] Full-scale simulations replicate a real disruption by activating recovery sites and processing actual data, testing end-to-end recovery capabilities under time pressure.[37] Component tests target isolated elements, such as IT system failover or supply chain alternatives, to verify individual functionalities before broader integration.[79]

| Testing Type | Description | Purpose |
|---|---|---|
| Tabletop Exercise | Group discussion of a scenario without physical actions | Identify procedural gaps and enhance coordination |
| Walkthrough | Sequential review of plan steps by participants | Ensure procedural accuracy and familiarity |
| Full-Scale Simulation | Actual execution of recovery processes at alternate sites | Validate overall plan effectiveness under realistic conditions |
| Component Test | Isolated evaluation of specific plan elements | Confirm functionality of critical subsystems |
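
One way to feed exercise results back into the plan is to compare the recovery time measured in each exercise with the recovery time objective (RTO) for the function tested. The Python sketch below assumes a hypothetical record format and uses illustrative figures only; neither comes from ISO 22301 or the sources cited above.

```python
from dataclasses import dataclass


@dataclass
class ExerciseResult:
    function: str          # business function exercised
    test_type: str         # e.g. "Component Test" or "Full-Scale Simulation"
    rto_hours: float       # recovery time objective for the function
    measured_hours: float  # recovery time observed during the exercise


def gaps(results):
    """Return the results whose measured recovery time exceeded the RTO."""
    return [r for r in results if r.measured_hours > r.rto_hours]


# Illustrative figures only.
results = [
    ExerciseResult("Core financial systems", "Full-Scale Simulation", 4.0, 5.5),
    ExerciseResult("Data backup restoration", "Component Test", 8.0, 6.0),
]

for r in gaps(results):
    overrun = r.measured_hours - r.rto_hours
    print(f"{r.function} ({r.test_type}): exceeded RTO by {overrun:.1f} h")
```

Tabletop exercises and walkthroughs do not execute recovery actions, so they would typically yield qualitative findings rather than measured times; a record like this is mainly useful for component tests and full-scale simulations.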
