CSE030

Power System Resilience: definition, features and properties

Authors

E. CIAPESSONI, D. CIRIO, A. PITTO - Ricerca sul sistema Energetico - RSE S.p.A., Milan, Italy
M. VAN HARTE - Eskom, Johannesburg, South Africa
M. PANTELI - University of Cyprus, Cyprus

Summary

Modern power systems are subject to natural and man-made threats with increasing frequency. The management of the system in case of extreme events can benefit from the introduction of the property of “resilience”. After a brief overview of the strengths and limitations of the most significant definitions of resilience available in the literature, this paper discusses the definition of resilience recently elaborated by the CIGRE WG C4.47 “Power System Resilience”. It clarifies the features of this definition and the main differences between resilience and well-established properties such as adequacy and security, and it proposes some models to represent the relationships between resilience and reliability. Moreover, the paper lists and briefly discusses the key measures that make a system resilient.

Keywords
resilience - power systems - reliability - extreme events

1. Introduction

The term resilience has been used in very different fields of knowledge for many decades, and it has more recently been applied in the power system sector due to the increasing number of extreme events which negatively affect power systems [1]. Considering this trend in natural events, as well as in cyber and/or physical attacks, stakeholders (in particular Transmission System Operators, TSOs, and Distribution System Operators, DSOs) have started to pursue two goals: (a) to evaluate the impact of multiple (possibly dependent) outages of components, potentially leading to blackouts, and (b) to propose preventive or corrective countermeasures in order to absorb the effects of such disruptive events and recover quickly. In this context, the concept of resilience can prove useful in developing suitable approaches.

The first definitions of resilience referred to materials and were used in the XIX century especially in the naval engineering context. C.S. Holling provided the first system-level definition of resilience by defining it in 1973 [2] as a measure of “the persistence of systems and of their ability to absorb change and disturbance and still maintain the same relationships between populations or state variables”. Since this foundational definition, the concept of resilience has evolved remarkably in several systems, such as safety management, organizational, social-ecological, and economic ones. After Holling, numerous interpretations of resilience have been developed, resulting in many different definitions and a lack of a universal understanding of what resilience really means. For example, Perrings in [3] defines economic resilience as “the response to hazards that enables people and communities to avoid some economic losses at micro-macro market levels. It is the capacity for the enterprise to survive and adapt following market or environmental shocks”. Hollnagel in [4] defines organizational resilience as “the ability of an organization to identify risks and to handle perturbations that affect its competencies, strategies and coordination”.

In the sector of electric power systems as critical infrastructures the picture is even blurrier, as the concept of resilience has emerged only in roughly the last decade. There have been several attempts by organizations worldwide in the power and energy engineering communities, such as the UK Energy Research Center (UKERC) [5] and the Power Systems Engineering Research Center (PSERC), USA, to define resilience and distinguish it from the concept of reliability. According to the UK Cabinet Office, resilience encompasses reliability and further includes resistance, redundancy, response, and recovery as key features. Another pioneering definition comes from the Multidisciplinary and National Center for Earthquake Engineering Research (MCEER), USA, where a generic organizational resilience framework has been developed [6] that can be applied to any critical infrastructure, including power systems. The definition of MCEER for the disaster resilience of social units is reported below [6]:

“Disaster resilience is the ability of social units (e.g., organizations, communities) to mitigate hazards, contain the effects of disasters, and carry out recovery activities in ways that minimize social disruption, while also mitigating the effects of future disasters. Consequently, strength, flexibility, and the ability to cope with and overcome extreme challenges, are the hallmarks of disaster-resilient communities.”

The above framework entails the “4 R’s” of resilience, which consist of robustness, redundancy, resourcefulness, and rapidity according to [6].

The paper is organized as follows: Section 2 discusses the main features of reliability and its sub-properties in order to highlight the need for introducing the new concept of resilience in power system planning and operation. Section 3 presents the available definitions of resilience and the motivations at the basis of the new definition proposed by CIGRE WG C4.47. Section 4 illustrates the new definition of resilience and clarifies the differences between resilience, reliability, and its sub-properties. Section 5 discusses the relationship between reliability and resilience. Section 6 concludes. Finally, the appendix (Section 7) lists some possible resilience-reliability relationship models.

2. From reliability to resilience

This section presents the current definitions of reliability in the electricity sector and the reasons driving a paradigm shift from the traditional concept of reliability to resilience.

2.1. Reliability and its sub-properties

The concept of reliability was introduced to assess the performance of the power system in providing energy to users even in case of disturbances. This concept is shaped not only by formal definitions (reviewed in this sub-section), but also by the way these definitions are applied (sub-section 2.3).

Reliability has been defined in different ways by CIGRE, IEEE, IEC, NERC, ENTSO-E. Table 1 summarizes the definitions of reliability from these different entities, along with those of adequacy and security, identified as the sub-properties of reliability.

All the reported definitions agree that reliability refers to the probability of satisfactory operation of the system in the long term. In this regard, the IEC definition [7] also includes a reference to the time interval of analysis.

The degree of reliability can be measured through the frequency, duration and intensity of situations of service degradation for the customers.

Reliability depends on the adequacy and security of the power system, as reported e.g. in CIGRE references [8]-[9].

As far as adequacy is concerned, the key concept is the availability of resources and components, or system elements, with suitable capacity to meet the load demand without violating operating limits.

To be adequate, a power system must be endowed with resources for generation, storage, demand flexibility, as well as transmission capacity sufficient to cover the expected demand plus reserves for contingencies, at all times. On planning horizons, this requires a suitable development of the above resources, within the mechanisms defined by the regulatory framework.

All adequacy definitions include the explicit reference to “unscheduled outages of system components”, i.e. contingencies. In particular, NERC, IEEE and ENTSO-E refer to reasonably expected unscheduled outages, thus also including an application criterion (i.e. the credibility criterion) in the definition.

ENTSO-E [12] defines the contingency as the “identified and possible or already occurred fault of an element, including not only the transmission system elements, but also significant grid users and distribution network elements if relevant for the transmission system operational security”.

As for security, the CIGRE [8][9], NERC [13], and ENTSO-E [12], [14], [15] definitions are consistent in recognizing security as “the ability to withstand sudden disturbances”. The IEC definition [7] includes the requirement of “integrity of demand supply” (i.e., “without loss of load”) in case of an event which satisfies a credibility criterion. The IEEE definition [10][11] similarly specifies “without interruption of customer service”.

In view of that, all the security definitions concur that a system can be considered secure if it is in an acceptable operating condition after the occurrence of at least credible contingencies. This requires:

  • the planning of system operation by setting appropriate margins with respect to stability (frequency, angle, voltage stability) and overload, in order to take into account operational uncertainties;
  • the definition of international agreements for the control of interconnected electrical systems;
  • the coordination between system operators (TSOs and DSOs).

Reliability, adequacy, and security concepts include elements of planning and operation and can be applied to the power system in steady-state, dynamic, and transient conditions, encompassing all elements of the generation, transmission and distribution systems, and loads [8].

Table 1 – Definitions of reliability, adequacy and security from the literature.

CIGRE [8]-[9]

Reliability: A measure of the ability of a power system to deliver electricity to all points of consumption and receive electricity from all points of supply within accepted standards and in the amount desired.

Adequacy: A measure of the ability of a power system to meet the electric power and energy requirements of its customers within acceptable technical limits, taking into account scheduled and unscheduled outages of system components, where:

  • Power system includes all elements of the generation, transmission and distribution systems, and customer facilities that supply or use power and energy, or provide ancillary services;
  • Customers include all parties that supply power and energy or ancillary services, as well as those who consume them;
  • Requirements of customers include their basic power and energy needs, and agreed use of customers’ ability to vary power supply, adjust demand and provide ancillary services;
  • Acceptable technical limits and scheduled and unscheduled outages are those specified in the applicable planning criteria and standards; and
  • System components include all elements of the supply, delivery and utilization systems regardless of ownership or control.

Security: The ability of the power system to withstand disturbances, where:

  • Power system includes all elements of the generation, transmission and distribution systems, and customer facilities that supply or use power and energy, or provide ancillary services;
  • Ability to withstand will vary depending on specific disturbances and applicable criteria or standards, and includes agreed use of customers’ ability to vary power supply, adjust demand and provide ancillary services;
  • Disturbances include electric short circuits, unanticipated loss of system facilities, or other rapid changes such as in wind or solar generation.

NERC [13]

Reliability: The degree to which the performance of the elements of that system results in power being delivered to consumers within accepted standards and in the amount desired.

Adequacy: The ability of the electric system to supply the aggregate electrical demand and energy requirements of the end-use customers at all times, taking into account scheduled and reasonably expected unscheduled outages of system elements.

Security: The ability of the bulk power system to withstand sudden, unexpected disturbances, such as short circuits or unanticipated loss of system elements.

IEEE [10][11]

Reliability: Reliability of a power system refers to the probability of its satisfactory operation over the long run. It denotes the ability to supply adequate electric service on a nearly continuous basis, with few interruptions over an extended time period.

Adequacy: The ability of the electric systems to supply the aggregate electrical demand and energy requirements of their customers at all times, taking into account scheduled and reasonably expected unscheduled outage of system elements.

Security: Security of a power system refers to the degree of risk in its ability to survive imminent disturbances (contingencies) without interruption of customer service. It relates to robustness of the system to imminent disturbances and, hence, depends on the system operating condition as well as the contingent probability of disturbances.

IEC [7]

Reliability: The ability of a power system to meet its supply function under stated conditions for a specified period of time.

Adequacy: The ability of an electric power system to supply the aggregate electric power and energy required by the customers, under steady-state conditions, with system component ratings not exceeded, bus voltages and system frequency maintained within tolerances, taking into account planned and unplanned system component outages.

Security: The ability to tolerate a credible event without loss of load, over-stress of system components, or deviation from specified voltage and frequency tolerances.

ENTSO-E [12], [14], [15]

Reliability: The degree of performance of the elements of the bulk electric system that results in electricity being delivered to customers within accepted standards and in the amount desired.

Adequacy: The ability of the electric system to supply the aggregate electrical demand and energy requirements of the customers at all times, taking into account scheduled and reasonably expected unscheduled outages of system elements.

Security: The ability of the electric system to withstand sudden disturbances such as electric short circuits or unanticipated loss of system elements.

2.2. Reliability, adequacy and security quantification

Reliability can be quantified by deterministic or probabilistic criteria. The former typically require the system to be able to endure specific situations, consisting of operating conditions with contingencies or unavailable components, without undesired consequences (e.g., load shedding, overloads, instability). With deterministic criteria, the contingencies to consider are selected on the basis of their credibility. Probabilistic criteria, on the other hand, aim to verify whether the risk of undesired consequences is below specific thresholds. In this case, contingencies do not need to be limited by a credibility approach: all contingencies significantly contributing to the risk should be accounted for. Even though probabilistic criteria have recently been adopted or are under development in reliability analyses [16] to account for various sources of uncertainty, the relevant indices are still defined in terms of average values and may neglect high impact, low probability events.
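The contrast between the two families of criteria can be illustrated with a minimal Python sketch. It is not taken from any of the cited standards: the contingency list, the probabilities, the load-shedding figures and the risk threshold are all hypothetical values chosen for illustration, and risk is assumed here to be the expected load shed (probability times consequence, summed over contingencies).

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    name: str
    probability: float   # hypothetical yearly probability of occurrence
    load_shed_mw: float  # hypothetical consequence if the contingency occurs
    credible: bool       # True if selected by the credibility criterion


def deterministic_ok(contingencies):
    """Deterministic check: no credible contingency may cause load shedding."""
    return all(c.load_shed_mw == 0.0 for c in contingencies if c.credible)


def risk(contingencies):
    """Risk as expected load shed, summed over ALL contingencies."""
    return sum(c.probability * c.load_shed_mw for c in contingencies)


# Illustrative data only: two credible single outages without consequences,
# and one non-credible multiple outage with a large consequence.
contingencies = [
    Contingency("single line outage", 0.5, 0.0, True),
    Contingency("single transformer outage", 0.25, 0.0, True),
    Contingency("double-circuit tower failure", 0.01, 800.0, False),
]

RISK_THRESHOLD_MW = 5.0  # hypothetical acceptability threshold

print("deterministic criterion met:", deterministic_ok(contingencies))
print("expected load shed (MW):", risk(contingencies))
print("probabilistic criterion met:", risk(contingencies) <= RISK_THRESHOLD_MW)
```

With these assumed numbers the system passes the deterministic check, because the only harmful contingency is outside the credible set, while it fails the risk-based check, because that contingency dominates the expected load shed.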

For system adequacy, the focus is the quantification of availability of facilities needed to satisfy the consumer demand plus reserves for contingencies, which may be relevant both in long term horizons and in short-term ones.

To this aim, probabilistic criteria are becoming a standard [16]-[19]. Indeed, many countries and regions all over the world adopt resource adequacy standards in terms of Loss Of Load Expectation (LOLE), or based on similar metrics. The evaluation of the LOLE can then consider not only the most credible contingencies but also other, less likely contingencies that could lead to a power shortage. However, adequacy criteria are typically expressed purely in terms of expected values. In adequacy assessment, contingencies are analysed with static tools, usually based on Optimal Power Flow.

For security, the focus is to evaluate the final state of the system following a contingency, with the aim of assessing the system’s capability to withstand disturbances by analysing both steady-state violations and dynamic transients.
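As an illustration of how an expectation-based adequacy index is built, the following Python sketch computes a LOLE for a toy generation system. It rests on simplifying assumptions that are not part of the paper: each unit is either fully available or fully out, unit outages are independent, the network is neglected, and the function name `lole_hours` and all data are hypothetical.

```python
from itertools import product


def lole_hours(units, hourly_load_mw):
    """Expected number of hours in which load exceeds available capacity.

    units: list of (capacity_mw, forced_outage_rate) tuples.
    hourly_load_mw: one load value per hour of the study period.
    """
    # Enumerate every up/down combination of the units (two-state model,
    # independent outages) to get (available capacity, probability) pairs.
    states = []
    for combo in product([True, False], repeat=len(units)):
        capacity = 0.0
        probability = 1.0
        for (unit_mw, outage_rate), available in zip(units, combo):
            if available:
                capacity += unit_mw
                probability *= 1.0 - outage_rate
            else:
                probability *= outage_rate
        states.append((capacity, probability))

    # For each hour, add the probability of all states that cannot cover
    # the load; the sum over hours is the loss of load expectation.
    return sum(
        probability
        for load in hourly_load_mw
        for capacity, probability in states
        if capacity < load
    )


# Illustrative data: two 100 MW units with a 10% forced outage rate,
# and a two-hour study period.
units = [(100.0, 0.1), (100.0, 0.1)]
load = [150.0, 90.0]
print("LOLE (hours over the period):", lole_hours(units, load))
```

The result is a single averaged figure. Two systems with the same LOLE can differ widely in how large or how long their rare shortfalls are, which is the limitation of expected-value criteria noted above.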

Security can be assessed considering hypothetical operating conditions over planning horizons, as well as short-term or nearly real-time forecast operating conditions. In the former case, the lack of security may drive interventions at the planning stage (e.g. install new devices to stabilise the system, tune control parameters etc.); in both cases the operating condition may need to be modified to achieve “preventive” or “corrective” security. Extreme events are usually not considered in security analyses. In fact, assuring system security in case of multiple component outages is not viable from a techno-economic viewpoint, due to the technical challenges and high costs required for grid strengthening against a much broader and more severe set of contingencies.

Probabilistic approaches for system security have thus been investigated to estimate the risk of undesired consequences [16], [20]. Risk based approaches allow considering multiple contingencies, thus addressing the aforementioned problem.
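To give a sense of how much broader the contingency set becomes once multiple outages are considered, the short Python sketch below counts the N-k combinations for a system of a given size. The figure of 1000 components is an arbitrary assumption, and the count ignores any screening or dependency between outages; it only illustrates why exhaustive deterministic coverage of multiple contingencies is impractical.

```python
from math import comb


def contingency_count(n_components, k):
    """Number of distinct combinations of k simultaneous component outages."""
    return comb(n_components, k)


N_COMPONENTS = 1000  # hypothetical system size

for k in (1, 2, 3):
    print(f"N-{k}: {contingency_count(N_COMPONENTS, k)} contingencies")
```

For the assumed size, the set grows from 1000 single outages to 499500 double and 166167000 triple outages, which is why risk-based approaches rank contingencies by their contribution to risk instead of enumerating them all.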

2.3. Reliability, adequacy and security application criteria

Some definitions of adequacy and/or security, e.g. the IEEE [10][11], ENTSO-E [15] and NERC [13] definitions of adequacy, and the IEC definition of security [7] include a contingency credibility criterion, implicitly assuming a particular way to apply the definitions.

In particular:

  • the methods to apply the definition of adequacy depend on the specific segment of the power system under analysis (generation, transmission or distribution): for example, transmission adequacy is assured by applying and evaluating subsequent reinforcements to the transmission system in an iterative procedure, up to the attainment of the goal, which consists in accommodating the load and the corresponding generation for a set of specific states. These states can include specific outages of system components. Note that the use of probabilistic approaches is rising also for transmission adequacy, but extreme events are usually not considered.
  • system security is guaranteed through deterministic criteria considering a credibility criterion for the selection of contingencies (typically the N-1 criterion): in case of faults on individual grid components, the system must remain in the normal state or end up in an alert state that does not cause any violations of the operating limits or any load disruption, but it may no longer be secure in case of an additional fault. Recently, the increasing use of advanced automation and support systems at control system level, which can resolve security violations within acceptable times and thus assure correct system operation, has made the need for this strict interpretation of the N-1 security criterion increasingly debatable. This is evidenced by the increasing use of "corrective" security in operational practice.

2.4. Need for resilience

Several definitions of reliability, adequacy and security referenced in Table 1 do not limit the general properties to specific disturbances or contingencies.

Nevertheless, the traditional criterion for the application of these properties may not assure satisfactory performance of the system in case of extreme events. In fact, blackouts often originate from multiple contingencies that are not contemplated by the N-1 criterion. Because this criterion is deterministic, it does not consider the probability of contingencies, which in turn depends on the threats affecting the system.

There is thus a gap arising from both the definitions and the way they are applied, such that extreme events are not part of traditional reliability analyses. To overcome this gap, the concept of resilience has been introduced:

  • to provide a conceptual framework in support of the characterization and design of measures aimed to improve the performances of power system response, following extreme events triggered by adverse weather conditions, malicious acts, cyber-attacks, etc.
  • to allow a comprehensive evaluation of system response to disturbances, including not only the system degradation due to the disturbance but also the system behaviour during the restoration phase, as well as all the measures taken to preventively improve system performance on the basis of past events.

Resilience adds a new dimension to system management and reliability. The application of resilience concepts can assist utilities and regulators to encourage prudent investments to enhance system performances in case of extreme events characterised by low frequency of occurrence but significant consequences [21], [22]. These include significantly deteriorated operational capabilities, possibly leading to widespread cascading impacts that could also affect interdependent critical infrastructures with catastrophic consequences.

Therefore, resilience assessments may require a multi-dimensional evaluation of the response of an interconnected power system to these extreme and disruptive events, considering the power system interaction with its surrounding environment and possibly other critical infrastructures (e.g. roads, water and gas supply, cyber systems, etc.). Achieving resilience may require multiple strategies with due consideration of utility response objectives for planning and/or response efforts. These undertakings can be very complex and challenging due to the interdependence and relationship with essential services and mission-critical loads.

3. Power system resilience: current definitions and genesis of the new approach

This section presents the current definitions of resilience in the electricity sector and the motivations underlying the proposed resilience definition.

3.1. Current definitions in electricity sector

Focusing on resilience in the power system sector, Table 2 collects a set of definitions from the literature, some of which were mentioned in the Introduction. Even though there is no unique definition of resilience, most definitions agree upon the main features a resilient system should have.

Almost all of the definitions in Table 2 (12 out of 13) agree that resilience is an ability of the power system, just like reliability and security. This makes it possible to propose methods and metrics to quantify this ability.

Most of the terms used in these definitions correspond to capabilities which make a system resilient (e.g. withstand, adapt to, recover, absorb, anticipate). About half of the definitions (7 out of 13) indicate that resilience must be evaluated in case of a “disruptive event” affecting the system. Only 4 definitions out of 13 (#1, #6, #9 and #12) characterize the types of these disruptive events, by specifying that they are “extraordinary and high impact – low probability events”.

Table 2 - Definitions for power system resilience

#1: UK Energy Research Center (UKERC), “Building a Resilient UK Energy System”, 2009 [5]

The ability of a power system to withstand extraordinary and high impact-low probability events such as due to extreme weather, rapidly recover from such disruptive events and absorb lessons for adapting its operation and structure to prevent or mitigate the impact of similar events in the future.

#2: Haimes [23]

The ability of the system to withstand a major disruption within acceptable degradation parameters and to recover within an acceptable time and composite costs and risks.

#3: NIAC [24]

(Infrastructure resilience) the ability to reduce the magnitude and/or duration of disruptive events. The effectiveness of a resilient infrastructure or enterprise depends upon its ability to anticipate, absorb, adapt to, and/or rapidly recover from a potentially disruptive event.

#4: UK Cabinet Office [25]

The ability of assets, networks and systems to anticipate, absorb, adapt to and / or rapidly recover from a disruptive event.

#5: PSERC [26]

The ability of a system to gradually degrade under increasing system stress, and then to return to its pre-disturbance condition when the disturbance is removed.

#6: NARUC [27]

Robustness and recovery characteristics of utility infrastructure and operations, which avoid or minimize interruptions of service during an extraordinary and hazardous event.

#7: US Presidential Policy Directive 21: Critical Infrastructure Security and Resilience, 2013 [28]

The ability to prepare for and adapt to changing conditions and withstand and recover rapidly from disruptions. Resilience includes the ability to withstand and recover from deliberate attacks, accidents, or naturally occurring threats or incidents.

#8: Sandia Lab 2011 [29]

Given a disruptive event (or set of events), the resilience of a system to that event (or events) is the ability to reduce ‘efficiently’ both the magnitude and duration of the deviation from targeted system performance levels.

The emphasized terms of this definition (disruptive event, efficiently, system performance) are key components of resilience; further discussion follows below:

  • Disruptive event: Different disruptions may affect a system in different ways and thus necessitate different recovery processes. Hence, a system may have different levels of resilience to different disruptions. This definition considers resilience of a system to a specific disruption.
  • Efficiently: Efficiently means using the lowest possible amount of resources during recovery processes; depending on the domain, these resources could be dollars, repair man-hours, infrastructure replacement assets, or time.
  • System performance: Given the flexibility of many systems to adjust and reconfigure to a disruptive event, maintaining system structure is not as important as maintaining system performance. Hence, measurement of resilience should evaluate how a disruption affects system performance and causes productivity to decrease relative to targeted system performance levels: that is, how the system should behave during and after disruptive events.

#9: Consultation document of the Italian Ministry of Economic Development, June 12, 2017 [30]

The ability of a system not only to resist to stresses which have overcome the withstanding limits of the system itself, but also to come back fast to a normal state of operation. The effectiveness of the resilient system depends on its capability to anticipate, to absorb, to adapt to and/or recover itself from an extreme event.

#10: IEEE Task Force on Definition and Quantification of Resilience, April 2018 [31]

The ability to withstand and reduce the magnitude and/or duration of disruptive events, which includes the capability to anticipate, absorb, adapt to, and/or rapidly recover from such an event.

#11: National Security Policy for Critical Infrastructures, Brazilian government, November 2018 [32]

The capacity of the critical infrastructures to be recovered after the occurrence of an adverse situation.

#12: NATF (North American Transmission Forum) [33]

The ability of the system and its components (i.e. both the equipment and human components) to minimize damage and improve recovery from non-routine disruptions, including high impact, low frequency (HILF) events, in a reasonable amount of time. Resiliency includes a diverse range of topics, such as flexibility, hardening, security and recovery.

#13: US National Academies of Science [34]

The ability to prepare and plan for, absorb, recover from, or more successfully adapt to actual or potential adverse events.

Definition #2 also mentions “major disruption” to specify the high severity of the events taken into account. However, in general, resilience definitions are detached from the relevant application criterion.

The key concept of degradation is mentioned in only two definitions (#2 and #5), but a similar concept (“deviation from targeted system performance levels”) also appears in definition #8. Although rarely mentioned, the concept of degradation should be considered a major aspect distinguishing resilience from other properties such as “security”, which are usually binary in nature (i.e. a system is either secure or not secure) as they require the fulfilment of strict criteria.

All the definitions referring to degradation agree on the need to define adequate levels of degradation and of its rate. In fact, definition #2 in Table 2 introduces the concepts of “acceptable degradation parameters” and “acceptable time and composite costs and risks”, also specifying an economic criterion of acceptability, while the PSERC definition (#5) introduces the concept of “gradual degradation”, which implicitly calls for an indication of a harmless or acceptable slope of degradation.

Besides definition #2, the reference to the economic aspects of resilience can be found in Sandia Lab’s definition (#8), which mentions an efficient reduction of the magnitude and duration of the deviation from targeted performances. The reference to efficiency implies the adoption of the lowest possible amount of resources (money, repair man-hours, etc.).

Among all the definitions, only Sandia’s (#8) and PSERC’s (#5) separate the definition of the “resilience” property from the key measures that make the system resilient (anticipation, adaptation, absorption, fast recovery); moreover, as mentioned above, both refer to a generic “disturbance” or “disruptive event” without specifying its severity or probability of occurrence. This is a strength of these definitions: for a rigorous analysis of a property, its definition should be kept separate from its enabling capabilities, as well as from its application criteria and quantification metrics.

Most of the definitions (11 out of 13) highlight the importance of fast recovery in order to characterize a resilient system. All these definitions underline that the assessment of power system resilience, unlike security, calls for the evaluation of the restoration process. The point that a fast recovery is the major characteristic of a resilient system is also highlighted in [35].

In particular, Sandia Lab’s definition (#8) specifies two aspects of system degradation, i.e. its magnitude and its duration. In this regard, the term “magnitude” is ambiguous, because it may refer either to the severity of the disturbance, in terms of the damage triggered, or to its extent. These two aspects are not necessarily correlated: certain threats, like tornados, cause very localised (low extent) but significant (high severity) damage to the power infrastructure and service, whereas other threats, like wet snow events, can have moderate severity but large extent.

The IEEE definition of resilience (#10), besides mentioning the enabling capabilities of resilience, also refers to the magnitude and the duration of disruptive events, without resolving the ambiguity of the term “magnitude”.

Some limitations can be identified in the previous definitions:

  • The term “withstand”, used by 4 definitions (#1, #2, #7 and #10), appears better suited to defining security, for which it has been widely used in the literature. In fact, “withstand” describes the system’s capability to survive a disturbance, typically while fulfilling strict performance requirements in the energy supply (as verified during security analyses);
  • The explanation of the term “efficiently” in definition #8 should be amended, because it only refers to the restoration process. However, in the resilience definition, the efficiency concept is applied to both the magnitude and the duration of disruptive events, which means that efficiency must also be considered during the phase of system performance degradation;
  • Definition #6 uses the term “robustness”, which may generate confusion as this term has also been used, e.g., to indicate the power system’s ability to remain in a normal state in case of disturbances;
  • Definition #11 only focuses on the recovery capability of a resilient system, neglecting the ability to anticipate the event and hence to limit system degradation.

4. Power system resilience: a new definition

This section presents and discusses the definition of power system resilience elaborated by CIGRE WG C4.47 [36]. As resilience differs from other widely used properties in the power system sector, such as security and reliability, the section also compares resilience against these properties.

The definition proposed by the WG intends to separate:

  • the property to be achieved, from
  • the possible key measures that can be deployed to achieve power system resilience.

This definition has also been an important trigger for further activities in the CIGRE scientific community, as demonstrated by the TB by CIGRE C2.25 “Operating strategies and preparedness for system operational resilience” [37] which also discusses the aspects related to the application of WG C4.47 definition of resilience in the context of power system operation.

4.1. A definition of power system resilience property

The new definition associates the concept of resilience to the system’s ability to limit the extent, severity and duration of system degradation following an event. As the criterion of application for this property mainly regards extreme events, Power System Resilience is defined as:

the ability to limit the extent, severity and duration of system degradation following an extreme event.

Table 3 explains the bold words of the resilience property definition proposed above.

 

Table 3 – Explanation of the bold words in the new definition of resilience

extent, severity and duration

The replacement of the ambiguous term “magnitude” in definition #8 with the two terms “extent and severity” provides further details about the action of the disruptive event and assures a more focused characterization of the dimensions of system degradation, still keeping the definition concise and effective.

“Severity” in the present definition refers to the “severity of the event consequences”, which must be kept separate from the “severity of the event”, which in general does not imply any system degradation.

The concept of “severity” also depends on the stakeholder’s point of view. For example, some system situations might be deemed as severe by the TSO yet without any grid degradations (e.g. a degraded operating point without security margins and ready to collapse if a single failure happens on a specific grid component).

system degradation

 

The term “degradation” is understood as “deviation from specified target performances”. This term refers to the criteria used to apply the resilience concept in system planning and operation.

In fact, the costs to assure power system security in case of multiple contingencies can be unacceptable and unsustainable: thus, the rationale is to assure the fulfillment of security criteria for “ordinary” events (N-1 credibility criterion) and to provide a weaker criterion of not exceeding maximum specified deviations of system performances in case of severe multiple contingencies.

Term “degradation” refers to both the power supply and the grid infrastructure, thus the property definition can be applied to both infrastructural and operational resilience. In fact it is worth distinguishing:

  • the Resilience of the infrastructure against extreme events involving multiple failures, which requires the repair or replacement of components, and
  • the Resilience of the energy supply service, which concerns the management of service disruptions, i.e. power outages up to the total blackout.

The distinction between power supply resilience and infrastructure resilience also implies that different performance metrics are adopted to quantify the property, for example:

  • the number of damaged lines over time is a typical metric for infrastructural resilience,
  • the number of unsupplied customers or the amount of energy not supplied over time are typical metrics for operational resilience.

extreme event

 

Like most of the definitions in Table 2, the new definition includes a specification about the disruptive events affecting the system: “extreme event” refers to events with a large impact in terms of amount of damaged components, reduction of component functionalities, as well as number of unsupplied customers.

With this specification the new definition links the definition of resilience property with the application criteria (i.e. application to extreme events).[1]

Also, events such as long droughts and extreme temperatures that do not have an impact on the status of the components (i.e. on or off/collapsed) but on their operational capabilities (e.g. derating of lines due to high temperatures or limited cooling of thermal plants due to water scarcity) are to be included in the definition of “extreme event”, as the system should be resilient to these prolonged events too.

The term “extreme” used in the WG definition does not include any information about the probability of occurrence of the events, but only refers to the severity of the impact of such events on the system. These events often have a low probability: in this case they are referred to as HILP (High Impact Low Probability) events. A risk-based approach to resilience can be very effective in dealing with HILP events, because the risk concept combines the consequences (impact) of a contingency with its probability. 

This definition is innovative with respect to most of the previous definitions in Table 2 because:

  • It is more accurate in describing the action of the disruptive event. The ambiguous term “magnitude”, used in definition #8 in Table 2, is replaced with the two terms “extent and severity”, which refer respectively to the geographical extension and the severity of the effects.
  • It clearly separates resilience as a property from the key measures (shock absorption, fast recovery, etc.) which allow it to be achieved. The latter remain an integral part of the definition and are addressed in the next subsection. This is common to only a few other definitions in the literature.

The WG definition applies to both transmission and distribution systems, even if methods and metrics for resilience assessment and measures for its enhancement must be specified taking into account the peculiarities of the two grids (e.g. different operation criteria and different vulnerabilities of the components, etc.).
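As a purely illustrative sketch (not part of the WG definition), the three dimensions of degradation named above could be read from a recorded event as follows. The data layout, the function name `degradation_dimensions`, the use of "areas" as a proxy for geographical extent and the use of unsupplied load as the performance indicator are all assumptions made here for illustration.

```python
def degradation_dimensions(samples, total_load_mw):
    """Return (extent, severity, duration_h) of the degradation in a sampled event.

    samples: list of (hour, {area_name: unsupplied_load_mw}) tuples, sorted by hour.
    extent:   number of distinct areas that lose load at any time
    severity: peak fraction of the total load that is not supplied
    duration: hours between the first and the last degraded sample
    """
    affected_areas = set()
    peak_fraction = 0.0
    degraded_hours = []
    for hour, lost_by_area in samples:
        lost_mw = sum(lost_by_area.values())
        if lost_mw > 0:
            degraded_hours.append(hour)
            affected_areas.update(a for a, mw in lost_by_area.items() if mw > 0)
            peak_fraction = max(peak_fraction, lost_mw / total_load_mw)
    duration_h = degraded_hours[-1] - degraded_hours[0] if degraded_hours else 0.0
    return len(affected_areas), peak_fraction, duration_h


# Hypothetical event: two areas affected, at most 150 MW lost out of 1000 MW.
event = [
    (0, {"north": 0, "south": 0}),
    (1, {"north": 100, "south": 0}),
    (2, {"north": 100, "south": 50}),
    (3, {"north": 0, "south": 50}),
    (4, {"north": 0, "south": 0}),
]
extent, severity, duration_h = degradation_dimensions(event, total_load_mw=1000)
```

Keeping extent and severity as separate outputs mirrors the point made in the text: a localised but heavy loss and a widespread but mild loss would give very different pairs of values even if a single "magnitude" figure could not tell them apart.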

4.2. Key measures to achieve power system resilience

An essential part of the new definition of resilience is the list of the key measures that can be used to achieve system resilience over different time frames (from short to long term), as follows.


Power system resilience is achieved through a set of key actionable measures to be taken before, during and after extreme events, such as:

  • anticipation,
  • preparation,
  • absorption,
  • adaptation,
  • rapid recovery and
  • sustainment of critical system operation

including application of lessons learnt.


The “measures” listed in this subsection characterize any power system: the adoption of defence plans to absorb the contingency impact, the upgrade of operating and maintenance procedures on the basis of past events, and the scheduling of maintenance teams are only a few examples of current practices that grid operators use to face severe disruptive events. However, a resilient system should be capable of exploiting these measures to achieve acceptable targets for the energy supply in case of extreme events.

Table 4 introduces the key measures with a brief description.

The corresponding capabilities of the system to deploy the previous key measures, i.e. the anticipative, absorptive, adaptive and restorative capabilities, must be quantitatively assessed by means of suitable metrics, so that it is possible to set objectives, to establish suitable strategies and measure the improvements, thus providing a valuable support to decision making process.

Table 4 - Explanations and examples for each key actionable measure

Measure

Description

Anticipation

 

This process consists in evaluating and/or monitoring the onset of foreseeable scenarios that could have disastrous outcomes. It helps power system engineers enumerate plausible disaster scenarios and propose mitigation plans, and it allows decision makers to envisage the “multiple” future states and the strategies required to contain, avoid and/or respond to an emergent threat to the power system.

Anticipation during the pre-disturbance period is realized by any resource or action that can reduce the probability of extreme events, or any initial damage.

Preparation

This process is required by decision makers to advance the knowledge gained during the anticipation phase from the resilience strategies to clear objectives to guide the deployment of measures. Tolerance to the possible adverse consequences has to be considered, with emphasis on maintaining mission critical loads and the minimum system load level to sustain a reduced but acceptable functioning of everyday life and, importantly, the orderly functioning of a modern society.

Absorption

The process by which a system can minimise or avoid the consequences of extreme events: the outcomes are represented by the “slope” and amount of the power system performance degradation after the shock has occurred or been avoided.

Adaptation

In this process changes are carried out in the power system management, defence and operational regimes, on the basis of past disruptions, in order to contain and/or limit the undesirable situations. This process includes the upgrades of prevention barriers, operational regimes and maintenance procedures on the basis of lessons learnt from past disruptive events.

Rapid Recovery

In this process the energy supply to the customers is restored and the damage to the grid infrastructure is repaired. The process requires an operational response to the initial shock that contains or limits the consequences of the disruptive events, focusing on the mission-critical or essential loads that are required to support the restoration efforts. This requires integrated planning to develop efficient and effective response plans in a coordinated manner, so as to bring system operation back to a normal state.

Sustainment of critical conditions

The process of maintaining the operational capability of the impaired power system to supply the mission critical loads and a minimum system load level to maintain a reduced but acceptable functioning of everyday life and importantly the functioning of modern societies that are dependent on so many critical and interdependent infrastructures driven by electricity. This may require the deployment of additional components (e.g. mobile generators), systems (e.g. Uninterruptible Power Supplies) and distributed energy resources to sustain operations until the power system is restored to a normal state.
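The absorption and rapid recovery measures described in Table 4 are often visualised on a performance-versus-time curve. The following minimal sketch, under the assumption that performance is sampled as a fraction of its target value, extracts the amount and “slope” of the degradation and the time needed to recover. The function name `curve_indicators` and the returned keys are hypothetical and are not taken from the WG documents.

```python
def curve_indicators(times_h, performance, target=1.0):
    """Summarise a sampled performance curve (performance as fraction of target).

    drop:        largest deficit with respect to the target
    slope_per_h: drop divided by the time taken to reach the (first) worst sample
    recovery_h:  hours from the (first) worst sample until the target is met again
    lost_area:   trapezoidal integral of the deficit over time
    """
    deficits = [max(target - p, 0.0) for p in performance]
    if not any(d > 0 for d in deficits):
        return {"drop": 0.0, "slope_per_h": 0.0, "recovery_h": 0.0, "lost_area": 0.0}
    first_bad = next(i for i, d in enumerate(deficits) if d > 0)
    start = max(first_bad - 1, 0)  # last sample still at target, if any
    worst = max(range(len(deficits)), key=lambda i: deficits[i])
    restored = next(
        (i for i in range(worst, len(deficits)) if deficits[i] == 0),
        len(deficits) - 1,  # not restored within the record
    )
    drop = deficits[worst]
    fall_h = times_h[worst] - times_h[start]
    lost_area = sum(
        0.5 * (deficits[i] + deficits[i + 1]) * (times_h[i + 1] - times_h[i])
        for i in range(len(times_h) - 1)
    )
    return {
        "drop": drop,
        "slope_per_h": drop / fall_h if fall_h > 0 else float("inf"),
        "recovery_h": times_h[restored] - times_h[worst],
        "lost_area": lost_area,
    }


# Hypothetical curve: shock after hour 1, worst point at hour 3, restored at hour 6.
indicators = curve_indicators(
    [0, 1, 2, 3, 4, 5, 6],
    [1.0, 1.0, 0.6, 0.4, 0.4, 0.7, 1.0],
)
```

A smaller drop and a gentler slope would point to better absorption, while a shorter recovery time would point to a more rapid recovery; which thresholds are acceptable is a matter for the operator and is not encoded here.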

4.3. Comparison between resilience and reliability

A first comparison between the reliability and resilience properties has been made in the literature [37], [38]. Table 5 compares adequacy, security and resilience under different aspects, which are individually discussed below. On the basis of the definitions given in subsection 2.1 and Table 1, reliability refers to the fundamental function for which the power system is designed and operated, i.e. to deliver electricity to customers within specific standards, considering the occurrence of contingencies.

As a preliminary remark, it is worth recalling that there are some differences in the interpretation of the “security” property passing from some standardization entities (IEEE, IEC) to operators’ associations or regulating entities (such as ENTSO-E and NERC). This difference is due to the fact that from a pragmatic viewpoint operators can perform not only preventive but also corrective actions, including the shedding of specific loads under special contracts (Interruptible Loads), to assure a normal (thus, secure) state to the power system. Thus, a broader interpretation of security as “the ability to withstand disturbances” may include the possibility not to fulfill the whole customers’ demand, if we include the above flexibility measures regarding loads.

Table 5 - Comparison of the main aspects concerning reliability, adequacy, security and resilience.
For each aspect, the entries for adequacy, security and resilience are listed in turn.

Scope

  • Adequacy: Power system [15].
  • Security: Power system.
  • Resilience: Power system together with its interactions with the environment and humans.

Extreme events

  • Adequacy: Limited relevance, depending on operators’ guidelines for system design and operation (grid codes) [40][15].
  • Security: Usually limited relevance, depending on operators’ guidelines for system design and operation (grid codes) [40][15].
  • Resilience: Relevance to events with high impacts that are commonly excluded from design and operating provisions.

Contingency selection

  • Adequacy: Predefined set of events (from N-1 to some N-k) depending on the indications of the TSO/ISO’s grid code [40][15].
  • Security: Predefined set of events (from N-1 to some N-k, mainly N-1-1 and N-2) depending on the indications of the TSO/ISO’s grid code [40][15].
  • Resilience: Contingencies with very high impact, selected on the basis of the TSO’s experience or of other approaches such as risk-based techniques accounting for the likelihood of events (over different time frames) and their impact.

Time evolution

  • Adequacy: Accounts for the availability of generating units and grid components over time.
  • Security: Accounts for the power system response to contingencies over time. In operational planning, security studies may also exploit probabilistic models to assess possible time evolutions of loads and renewable sources.
  • Resilience: Accounts for the time evolution of threats, power system and humans interacting together over different phases, from the absorption of contingency effects to the restoration of damaged system facilities and of the supply of service to customers. The long-term dynamics of the organization (such as adaptation from past events, preparation and anticipation strategies) are also included.

Impact on the system

  • Adequacy: Lack of demand coverage due to insufficient generation/transmission capacity and/or reserve [42].
  • Security: Potential instabilities or frequency/current/voltage violations.
  • Adequacy and security: Effects on customer supply (customer interruption times and/or energy not supplied).
  • Resilience: Effects on customer supply, also due to severe damage to the infrastructure itself (disruption and recovery times). Accounts for grid operators’ actions under stressed situations and staff (e.g. maintenance teams) interventions under extreme conditions.

Acceptability criterion

  • Adequacy: Indices (e.g. LOLE) are compared with an acceptability threshold, irrespective of the events considered.
  • Security: Classification “secure”/“insecure”, with the same performance threshold applied to all the events considered.
  • Resilience: Degraded operation admitted according to the severity of the event.

Modeling techniques

  • Adequacy: Deterministic approaches assess the capability of generation and transmission resources to meet demand in some reference “worst case” situations in the future (usually winter and summer peak demand). Probabilistic approaches (analytical or Monte Carlo) estimate the ability of the system to supply the demand considering the variabilities and uncertainties associated with the generation and transmission resources and the demand, through indices such as Loss Of Load Expectation (LOLE) and Expected Energy Not Served (EENS).
  • Security: Traditionally based on deterministic simulations applied to the expected system state, performed with static or dynamic tools. Probabilistic models are being introduced to represent the uncertainties on grid operating conditions and on the system response to contingencies.
  • Resilience: Deterministic simulation of extreme event scenarios (based on the TSO’s operational experience). Probabilistic models for threats (e.g. extreme value distributions in the long term), grid scenarios, system response to contingencies and restoration process. Focus also on countermeasures and on the quantification of their benefits in limiting system degradation.


It is also worth recalling the CIGRE definition of security in [9], where the meaning of “withstand disturbances” includes “agreed use of customers’ ability to vary power supply, adjust demand and provide ancillary services”, which confirms the possibility of assuring power system security also by exploiting load modulation measures (such as demand response or interruptible load contracts).

In particular, if one considers the diagram of power system operating states [39], preventive controls can be adopted to move the power system from an “alert” state after the occurrence of a contingency to “normal” state (i.e. “preventively” secure state), while corrective actions, such as the corrective redispatch of generators taking account of the ramping limitations or the shedding of loads under specific contracts, can be deployed to move the system from an “emergency state” (which shows violations of operating quantities and/or instabilities) to an alert or a normal state (i.e. “correctively” secure). In the end, if even corrective controls are not sufficient, then emergency controls, such as system splitting or generalized load shedding, are performed to avoid the “in extremis” state which is characterized by both violations of operating quantities and/or instabilities and lack of supply to customers.
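The mapping between operating states and classes of control described in this paragraph can be condensed into a small lookup, sketched below. The names `State` and `control_class` are hypothetical, and the sketch only covers the states and controls mentioned in the paragraph; it does not reproduce the complete state diagram of [39].

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    ALERT = "alert"
    EMERGENCY = "emergency"
    IN_EXTREMIS = "in extremis"


def control_class(state, corrective_sufficient=True):
    """Return the class of control the paragraph associates with a state.

    Returns None for states for which the paragraph names no control.
    """
    if state is State.ALERT:
        # Preventive controls bring the system back to the normal state.
        return "preventive"
    if state is State.EMERGENCY:
        # Corrective actions are tried first; emergency controls such as
        # system splitting or generalized load shedding are the last resort
        # to avoid the "in extremis" state.
        return "corrective" if corrective_sufficient else "emergency"
    return None
```

For example, `control_class(State.EMERGENCY, corrective_sufficient=False)` returns `"emergency"`, which corresponds to the last-resort actions mentioned at the end of the paragraph.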

As in CIGRE perspective [8], measures for load modulation (such as interruptible loads and demand response) can be envisaged as “measures to assure system security”, thus they should not be counted for the evaluation of “load integrity” requirement in security definition. Thus, within the framework of the newly defined “resilience” property, one can distinguish between loads under special contracts for modulation purposes and “conventional loads” with no special contracts, stating that:

  1. Security implies the integrity of the conventional loads,
  2. Resilience does not require the integrity of conventional loads.

The first aspect compared in Table 5 is the scope of the analysis: adequacy and security focus on the power system per se. The interactions with environment can be considered by adjusting the failure rates of the components according to a rough classification of weather (adverse/normal) or climate conditions, but in a conventional reliability methodology the vulnerability of individual components is not modelled.

A resilience-based perspective enlarges the scope of the analysis so that it includes a model of the threats and of their interactions with the power system.

As far as extreme events are concerned, current grid codes [15] [40] require the evaluation of the system response to a predefined list of contingencies, ranging from single outages to plausible multiple outages involving the loss of a double circuit or busbar contingencies. For the sake of clarity, the ENTSO-E and NERC approaches are presented in the following paragraphs.

In [12], [14] and [15], ENTSO-E currently classifies contingencies as ordinary, exceptional and out-of-range:

  • Ordinary type of contingency, defined as “contingency of a single branch or injection” (a line, a generating unit, a transformer or two transformers connected to the same bay respectively, a Phase Shifter Transformer, a large voltage compensation installation, a DC link considered as a generating unit or a large consumer).
  • Exceptional type of contingency, defined as the “simultaneous occurrence of multiple contingencies with a common cause”. Some examples are the loss of a double line, which refers to two lines on the same tower over a long distance, the loss of a single busbar, the common mode failure with the loss of more than one generating unit, including large wind production, and the common mode failure of DC links. These events are identified on the basis of the design of the network structure and of the event probability (potentially linked to special operational conditions like storm or maintenance).
  • Out-of-range type of contingency, defined as “the simultaneous occurrence of multiple contingencies without a common cause, or a loss of power generating modules with a total loss of generation capacity exceeding the reference incident”. Out-of-range contingencies include at least the independent and simultaneous loss of two lines, the total loss of a substation with more than one busbar, the total loss of a power plant with more than two generating units, the loss of a tower carrying more than two lines, and severe power swings or oscillations. If such an event occurs, the system is in an emergency condition and the resulting situation has to be dealt with in accordance with the Synchronous Area Framework Agreement (SAFA) [13]. Out-of-range contingencies represent those extremely severe events for which the system is not designed.
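The three quoted definitions can be turned into a simple decision rule, sketched below. This is only an illustration of the wording of the definitions: the function name, its parameters and the 3000 MW value used in the usage lines are assumptions, and the rule does not reproduce the example lists above, which depend on network design and operator judgement (for instance, the total loss of a substation is listed as out-of-range even though it has a common cause).

```python
from enum import Enum


class ContingencyType(Enum):
    ORDINARY = "ordinary"
    EXCEPTIONAL = "exceptional"
    OUT_OF_RANGE = "out-of-range"


def classify_contingency(n_simultaneous, common_cause,
                         lost_generation_mw, reference_incident_mw):
    """Classify a contingency using only the quoted definitions.

    n_simultaneous: number of simultaneous single contingencies
    common_cause:   True if the simultaneous contingencies share a cause
    """
    if lost_generation_mw > reference_incident_mw:
        # Generation loss exceeding the reference incident.
        return ContingencyType.OUT_OF_RANGE
    if n_simultaneous <= 1:
        # Contingency of a single branch or injection.
        return ContingencyType.ORDINARY
    if common_cause:
        return ContingencyType.EXCEPTIONAL
    return ContingencyType.OUT_OF_RANGE


# Usage with a hypothetical reference incident of 3000 MW.
double_line = classify_contingency(2, True, 0.0, 3000.0)
two_independent_lines = classify_contingency(2, False, 0.0, 3000.0)
```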

In [40] NERC considers two categories:

  • Planning events (P0-P7), for which the power system is required to fulfil specific performance criteria; this category contains all the P0-P7 events defined in the Standard, ranging from no outage (P0) or single outage (P1) to a busbar fault followed by a stuck breaker (P7).
  • Extreme events include very widespread and impacting events due to weather phenomena (hurricanes, droughts, geomagnetic storms, etc.) and/or cyber-attacks. This set of events requires the time domain simulations of the sequence of events (cascading outages, protection interventions, etc.) following the initiating contingency, considering the normal fault clearing process.

From references [15], [40] it is worth noting that limited relevance is given to extreme events in conventional reliability analyses, because only a few plausible severe contingencies, chosen on the basis of TSOs’ experience, are simulated in detail. In resilience analyses, instead, extreme events are given very high relevance.

The selection of the contingencies defines another difference between the classical concept of reliability and resilience: current grid codes in the US [40] and in Europe [13] indicate a predefined list of contingencies (single, and possibly a few multiple ones) for which specific requirements on the system response must be met. This list is completed by operators on the basis of their operational experience. For all the contingencies of the set, the operators must verify the fulfilment of adequacy and security requirements. Resilience analyses, however, focus on extreme events with possibly catastrophic impacts on power system service and infrastructure. These events are normally not considered in the current reliability-centered design of the system: a complete enumeration of multiple contingencies would lead to combinatorial explosion problems; moreover, contingencies with a very high number of outaged components would inevitably lead to unacceptable impacts. Thus, the selection of multiple contingencies can be based on experience or on risk-based analyses [41], where risk, interpreted as a quadruple {threat, vulnerability, contingency, impact}, is used to select the events to be investigated on the basis of the forecasted states of the power system and of the influencing environment.
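The combinatorial explosion mentioned above can be made concrete by counting the N-k contingencies of a grid. The sketch below assumes a hypothetical grid of 1000 components and counts the distinct sets of exactly k simultaneous outages with the binomial coefficient; the function name is illustrative.

```python
from math import comb


def n_minus_k_count(n_components, k):
    """Number of distinct simultaneous outages of exactly k out of n components."""
    return comb(n_components, k)


# Hypothetical grid with 1000 components.
counts = {k: n_minus_k_count(1000, k) for k in (1, 2, 3)}
# counts == {1: 1000, 2: 499500, 3: 166167000}
```

Even at k = 3 the number of cases to simulate is more than five orders of magnitude larger than the N-1 list, which is why a selection based on experience or risk is needed instead of a complete enumeration.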

Consideration of time evolution is another aspect that differentiates reliability and resilience: in fact, adequacy/security requirements are checked in each system state occurring over the study horizon. The evolution of system states regards load and generation, as well as the failures and the return into service of components [42]. In resilience studies, time evolution is even more at the center of the analysis because – in addition to the factors mentioned for reliability - it includes the time evolution of the weather threat, of operators’ behaviour during emergency management and restoration phase and of the interactions between power systems, as well as the possible evolution of TSOs’ preparation and anticipation strategies by adapting their procedures on the basis of past events. Severe events can cause widespread damages to the system infrastructure, besides load disruptions, which explains why restoration and emergency management phases are given a special attention in resilience assessment phases.

As for the impact on the system, reliability assessment focuses on the potential lack of demand coverage due to insufficient generation/transmission capacity and/or problems of reserve for the most limiting contingencies (adequacy analysis) and on the potential instabilities or frequency/current/voltage violations in the power system response to contingencies (security analysis). Even if the return into service of components following a disturbance is modelled, the main scope of such analyses is to evaluate the quality of supply to customers (in terms of customer interruption times and/or energy not supplied). Resilience analyses are meant to evaluate the power system ability to deliver a degraded but still acceptable power supply service to customers also under extreme events, taking into account the potential effects that severe damages to the infrastructure can have on the power supply service. Moreover, resilience assessment should account for the interactions with human and natural environment (control center and maintenance personnel, weather conditions, etc.) during extreme events.

As for the acceptability criteria, adequacy and security must be met in a “strict” way, irrespective of the considered events. With resilience, instead, some degradation of the service is allowed, according to the severity and extension of the event.

Reliability addresses customer supply: adequacy studies evaluate the lack of generation and transmission facilities (reserve margins, etc.) to cope with load demand [42]-[44], while security studies evaluate the occurrence of instability phenomena or violations of currents/voltages or load disruptions in presence of contingencies, considering static and/or dynamic approaches [45].

Instead, resilience analyses do not only focus on the customer supply but also on the multiple aspects of the interactions between power system components and (natural and human) environment, thus assessing both the process of electric supply loss and recovery, and the process of infrastructure disruption and recovery. The response of human staff (operators, maintenance teams) at the various phases of the system response to disturbances is also a part of this process [46].

As for the modelling techniques, both reliability and resilience can be studied via probabilistic or deterministic approaches. Examples of deterministic methods to assess generation adequacy are the “reserve margin” and the “loss of largest unit” criteria applied to some significant operating conditions in the year (i.e. summer or winter peak); on the other side, probabilistic methods based on either Monte Carlo simulations or analytical methods (e.g. Markov processes), can be applied to assess load coverage accounting for the uncertainties of generation and transmission capacity, availability, load level, etc. [42]-[44].
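To illustrate the probabilistic side of adequacy assessment, the following minimal sketch estimates LOLE and EENS with a non-sequential (state sampling) Monte Carlo simulation. It is a simplification made for illustration only: units follow a two-state model with a forced outage rate, each hour is sampled independently, and transmission constraints, repair durations and load uncertainty are ignored. All names and numbers are hypothetical.

```python
import random


def monte_carlo_adequacy(units, hourly_load_mw, n_years=100, seed=0):
    """Estimate (LOLE in h/year, EENS in MWh/year) by state sampling.

    units: list of (capacity_mw, forced_outage_rate) tuples.
    hourly_load_mw: load for each hour of one representative period ("year").
    Every hour is sampled independently, so repair durations are not modelled.
    """
    rng = random.Random(seed)
    loss_hours = 0
    energy_not_served = 0.0
    for _ in range(n_years):
        for load in hourly_load_mw:
            # A unit is available when its random draw is not below its outage rate.
            available = sum(cap for cap, outage_rate in units
                            if rng.random() >= outage_rate)
            shortfall = load - available
            if shortfall > 0:
                loss_hours += 1
                energy_not_served += shortfall
    return loss_hours / n_years, energy_not_served / n_years


# Hypothetical system: three 100 MW units and a flat 180 MW load over 24 hours.
lole, eens = monte_carlo_adequacy([(100, 0.05)] * 3, [180] * 24)
```

The estimated indices would then be compared with an acceptability threshold. A deterministic criterion such as “loss of largest unit” would instead check a single worst-case state without any sampling.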

Security analyses can also be carried out via either deterministic approaches (assuming a known operating condition) or probabilistic approaches which account for the uncertainties referring to the operating point (e.g. for the RES and load forecast errors especially in an operational planning context), the contingency characteristics (type and location of the fault), as well as the response of protection, defense and control systems (e.g. protection settings) [45].

Deterministic and probabilistic approaches can also be applied to resilience analyses, but in this case the simulation framework additionally includes the models of the threats, the relevant countermeasures, the recovery of damaged infrastructure, the potential interdependencies of the power system with other critical infrastructures, as well as the human staff response [46][41]. Currently, deterministic studies of the power system response to extreme event scenarios can be of great help to operators in the anticipation and preparation phases, where operators foresee and simulate likely extreme event scenarios. Probabilistic methods can provide more information, allowing a risk-based ranking of extreme event scenarios and helping operators prioritize the interventions on the system to cope with the riskiest events. In any case, the modelling of countermeasures is fundamental to quantify their costs and their benefits for system resilience. All these modelling efforts highlight the multi-disciplinary nature of resilience with respect to reliability and call for expertise from different fields of knowledge.
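A risk-based ranking of extreme event scenarios can be sketched as follows. The sketch assumes, for illustration only, that risk is computed as the product of the threat probability, the conditional probability of the contingency given the threat (used here as a stand-in for vulnerability) and the impact; this is one common way of combining probability and consequences and is not necessarily the formulation adopted in [41]. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    threat_probability: float  # probability of the threat over the study horizon
    vulnerability: float       # assumed P(contingency | threat)
    impact_mwh: float          # energy not supplied if the contingency occurs

    @property
    def risk(self):
        """Expected impact in MWh under the product assumption stated above."""
        return self.threat_probability * self.vulnerability * self.impact_mwh


def rank_by_risk(scenarios):
    """Return the scenarios sorted from the highest to the lowest risk."""
    return sorted(scenarios, key=lambda s: s.risk, reverse=True)


# Hypothetical scenarios.
ranking = rank_by_risk([
    Scenario("wet snow on corridor A", 0.10, 0.5, 1_000.0),
    Scenario("windstorm on substation B", 0.01, 0.9, 20_000.0),
    Scenario("heat wave derating", 0.50, 0.1, 200.0),
])
names = [s.name for s in ranking]
```

With these invented numbers the rare but high-impact windstorm ranks first, which shows how a HILP event can dominate the ranking even though its probability is the lowest of the three.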

5. Remarks on the relationship between reliability and resilience

The introduction of the new resilience property definition is important to quantify the response of power systems to extreme events. However, given the widespread adoption of reliability related concepts in planning and operational practices of TSOs, it’s essential to identify the relationship of resilience with reliability and its components.

Currently, security and adequacy are the two necessary conditions to verify the classical property of “reliability” for a power system. However, assuring the security of the system in case of multiple outages, such as the ones produced by extreme events, may not be techno-economically viable in terms of design, planning, operation, and asset management requirements, whereas resilience criteria may be met.

Several entities and researchers have already proposed their views on this relationship.

For example, the US Federal Energy Regulatory Commission (FERC) indicates in [47] that “resilience is a component of reliability in relation to an event”.

The North American Electric Reliability Corporation (NERC) states in [48] that “a bulk power system that provides an adequate level of reliability is a resilient one”: this suggests that resilience is a necessary condition for a reliable system. Moreover, the definition of “Adequate Level of Reliability” (ALR) [49] of a Bulk Electric System (BES), introduced by NERC in 2013, contains a list of system requirements which refer both to the “traditional” view of reliability (the system must respond satisfactorily to predefined disturbances) and to the property of resilience (the system must limit the performance degradation provoked also by HILP events and must recover effectively from blackout conditions). A BES with ALR must satisfy the following five requirements:

  1. BES does not experience instability, uncontrolled separation, cascading, or voltage collapse under normal operating conditions and when subject to predefined disturbances.
  2. BES frequency is maintained within defined parameters under normal operating conditions and when subject to predefined disturbances.
  3. BES voltage is maintained within defined parameters under normal operating conditions and when subject to predefined disturbances.
  4. Adverse reliability impacts on the BES following low probability disturbances (e.g., multiple contingences, unplanned and uncontrolled equipment outages, cyber security events, and malicious acts) are managed.
  5. The restoration of the BES after major system disturbances that result in blackouts and widespread outages of BES elements is performed in a coordinated and controlled manner.

Requirements 1 to 3 regard the system under normal operating conditions and predefined disturbances like those usually addressed by security criteria, whereas requirements 4 and 5 refer to very severe events like those addressed by resilience.

In the “State of Reliability” reports, NERC highlights the difficulty for BES operators in complying with points 4 and 5, stating that “For these less probable severe events, BES owners and operators may not be able to apply economically justifiable or practical measures to prevent or mitigate an adverse reliability impact on the BES even if these events can result in cascading, uncontrolled separation, or voltage collapse” [50]. Anyway, the ALR definition is considered as the basis of long term reliability assessment reports by NERC in the last years (see for example [51]). Reference [52] states “Within the broad context of reliability defined by these indices, resiliency would appear as a component of reliability. Resilience relates to restorability and speed of restoration”, highlighting that available reliability tools should consider the resilience of the system, by adequately modeling repair and restoration processes [52][53].

Operators already do their best to ensure that the system survives severe events by applying preventive and corrective actions. Defense plans and restoration plans are elaborated so that the system can survive very severe events and recover quickly from them. The introduction of the resilience concept integrates the defense and restoration plans into a wider framework for resilience enhancement, which is based on the key actionable measures deployed over different time frames (from planning to operation), including organizational aspects of TSOs. This paves the way to the rational elaboration of coordinated plans aimed at enhancing this property at different levels. In this context, following also NERC’s and FERC’s indications, the concept of “resilience” is applied to all those measures which limit system degradation in case of extreme events.

As indicated in [54], reliability is related to the ultimate goal of power systems, i.e. providing electricity to the customers within specific standards for the service supply. Introducing the resilience property makes it possible to specify the “standards for the service supply” in terms of the maximum duration and severity of degraded system performance when the system is struck by very severe events for which the security property cannot be assured at reasonable cost. This enables the definition of reliability standards also for extreme events.
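The idea of bounding the duration and severity of degraded performance can be illustrated with a minimal Python sketch. The sampled performance curve, the function name `degradation_profile` and the admissible limits are hypothetical and chosen only for illustration; they are not taken from any standard or from the WG C4.47 definition.

```python
from typing import Sequence, Tuple


def degradation_profile(times_h: Sequence[float],
                        supplied_pct: Sequence[float],
                        nominal_pct: float = 100.0) -> Tuple[float, float]:
    """Return (max severity in % of demand lost, hours spent below nominal).

    Assumption of this sketch: performance is piecewise constant, i.e.
    sample i holds from times_h[i] until times_h[i + 1].
    """
    if len(times_h) != len(supplied_pct) or len(times_h) < 2:
        raise ValueError("need at least two matching samples")
    # Severity: deepest drop of supplied load below the nominal level.
    severity = max(nominal_pct - p for p in supplied_pct)
    # Duration: total length of the intervals that start in a degraded state.
    duration = sum(
        times_h[i + 1] - times_h[i]
        for i in range(len(times_h) - 1)
        if supplied_pct[i] < nominal_pct
    )
    return severity, duration


# Hypothetical response to an extreme event (supplied load in % of demand).
times_h = [0.0, 1.0, 3.0, 8.0, 12.0]
supplied_pct = [100.0, 60.0, 75.0, 90.0, 100.0]

severity, duration = degradation_profile(times_h, supplied_pct)

# Hypothetical admissible values for severity and duration.
MAX_SEVERITY_PCT = 50.0
MAX_DURATION_H = 24.0
meets_standard = severity <= MAX_SEVERITY_PCT and duration <= MAX_DURATION_H
print(severity, duration, meets_standard)
```

In this sketch the standard is expressed as two independent limits; a real standard could equally combine them, for example through the area between the nominal and the actual performance curves.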

In fact, a reliable system must have enough available (generation and transmission) resources to cover the load with a suitable margin (adequacy); in case of the most limiting contingencies, it must withstand disturbances (security), i.e. continue supplying all the customers while fulfilling suitable requirements for the electricity supply, at least for the category of contingencies which satisfies a credibility criterion. It must also limit its functional degradation (resilience) for those contingencies which do not satisfy the credibility criterion adopted by the TSO (extreme events).

The response of a power system to events (either credible or extreme) depends not only on the features of the threat (intensity, location, extension, etc.) but also on the intrinsic characteristics of the power system (physical characteristics of the infrastructure, available systems for control, defense, automation and protection, operating condition, in turn depending on load level, network topology, generation and import/export patterns, etc.). A reliable system should satisfy the “standards of quality of supply” over the long run, which implies the capability to respond satisfactorily to (both single and multiple) contingencies for the operating points over the analysis period, so that the reliability indicators (such as EENS), typically computed on a yearly basis, stay below a threshold established in operational standards.
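As an illustration of a yearly reliability indicator compared against a threshold, the following minimal Python sketch estimates EENS as the frequency-weighted sum of the energy not supplied over a set of contingencies. The contingency list, the frequencies, the energy values and the threshold are all hypothetical; practical assessments would also enumerate or sample operating points and restoration times.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Contingency:
    name: str
    frequency_per_year: float  # expected occurrences per year (hypothetical)
    ens_mwh: float             # energy not supplied per occurrence, in MWh


def eens_mwh_per_year(contingencies: Iterable[Contingency]) -> float:
    """Expected energy not supplied: sum of frequency times consequence."""
    return sum(c.frequency_per_year * c.ens_mwh for c in contingencies)


# Hypothetical contingency set covering single and multiple outages.
cases = [
    Contingency("single line outage", 2.0, 0.0),
    Contingency("double circuit outage", 0.1, 50.0),
    Contingency("substation loss", 0.01, 400.0),
]

eens = eens_mwh_per_year(cases)

# Hypothetical threshold taken from an operational standard.
EENS_THRESHOLD_MWH = 10.0
within_standard = eens <= EENS_THRESHOLD_MWH
print(round(eens, 3), within_standard)
```

The rare but severe contingency contributes almost as much as the more frequent one, which is why the choice of contingencies included in the sum matters for the resulting indicator.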

To attain this general goal (i.e. providing energy with given standards over the long run), the system must also be resilient, i.e. limit its degradation within given standards, in the presence of extreme events, for the operating points over the same analysis period. This must take into account the potential interactions between operating points and the incumbent extreme events (e.g. potential cut-out of wind farms due to windstorms, the reduction of thermal generation output due to prolonged droughts and/or high ambient temperature, and load demand increase at the time of temperature peaks). This requires completing the reliability standards by introducing resilience-informed metrics and setting suitable admissible values for these metrics.

6. Conclusions

After a comprehensive overview of the definitions available in the literature, the paper has presented a new definition of power system resilience discussed in CIGRE WG C4.47. This definition provides a detailed characterization of the action of the disruptive event in terms of geographical extension and severity of the effects. Moreover, the proposed definition treats separately the property of resilience and the key actionable measures which make a power system resilient. Unlike well-established properties like reliability, resilience is a dynamic, multifaceted concept which focuses on extreme (also HILP) events, on the evolution of threats over time and on the interdependence among different critical infrastructures.

The paper has also compared resilience against reliability, adequacy and security under different aspects (e.g. scope of analysis, selection of contingencies). The increasing attention to power system responses to extreme events implies that resilience is a necessary requirement to achieve a satisfactory quality of supply in modern power systems, and that resilience-informed metrics have to be considered in the definition of the standards for a reliable supply of electricity. It is envisaged that resilience will increasingly change the paradigm for the planning and operation of power systems.

Looking forward, evolving the concept of resilience into engineering terms will require mathematical and probabilistic modeling and various techniques to assess and capture the value of resilience to aid decision making in cost-benefit analyses.

7. Appendix: Proposal of models to describe the relationship between reliability and resilience

The relationship between resilience and reliability is at the center of intense discussions among the experts in the electricity sector, considering all the elements of similarity and diversity highlighted in subsection 4.2. Consequently, different models of relationship have been proposed.

The following subsections present the most promising models discussed within CIGRE WGs. This does not exclude that other CIGRE WGs have different views on the relationship between these two properties.

The presented models highlight several aspects:

  • “Resilience as a sub-property of reliability”: focus on the links between the properties of reliability, adequacy, security, and resilience,
  • “Resilience and reliability as overlapping circles”: focus on the differences between the types of events considered and the measures taken,
  • “Reliability as a metric of resilience”: exploits reliability as a metric for system performance.

A few common elements can be found in all the presented models:

  • The introduction of the resilience property is intended to fill the gap related to extreme events, which are not typically covered in conventional reliability analyses. It extends the final goal of the power system (providing electricity to customers in a satisfactory manner) to extreme events as well.
  • Resilience brings a change of perspective in the way utilities plan and operate modern power systems. In fact, the need to cope with rare events with very severe (up to catastrophic) impacts implies the need to analyse the interdependencies among critical infrastructures and modifies the priorities identified by reliability-centered tools for grid investments and system operational planning.

7.1. Resilience as a sub-property of reliability

A first relationship model derives from a careful analysis of the proposed definitions for reliability and resilience. This model provides a consistent framework to characterize the system performance in the presence of any type of event (credible and extreme events).

Firstly, from subsection 4.2 it is worth noting that reliability can be interpreted as a fundamental property which does not refer to any specific application criterion (no reference to credible or extreme events).

Then, in principle, reliability should be applied both to credible and extreme events. Therefore, under this broader perspective, reliability can be decomposed into three components: adequacy, security and resilience (see Fig. 1).

In this perspective, a reliable system must have enough available generation and transmission resources to cover the load with a suitable margin (adequacy). It must withstand disturbances (security), i.e. continue supplying all the customers while fulfilling suitable requirements for the electricity supply, at least for the category of contingencies which satisfy a credibility criterion. It must also limit its functional degradation (resilience) for those contingencies which do not satisfy the credibility criterion adopted by the TSO (extreme events).

Figure 1 - First model of relationship between reliability and resilience: resilience as a third sub-property of reliability

This extension of the classical reliability model with the addition of resilience as a sub-property of reliability is further justified by the following considerations.

Adequacy and security are necessary conditions to verify the classical property of “reliability” for a power system. However, assuring the security of the system in case of multiple outages such as the ones produced by extreme events leads to excessive costs in terms of design, planning, operation, and maintenance.

If the reliability concept did not include resilience, as in the classical approach, then a power system could not be defined as reliable in case of extreme events. In fact, security, which requires no loss of conventional loads, is assured only for credible contingencies and not for extreme events, because assuring it for the latter would entail excessive costs.

Instead, stating that a necessary condition for a system to be reliable is that the system must have a sufficient resilience to extreme events leads to several advantages:

  1. international definitions of reliability, which do not include any reference to the types of outages, are satisfied, including the recent definition proposed by CIGRE WG C1 [8][9].
  2. the definitions of security and adequacy remain unaltered.
  3. a clear link between resilience and known concepts is established.

Note that a system which is resilient is not necessarily reliable (because it may not respect the requirements concerning the service supply, e.g. the maximum number and duration of supply interruptions, in case of credible contingencies). Moreover, a system which is reliable with respect to credible events (i.e. adequate and secure, according to the conventional definition of reliability) may not be resilient to extreme events. However, a reliable system must have a sufficient level of resilience to extreme events, otherwise it would not satisfy the general definition of reliability (i.e. able to deliver electricity within accepted standards and in the amount desired) under extreme events.

In conclusion, according to the abovementioned broader interpretation of reliability, a reliable system must be adequate, secure for credible contingencies, and sufficiently resilient to extreme events. This statement holds for any credibility criterion adopted by a TSO.

This perspective is also supported by FERC (Federal Energy Regulatory Commission) in [47], by NERC in [50] and by some academics [52][53].
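The three-component reading of reliability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ContingencyResult`, the function `is_reliable`, the simulated losses and the admissible degradation limit are hypothetical, security is taken as "no loss of load for credible contingencies" as discussed above, and resilience is reduced to a single bound on lost load for extreme events.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ContingencyResult:
    name: str
    credible: bool        # True if it satisfies the TSO's credibility criterion
    load_lost_pct: float  # simulated loss of load, in % of demand


def is_reliable(adequate: bool,
                results: Iterable[ContingencyResult],
                max_degradation_pct: float) -> bool:
    """Reliable = adequate AND secure (credible) AND resilient (extreme)."""
    results = list(results)
    # Security: no load is lost for any credible contingency.
    secure = all(r.load_lost_pct == 0.0 for r in results if r.credible)
    # Resilience: degradation stays within the admissible limit for
    # contingencies that do not satisfy the credibility criterion.
    resilient = all(r.load_lost_pct <= max_degradation_pct
                    for r in results if not r.credible)
    return adequate and secure and resilient


# Hypothetical simulation outcomes.
results = [
    ContingencyResult("N-1 line outage", credible=True, load_lost_pct=0.0),
    ContingencyResult("severe windstorm", credible=False, load_lost_pct=30.0),
]

print(is_reliable(True, results, max_degradation_pct=40.0))  # degradation within limit
print(is_reliable(True, results, max_degradation_pct=20.0))  # degradation exceeds limit
```

Changing the credibility criterion only moves contingencies between the two groups; the structure of the check stays the same, which mirrors the statement that the conclusion holds for any credibility criterion.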

7.2. Resilience and reliability as overlapping circles

The model presented next highlights the differences between the types of events considered and the measures taken. According to this model, reliability and resilience are distinct and complementary concepts, as illustrated in Fig. 2 by the two overlapping circles.

Figure 2 - Second model of relationship between reliability and resilience: two overlapping circles.

This model helps classify the different sets of disturbances and performance requirements according to whether they relate to reliability or to resilience. In this sense, the diagram represents not only the total probability space of disturbances (credible and extreme events), but also the relationship between reliability and resilience in terms of methodology, measures, and metrics:

  • in classical reliability studies (circle A) the failure and success of components or a system are monitored to measure the performance of the given tasks or functions of the system. For instance, the number of failures which occur over a given period of time describes the failure rate of the components or system in not meeting the performance expectation. The quantification of reliability is described as the measurement of success or failure of the power system, using three types of indices, namely: load, customer, and energy indices.
  • in resilience studies (circle B) the events are normally characterized as extreme events, i.e. events that create a condition beyond the normal design criteria for reliability consideration. Extreme events are often characterized by low probabilities (HILP events) which are analysed with probabilistic assessment methods: modelling such probabilities may be a tough task due to the rarity of the events.
  • the intersection of the two circles suggests that conventional reliability assessment methods must be updated considering the growing need (also on the basis of recent regulations) to deal with extreme events, thus leading to a coordinated risk-based optimization of reliability and resilience in power systems.

The overlapping area of the reliability and resilience circles can also be described as a risk-mitigation region in which both reliability and resilience performance requirement goals are met. The goal is to expand the overlapping area by means of integrated reliability and resilience planning processes to optimise both reliability and resilience investment decisions.

In the context of extreme events, conventional investment decision methods may be inadequate because they tend to skew the investments towards a high revenue stream and high frequency of occurrence, typically ignoring the interdependencies at the system level. Another complication is that regulatory mechanisms normally focus on the efficiency pressures of utilities on the operational investment cost. This ignores the opportunity cost of enhancing utilities’ coping capability (i.e., response and recovery capabilities) with these extreme incidents.

7.3. Reliability as a metric of resilience

From the definitions in Table 1, reliability is defined as the degree to which the performance of the elements in a bulk system results in electricity being delivered to customers within accepted standards and in the amount desired. The degree of reliability may be measured by the frequency, duration, and magnitude of adverse effects on the electric supply. Reliability indices typically consider aspects such as the number of unsupplied customers, the duration of the interruption, the amount of power interrupted, and the frequency of interruption. The three most common indices are SAIFI, SAIDI, and CAIDI, defined in IEEE Standard 1366. Reliability is therefore a measurable quantity that can be used as a design parameter and as a target quantity for the power system.
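For reference, the three indices can be computed from a list of interruption events as in the minimal Python sketch below. The event data and the function name `reliability_indices` are hypothetical; the sketch also omits details that a real calculation would handle, such as the distinction between sustained and momentary interruptions.

```python
from typing import Iterable, Tuple


def reliability_indices(interruptions: Iterable[Tuple[int, float]],
                        customers_served: int) -> Tuple[float, float, float]:
    """Return (SAIFI, SAIDI, CAIDI).

    interruptions: one (customers interrupted, duration in minutes) per event.
    SAIFI is in interruptions per customer served, SAIDI in minutes per
    customer served, CAIDI in minutes per interrupted customer.
    """
    events = list(interruptions)
    customer_interruptions = sum(n for n, _ in events)
    customer_minutes = sum(n * minutes for n, minutes in events)
    saifi = customer_interruptions / customers_served
    saidi = customer_minutes / customers_served
    # CAIDI equals SAIDI / SAIFI; guard against a period with no interruptions.
    caidi = (customer_minutes / customer_interruptions
             if customer_interruptions else 0.0)
    return saifi, saidi, caidi


# Hypothetical yearly record for a system serving 10,000 customers.
events = [(500, 60.0), (1500, 120.0)]
saifi, saidi, caidi = reliability_indices(events, customers_served=10_000)
print(saifi, saidi, caidi)
```

Computing the same indices separately for credible and for extreme events is one simple way to obtain the different reliability degrees per set of incidents that this model refers to.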

By comparing the definitions and the understanding of reliability and resilience, it can be concluded that resilience to different types of incidents, from simple failures to natural disasters, can be measured by reliability metrics, as shown in the conceptual framework in Fig. 3.

Figure 3 - Third model of relationship between reliability and resilience from CIGRE SC C2. Each circle represents a set of operational/environmental incidents/events

Accordingly, the reliability and resilience concepts could be linked as follows: the power system is designed and operated for a set of incidents/events in such a way that it reaches a desired reliability level. Thus, a resilient power system has different degrees of reliability for different kinds of incidents, from credible to non-credible events.

Resilience is given when, after such an event, the power system is able to return to acceptable operation within an acceptable (finite) time. How well this succeeds, that is, how long parts of the power system remain in an unacceptable condition, is described by the measure of resilience, which can be e.g. “high” or “low”. One of the reliability indices can be used as a technically measurable value. Fig. 3 visualizes this interconnection between resilience and reliability. Each circle represents a set of incidents in a power system.

The inner circle defines the set of incidents (credible events) for which the power system is designed to achieve a reliability level of almost 100%. This includes the provision of a suitable generation and transmission infrastructure to satisfy the demand with margins (adequacy) and of a satisfactory operation (security in case of credible contingencies). For this set of incidents, the power system is fully reliable. For non-credible events, the power system should guarantee the supply after a defined time following an incident: the system is still resilient but less reliable.

This model considers that resilience is a necessary condition for power system reliability, as stated by model 1 and NERC documentation. However, according to this model, and unlike the new resilience definition discussed in this paper, the resilience concept is applied to both credible and non-credible (extreme) events.

Acknowledgment

The authors would like to thank the CIGRE C4.47 Power System Resilience working group and all members of the C4, C2 and C1 WGs for the useful discussions.

This work has been financed by the Research Fund for the Italian Electrical System under the Three-Year Research Plan 2022-2024 (DM MITE n. 337, 15.09.2022), in compliance with the Decree of April 16th, 2018.

References

  1. M. Panteli, P. Mancarella, “Influence of extreme weather and climate change on the resilience of power systems: Impacts and possible mitigation strategies,” Electric Power Systems Research, vol. 127, pp. 259-270, October 2015.
  2. Holling, C. S. 1973. Resilience and stability of ecological systems. Annual Review of Ecology and Systematics 4:1-2
  3. C. Perrings, "Resilience and sustainable development," Environment and Development Economics, vol. 11, pp. 417-427, 2006.
  4. E. Hollnagel, et al., Resilience engineering: Concepts and precepts, Ashgate Publishing, Ltd., 2007.
  5. M. Chaudry, P. Ekins, K. Ramachandran, A. Shakoor, J. Skea, G. Strbac, X. Wang, J. Whitaker, “Building a Resilient UK Energy System”, UKERC, Tech. rep. REF UKERC/RR/HQ/2011/001, April 2011.
  6. MCEER (Multidisciplinary and National Center for Earthquake Engineering Research), “Engineering Resilience Solutions - From earthquake engineering to extreme events”, Technical report MCEER-08-SP09, December 2008.
  7. IEC, “Glossary of terms”, 2002. Available: http://std.iec.ch/glossary.
  8. CIGRE WG C1.27, "The future of reliability – definition of reliability in light of new developments in various devices and services which offer customers and system operators new levels of flexibility", Technical Brochure no. 715, Jan 2018.
  9. CIGRE WG C1.27, “The future of reliability”, reference paper, Electra no. 296, pp. 1-3, Feb 2018.
  10. IEEE PES, “Definition of Terms”, technical report, 2004.
  11. IEEE/CIGRE Task Force on Stability, “Definition and Classification of Power System Stability,” IEEE Transactions on Power Systems, vol. 19, no. 3, pp. 1387-1401, May 2004.
  12. EU Commission Regulation (EU) 2015/1222 of 24 July 2015 establishing a guideline on capacity allocation and congestion management, 2015.
  13. NERC, “Glossary of Terms”, technical report, last update March 2022.
  14. ENTSO-E, “Synchronous Area Framework (SAFA) Documentation”, 2019. Available: https://www.entsoe.eu/publications/system-operations-reports/#continental-europe-synchronous-area-framework-agreement
  15. EU Commission Regulation (EU) 2017/1485 of 2 August 2017 establishing a guideline on electricity transmission system operation, 2017.
  16. ENTSO-E, “All TSOs Biennial Progress Report on Operational Probabilistic Coordinated Security Assessment and Risk Management”, December 2021
  17. CIGRE WG C4.601, “Review of the current status of tools and techniques for risk-based and probabilistic planning in power systems”, Technical Brochure no. 434, pp 1-132, March 2010.
  18. Methodology for the European resource adequacy assessment in accordance with Article 23 of Regulation (EU) 2019/943 of the European Parliament and of the Council of 5 June 2019 on the internal market for electricity, 2 Oct 2020. Online
  19. Redefining Resource Adequacy Task Force, “Redefining Resource Adequacy for Modern Power Systems,” 2021, Reston, VA: Energy Systems Integration Group., Tech. Rep. ESIG. Online
  20. U. Shahzad, “Probabilistic Security Assessment in Power Transmission Systems: A Review”, Journal of Electrical Engineering, Electronics, Control and Computer Science – JEEECCS, Vol. 7, no. 26, pp. 25-32, 2021
  21. R. Moreno et al., "From Reliability to Resilience: Planning the Grid Against the Extremes," IEEE Power and Energy Magazine, vol. 18, no. 4, pp. 41-53, July-Aug. 2020
  22. M. Panteli, R. Moreno, D. Trakas, M. Jamieson, P. Mancarella, G. Strbac, and N. Hatziargyriou, “Enhancing the infrastructure and operational resilience of power systems against wildfires”, Electra 323, August 2022
  23. Y. Y. Haimes, “On the definition of resilience in systems”, Risk Analysis, Vol 29, no. 4, pp. 498–501, 2009.
  24. A. R. Berkeley III, M. Wallace, “A Framework for Establishing Critical Infrastructure Resilience Goals - Final Report and Recommendations by the Council”, National Infrastructure Advisory Council (NIAC) report Tech. Rep., October 19, 2010.
  25. UK Cabinet Office, “Keeping the Country Running: Natural Hazards and Infrastructure - A Guide to improving the resilience of critical infrastructure and essential services”, pp. 1-98, October 2011.
  26. T. J. Overbye, V. Vittal, I. Dobson, "Engineering Resilient Cyber-Physical Systems", PSERC (Power Systems Engineering Research Center) Tech. Rep. 12-16, pp. 1-22, May 2012.
  27. M. Keogh and C. Cody, Resilience in Regulated Utilities, Washington, DC, USA, The National Association of Regulatory Utility Commissioners (NARUC) Tech Rep., pp. 1-17, Nov. 2013.
  28. Presidential Policy Directive 21: Critical Infrastructure Security and Resilience, Washington, DC, 2013.
  29. E D. Vugrin, D E. Warren, and M A. Ehlen, “A Resilience Assessment Framework for Infrastructure and Economic Systems: Quantitative and Qualitative Resilience Analysis of Petrochemical Supply Chains to a Hurricane,” Process Safety Progress, Vol. 30, no. 3, Sept 2011.
  30. Italian Ministry of Economic Development, “Strategia Energetica Nazionale 2017”, Consultation document, June 12, 2017.
  31. IEEE PES Task Force on Definition and Quantification of Resilience, “The Definition and Quantification of Resilience”, Technical Report PES-TR65, April 2018
  32. Brazilian Government, “Política Nacional de Segurança de Infraestruturas Críticas”, Decree 9753, November 2018.
  33. NATF (North American Transmission Forum), “Transmission system resiliency: an overview,” 2017. Online
  34. National Academies. “Disaster Resilience: A National Imperative”. Washington, DC: The National Academies Press. 2012. Online
  35. I. Linkov, J.M. Palma-Oliveira. Risk and Resilience, Amsterdam: Springer, 2017.
  36. E. Ciapessoni, D. Cirio, A. Pitto, M. Panteli, M. Van Harte, C. Mak, “Defining power system resilience”, CIGRE WG C4.47 reference paper, Electra, no. 306, pp.32-34, Oct 2019.
  37. CIGRE WG C2.25, “Operating strategies and preparedness for system operational resilience”, Technical Brochure nr. 833, 2021.
  38. M.A. Van Harte, M. Panteli, L. Pittorino, R Koch, “Utilizing Advanced Resiliency Planning within the Electrical Sector”, 2018 CIGRE session, Aug 2018, Paris., REF C4-124_2018
  39. M. Panteli, P. Mancarella, “The Grid: Stronger, Bigger, Smarter? Presenting a conceptual framework of power system resilience,” IEEE Power & Energy Magazine, vol. 13, no. 3, pp. 58-66, 2015.
  40. L. H. Fink and K. Carlsen, "Operating under stress and strain," IEEE Spectrum, vol. 15, no. 3, pp. 48-53, March 1978.
  41. NERC, Transmission System Planning Performance Requirements, Standard TPL 001-4 (Ver. 4). Nov. 26, 2014. Online
  42. E. Ciapessoni, D. Cirio, G. Kjølle, S. Massucco, A. Pitto and M. Sforna, "Probabilistic Risk-Based Security Assessment of Power Systems Considering Incumbent Threats and Uncertainties," IEEE Trans. on Smart Grid, vol. 7, no. 6, pp. 2890-2903, Nov. 2016.
  43. R. Billinton, R.N. Allan, Reliability Assessment of Large Power Systems, Boston, US: Kluwer Academic Publishers, 1988.
  44. R. Billinton and R. N. Allan, Reliability Evaluation of Engineering Systems: Concepts and Techniques, 2nd Edition, New York: Plenum Press, 1992.
  45. R. Billinton and W. Li, Reliability Assessment of Electric Power Systems Using Monte Carlo Methods, New York: Plenum Press, 1994.
  46. CIGRE WG C4.601, “Review of on-line dynamic security assessment tools and techniques,” Technical Brochure nr. 325, 2007.
  47. M. Panteli, P. Mancarella, D. N. Trakas, E. Kyriakides and N. D. Hatziargyriou, "Metrics and Quantification of Operational and Infrastructure Resilience in Power Systems," IEEE Trans. on Power Systems, vol. 32, no. 6, pp. 4732-4742, Nov. 2017.
  48. FERC, “Comments of the North American Electric Reliability Corporation”, May 9, 2018.
  49. M. Lauby, “Panel II: Advancing Reliability and Resilience of the Grid”, presentation at FERC Reliability Technical Conference, July 31, 2018. Online
  50. NERC, “Definition: Adequate Level of Reliability for the Bulk Electric System”, Mar 2013. Online
  51. NERC, “State of Reliability”, Technical Report, 2021.
  52. NERC, “2021 Long term reliability assessment”, Technical Report, 2021
  53. C. Singh, “Is resilience different from reliability?”, professor’s notes. Online
  54. L. Yong, C. Singh, “A methodology for evaluation of hurricane impact on composite power system reliability”, IEEE Trans. on Power Systems, Vol. 26, no. 1, pp 145-152, 2011.
  55. A. Clark-Ginsberg, “What’s the difference between Reliability and Resilience?,” Stanford University, 2003.

Biographies

Emanuele Ciapessoni has been in Italian Power system research centres since 1990, when he joined CISE S.p.A. and then ENEL Research. Currently he is Leading Scientist and Chair of the Scientific Committee at Ricerca sul Sistema Energetico (RSE S.p.A.). His research interests include power system resilience and security, risk analysis and mitigation, wide area monitoring protection and control, restoration.
He is convenor of the Italian Electrotechnical Committee CT65 on Industrial-process measurement, control and automation. He also provides consultancy services to the Italian regulatory Authority for electric Energy and Gas in power system resilience and several energy-related topics. He is the scientific lead in the definition of the Terna-RSE methodology for resilience-oriented planning. He is an active member of the CIGRE Working Group C4.47 on power system resilience and of IEEE Working Groups on cascading failures.
Since 1990 he has coordinated several national and international projects on power systems, dealing with the development and application of innovative approaches for system resilience and risk management. Currently he is involved in the EU project HVDC-WISE dealing with reliability and resilience of power systems including HVDC grids.
He has published over 80 scientific papers and reports and he has been reviewer and session chair in several international conferences and reviewer for several journals.

Diego Cirio received the M.Sc. degree (1999) and Ph.D. degree (2003) in Electrical Engineering from University of Genoa, Italy. He leads the “Grid Development and Security” Research Group at “Ricerca sul Sistema Energetico - RSE S.p.A.” (Milano, Italy). He has been active in EU and national research projects on power system resilience, security, adequacy, risk, HVDC, flexibility, ancillary services, also supporting the Italian Authority for energy, the Ministry for Environment and Energy Security, and the TSO. He contributed to CIGRE WGs of SC C2/C4, IEEE WG on Cascading failure, IEA WGs on transmission systems (ENARD, ISGAN). He has been IEEE Senior Member since 2013.

Andrea Pitto received his M.Sc. degree (2005) and Ph.D. degree (2009) in Electrical Engineering from the University of Genoa (Italy), where he worked as a research assistant in 2009-2010. He joined Ricerca sul Sistema Energetico - RSE S.p.A. in 2011. His research interests concern probabilistic risk-based approaches to power system resilience assessment and enhancement, cascading outage analysis, and security assessment techniques. He was involved in the development of the methodology for resilience-oriented planning of the Italian TSO. He is an active member of the CIGRE Working Group C4.47 on power system resilience and of IEEE Working Groups on cascading failures and on common mode dependent outages. He has been IEEE Senior Member since 2016.

Malcolm Van Harte has 26 years of experience in the electric utility Transmission and Distribution industry. He holds an MSc in Electrical Engineering from the University of Cape Town and works in distribution as the Senior Manager for SMART GRID and Head of Network Operations Centre of Excellence (including Cyber Security, Data Analytics and Distribution Telecommunication Operations). He has also worked in risk and resilience, network planning, regional and national control centers, and network optimization. He has chaired or participated in a number of strategy projects, working groups, and study committees aimed at improving the reliability and quality of electricity infrastructure, including the National Blackout, Provincial Transmission Risk Workshops, Network Planning, Network Performance, and Quality of Supply. He has led and participated in a number of strategic initiatives aimed at strengthening Eskom's resilience capabilities, including disaster management, business continuity, organizational resilience, and enterprise risk management. He is now working on the implementation of a new Distribution operating model with additional capabilities such as Distribution System Operator and Energy Trader.

Mathaios Panteli holds an Assistant Professor position within the Department of Electrical and Computer Engineering, University of Cyprus, since January 2021, and an Honorary Lecturer position at the Department of Electrical and Electronic Engineering, Imperial College London since September 2022. Prior to joining UCY, he was a Lecturer at the Power and Energy Division of The University of Manchester, serving as the Deputy Lead of the Sustainable Energy Systems research cluster. His academic qualifications include an M.Eng. degree from Aristotle University of Thessaloniki, Greece, in 2009, and a Ph.D. degree in Electrical Power Engineering from The University of Manchester, U.K., in 2013. His main research interests include techno-economic reliability, resilience and flexibility assessment of future low-carbon energy systems, grid integration of renewable energy sources and integrated modelling and analysis of co-dependent critical infrastructures. Mathaios is an IEEE Senior Member, an IET Chartered Engineer (CEng), the Chair of the CIGRE working group C4.47 “Power System Resilience” and the CIGRE Cyprus National Committee, an invited member of multiple IEEE, CIGRE and CIRED working groups, and a Fellow of the Higher Education Academy (UK). He serves as an Associate Editor in IEEE Transactions on Sustainable Energy and he is the recipient of the prestigious 2018 Newton Prize.


  • [1] The expression “High Impact, Low Probability” is equivalent to “High Impact Low Frequency” adopted in [37].
