Seamless Redundancy

Seamless redundancy refers to a class of network protocols designed to provide uninterrupted data communication with zero recovery time in the event of a single network component failure [1][2]. Specifically, it encompasses two standardized protocols, the Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy (HSR), both defined in the international standard IEC 62439-3 [1][3]. These protocols answer the need for highly reliable Ethernet networks, particularly in industrial automation and other real-time applications where even brief communication interruptions are unacceptable [2]. By guaranteeing continuous operation through redundant data paths, seamless redundancy protocols form a foundational technology for high-availability automation networks [1].

The core operational principle of both PRP and HSR is the duplication of every data frame and its simultaneous transmission over two independent network paths [3]. The receiving node identifies and discards the duplicate frames using sequence numbers embedded in the transmitted data, ensuring only one copy of each message is processed [3]. This mechanism allows failover with zero recovery time if one path fails, as the duplicate frame arriving via the alternate path ensures successful delivery [2][3].

While both protocols share this fundamental approach, they differ significantly in network topology and interoperability. HSR is designed for a ring topology, where nodes are connected in a closed loop and frames are sent in both clockwise and counter-clockwise directions [3]. In contrast, PRP operates over two separate, parallel Local Area Networks (LANs) of any topology, simply requiring a physical duplication of connections [3]. A key distinction is that PRP nodes can interoperate with standard, non-redundant Ethernet networks, whereas HSR networks typically cannot without a special gateway device [3].

The primary application for seamless redundancy protocols is in industrial communication networks and power utility automation, such as in substations, where system availability and deterministic real-time performance are paramount [1][2]. The significance of these protocols lies in their ability to meet the stringent reliability requirements of critical infrastructure without requiring complex network reconfiguration or experiencing packet loss during a failure [3]. In an HSR ring, standard two-port devices are known as Doubly Attached Nodes implementing HSR (DANHs), while standard Ethernet devices can connect to the ring via a Redundancy Box (RedBox), which acts as a DANH on their behalf [3]. The modern relevance of IEC 62439-3 protocols continues to grow as industries increasingly depend on networked control systems that demand the highest levels of resilience and predictable performance [1][3].

Seamless redundancy, in the context of industrial communication networks, refers to a set of high-availability protocols designed to provide zero recovery time in the event of a single network component failure, ensuring uninterrupted data transmission for critical real-time applications [1][2]. These protocols are formally defined by the international standard IEC 62439-3, which specifies two principal methods for achieving this fault tolerance: the Parallel Redundancy Protocol (PRP) and the High-availability Seamless Redundancy (HSR) protocol [1][3]. Both operate on the fundamental principle of frame duplication, sending identical data packets over two independent network paths simultaneously, so that if one path fails, the transmission succeeds without any delay or data loss [3]. This capability to deliver seamless switchover with zero recovery time makes these protocols essential for environments where network downtime is unacceptable, such as in power utility automation, industrial process control, and transportation systems [2][3].

The core operational characteristic of seamless redundancy protocols is the duplication of every Ethernet frame across two ports, with each copy traversing a separate physical network path [3]. At the receiving node, a mechanism based on sequence numbers identifies and discards the duplicate frames, ensuring the application processes only a single instance of the data [3]. While PRP and HSR share this foundational methodology, they differ significantly in network topology and interoperability. PRP is designed for use with two parallel, independent Local Area Networks (LANs) of any configuration; nodes equipped with two network interfaces (Doubly Attached Nodes for PRP, or DANPs) connect to both LANs, while standard, singly attached nodes can coexist on either network [3]. In contrast, HSR is specifically architected for a ring topology, where each node is connected to its two neighbors, and frames are sent in both clockwise and counter-clockwise directions around the ring [3]. A key distinction is that PRP networks can interoperate with standard Ethernet devices, whereas HSR networks, composed of DANHs (Doubly Attached Nodes implementing HSR), typically cannot without a specialized gateway device known as a Redundancy Box (RedBox) [3].

The significance and primary application of seamless redundancy protocols lie in their ability to meet the stringent reliability requirements of modern industrial automation and critical infrastructure, often referred to as Operational Technology (OT) networks [2]. By guaranteeing continuous communication even during a fault, they support the real-time deterministic operation essential for substation automation, factory automation, and other process industries where a network failure could lead to safety hazards, production losses, or widespread service disruptions [2]. The formal standardization within IEC 62439-3 ensures interoperability between equipment from different manufacturers, fostering widespread adoption and integration into high-availability system designs [1][3]. As industries continue to converge IT and OT systems and demand greater resilience, protocols like PRP and HSR represent a critical technological foundation for building robust, fault-tolerant networked control systems.

Overview

Seamless Redundancy refers to a class of high-availability networking protocols designed to provide fault tolerance in industrial Ethernet networks with zero recovery time following a single network element failure. These protocols, standardized within the IEC 62439-3 framework, are critical for applications where even brief communication interruptions can lead to significant operational, safety, or financial consequences, such as in power substation automation, factory automation, and process control systems [10]. The core principle involves the active duplication of Ethernet frames across two independent communication paths, ensuring that at least one copy of each frame reaches its destination even if one path fails completely. The two primary protocols implementing this concept are the Parallel Redundancy Protocol (PRP) and the High-availability Seamless Redundancy (HSR) protocol, both specified in detail within the IEC 62439-3 standard [10].

Protocol Fundamentals and Operational Principle

Both PRP and HSR achieve seamless redundancy through a common fundamental mechanism: every outgoing Ethernet frame from a redundant node is duplicated and transmitted simultaneously over two separate network interfaces or ports. The receiving node, which is also configured for redundancy, listens on both of its ports. It accepts the first valid copy of a frame that arrives and discards the subsequent duplicate, a process managed through a sequence number and network identifier appended to the original frame in a dedicated redundancy trailer [10]. This design ensures that a failure in one network path—whether a broken cable, a failed switch, or a faulty port—does not interrupt data flow, as the duplicate frame traversing the alternate path is delivered without delay. The recovery time is effectively zero because no network reconfiguration, spanning tree recalculation, or failover signaling is required; the redundancy is active and transparent during normal operation [10].
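As a rough illustration of the accept-first, discard-second behavior described above, the following Python sketch models a sender that emits each frame on two paths and a receiver that filters on (source, sequence number). The names `Frame`, `Receiver`, and `send_redundant` are hypothetical, the two paths are reduced to simple up/down flags, and the unbounded `seen` set is a simplification that a real node would replace with a bounded, aging table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src: str        # source address of the sending node
    seq: int        # sequence number shared by both copies of a frame
    payload: bytes


class Receiver:
    """Accepts the first copy of each (src, seq) pair and drops later copies."""

    def __init__(self):
        self.seen = set()      # simplification: never ages out
        self.delivered = []    # payloads handed to the application

    def receive(self, frame):
        key = (frame.src, frame.seq)
        if key in self.seen:
            return False       # duplicate: silently discarded
        self.seen.add(key)
        self.delivered.append(frame.payload)
        return True


def send_redundant(frame, paths_up, receiver):
    """Send one copy per path; a copy on a failed path never arrives."""
    for up in paths_up:
        if up:
            receiver.receive(frame)


rx = Receiver()
send_redundant(Frame("node1", 0, b"a"), (True, True), rx)    # both paths healthy
send_redundant(Frame("node1", 1, b"b"), (False, True), rx)   # first path failed
send_redundant(Frame("node1", 2, b"c"), (True, False), rx)   # second path failed
print(rx.delivered)
```

In this model the application sees each payload exactly once regardless of which single path is down, and no failover step is ever executed.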

Parallel Redundancy Protocol (PRP)

The Parallel Redundancy Protocol is characterized by its topology-agnostic design. A PRP node, known as a Doubly Attached Node with PRP (DANP), is equipped with two standard Ethernet ports, each connected to one of two completely independent and parallel Local Area Networks (LANs), conventionally termed LAN A and LAN B [10]. These two networks can have any topology (star, ring, mesh) and can be built using standard, non-redundant commercial off-the-shelf (COTS) switches and infrastructure. A key feature of PRP is its interoperability with non-redundant, singly attached nodes (SANs) that are connected to only one of the two networks. A DANP sending a frame appends the PRP trailer (Redundancy Control Trailer, RCT) and transmits identical copies onto both LAN A and LAN B. A receiving DANP uses the RCT to detect and discard duplicates. A SAN on LAN A simply receives the copy sent on LAN A; because the RCT sits after the payload, a SAN that does not understand PRP treats it as padding and processes the frame normally. This allows for a mixed network where critical devices use PRP for high availability while less critical devices use a standard single connection, providing a cost-effective and flexible redundancy solution [10].

High-availability Seamless Redundancy (HSR)

High-availability Seamless Redundancy, defined in IEC 62439-3 Clause 5, operates on a fundamentally different network topology: a closed ring [11]. An HSR node has at least two Ethernet ports, and the nodes are interconnected through these ports to form a ring. Unlike PRP's independent parallel networks, HSR creates a single logical network where redundancy is inherent to the ring structure. Each node forwards traffic from one port to the other, acting as a bridge. When an HSR node originates a frame, it sends one copy out of its "Port A" (counter-clockwise direction) and an identical copy out of its "Port B" (clockwise direction) [11]. These frames circulate the ring in opposite directions. Each intermediate node forwards a frame it has not yet seen from one port to the other and discards a copy it has already forwarded, so frames do not circulate indefinitely. The destination node passes the first valid copy to its upper layers and discards the later duplicate. This ring-based approach eliminates the need for external switches, as the nodes themselves form the network fabric. A major advantage is that a failure of any single link or node port in the ring is immediately bypassed because frames can still travel the alternative path around the ring, maintaining full connectivity between all remaining nodes with zero recovery time [11].
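The single-fault tolerance of the ring can be checked with a small model. The sketch below is an illustration under simplifying assumptions, not an implementation of the standard: nodes are numbered 0 to n-1, link `i` joins node `i` and node `(i + 1) % n`, the source and destination differ, and the hypothetical function `hsr_deliveries` only reports which of the two copies reaches the destination.

```python
def hsr_deliveries(n_nodes, src, dst, broken_links=frozenset()):
    """Return the directions ("cw", "ccw") whose copy reaches dst.

    Link i joins node i and node (i + 1) % n_nodes.
    """
    arrived = []
    for name, step in (("cw", 1), ("ccw", -1)):
        node = src
        while True:
            nxt = (node + step) % n_nodes
            # Going clockwise we leave over link `node`;
            # going counter-clockwise we leave over link `nxt`.
            link = node if step == 1 else nxt
            if link in broken_links:
                break              # this copy is lost at the failed link
            node = nxt
            if node == dst:
                arrived.append(name)
                break
            if node == src:
                break              # full circle without finding dst
    return arrived


print(hsr_deliveries(4, 0, 2))            # both copies arrive
print(hsr_deliveries(4, 0, 2, {1}))       # one link down: one copy still arrives
print(hsr_deliveries(4, 0, 2, {1, 2}))    # two faults can isolate the destination
```

With any single broken link at least one direction still delivers, while the last call shows that the guarantee covers only a single failure.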

Comparative Analysis: PRP vs. HSR

The choice between PRP and HSR involves a trade-off between topological flexibility, infrastructure requirements, and performance characteristics. PRP's primary advantage is its flexibility; it can be overlaid onto existing dual-network infrastructures and supports gradual integration with non-redundant equipment [10]. However, it requires two completely separate sets of network cabling and active equipment (switches), effectively doubling the physical network infrastructure. HSR, in contrast, is more efficient in its use of cabling, requiring only a single ring topology, and eliminates the need for external switches, potentially reducing cost and complexity [11]. Its limitation is the mandatory ring topology and the requirement that all nodes in the ring must be HSR-capable. Furthermore, the ring topology introduces a specific consideration: because every frame travels the entire ring in both directions, the aggregate network load is higher than in a PRP network for the same amount of application data. A four-node HSR ring carrying unicast traffic between two nodes will see that traffic pass through and be processed by all intermediate nodes, increasing the total frame processing burden across the network compared to a switched PRP network where traffic is confined to the path between source and destination switches.
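The load argument can be made concrete with a little arithmetic. The sketch below assumes that each copy of a unicast frame travels from the source to the destination in its own direction and is removed there; that is a modeling assumption for illustration, and `hsr_unicast_load` is a hypothetical helper rather than anything defined by the standard.

```python
def hsr_unicast_load(n_nodes, src, dst):
    """Count ring resources used by one unicast frame (both copies)."""
    if src == dst:
        raise ValueError("src and dst must differ")
    cw_hops = (dst - src) % n_nodes     # links crossed by the clockwise copy
    ccw_hops = (src - dst) % n_nodes    # links crossed by the other copy
    return {
        "link_traversals": cw_hops + ccw_hops,
        # Every node strictly between src and dst forwards one copy.
        "forwarding_nodes": (cw_hops - 1) + (ccw_hops - 1),
    }


print(hsr_unicast_load(4, 0, 2))
print(hsr_unicast_load(10, 0, 1))
```

Under this assumption the two copies together always cross every link of the ring once and involve every node other than the source and destination, even when the two endpoints are direct neighbors.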

Technical Implementation Details

The seamless operation of both protocols relies on a precisely formatted redundancy control trailer appended to standard Ethernet frames. For PRP, this is the RCT, which includes:

  • A 16-bit sequence number, incremented for each new frame from a source, allowing the receiver to identify duplicates and detect lost frames.
  • A 4-bit LAN identifier (0xA for LAN A, 0xB for LAN B).
  • A 12-bit frame size field (the LSDU size) used for integrity checking.
  • A 16-bit PRP suffix (0x88FB) that marks the trailer as a PRP RCT.

HSR uses a similar but distinct HSR tag, which includes a path identifier, a frame size field, and a sequence number. The PRP RCT is appended after the original Ethernet payload and before the Frame Check Sequence (FCS), whereas the HSR tag is inserted into the frame header after the source MAC address, where it is identified by its own EtherType. In both cases the FCS is recalculated to cover the modified frame.

A critical function within redundant nodes is the duplicate discard algorithm. This algorithm maintains a history of recently received sequence numbers per source node and network identifier. When a frame arrives, the node checks if a frame with the same source address and sequence number has already been received and accepted from the opposite port (or network). If it has, the new frame is silently discarded. This mechanism must operate at line rate to handle worst-case traffic scenarios without introducing delay or jitter. The standard defines maximum network sizes and propagation delays to ensure the duplicate discard algorithm functions correctly; for instance, the difference in arrival time between the two copies of a frame must be less than the time window maintained by the receiver's duplicate discard table.
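To make the trailer handling concrete, here is a minimal Python sketch that packs and parses a six-byte PRP trailer. It assumes the layout commonly documented for PRP-1: a 16-bit sequence number, a 4-bit LAN identifier (0xA or 0xB), a 12-bit size field, and the 16-bit suffix 0x88FB. The function names are hypothetical, the size value is whatever the caller supplies, and FCS handling is left out.

```python
import struct

PRP_SUFFIX = 0x88FB          # assumed PRP-1 suffix value
LAN_A, LAN_B = 0xA, 0xB      # assumed 4-bit LAN identifiers


def pack_rct(seq, lan_id, lsdu_size):
    """Build the 6-byte trailer: seq | lan_id(4) + size(12) | suffix."""
    if not 0 <= seq <= 0xFFFF:
        raise ValueError("sequence number must fit in 16 bits")
    if lan_id not in (LAN_A, LAN_B):
        raise ValueError("unknown LAN identifier")
    if not 0 <= lsdu_size <= 0x0FFF:
        raise ValueError("size must fit in 12 bits")
    return struct.pack("!HHH", seq, (lan_id << 12) | lsdu_size, PRP_SUFFIX)


def parse_rct(frame_without_fcs):
    """Return (seq, lan_id, lsdu_size), or None if no trailer is present."""
    if len(frame_without_fcs) < 6:
        return None
    seq, lan_size, suffix = struct.unpack("!HHH", frame_without_fcs[-6:])
    lan_id = lan_size >> 12
    if suffix != PRP_SUFFIX or lan_id not in (LAN_A, LAN_B):
        return None          # treat as an ordinary frame without a trailer
    return seq, lan_id, lan_size & 0x0FFF


payload = bytes(46)
copy_a = payload + pack_rct(0x1234, LAN_A, 52)
copy_b = payload + pack_rct(0x1234, LAN_B, 52)
print(parse_rct(copy_a), parse_rct(copy_b))
```

The two copies differ only in the LAN identifier nibble, which is what lets a receiver pair them up by source address and sequence number.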

History

The development of seamless redundancy protocols for industrial Ethernet networks emerged from the critical need for fault-tolerant communication in sectors like power utility automation, factory automation, and transportation systems, where network failures could lead to significant safety hazards or economic losses. The foundational work was standardized in the International Electrotechnical Commission's (IEC) IEC 62439 series, specifically within Part 3, which defines protocols for high-availability automation networks [11]. The parallel development of two primary protocols—High-availability Seamless Redundancy (HSR) and Parallel Redundancy Protocol (PRP)—addressed the requirement for zero recovery time upon a single network fault, but through distinct architectural philosophies [12].

Early Standardization and Protocol Genesis (circa 2000-2010)

The initial framework for these protocols was established with the publication of IEC 62439. The core technical challenge was to design a redundancy mechanism that was completely transparent to the application layer, ensuring no data loss or communication interruption during a link or node failure. This required moving beyond traditional spanning-tree protocols, which involve re-convergence delays measured in seconds, to a model in which a single failure causes no recovery delay at all.

PRP, defined in IEC 62439-3 Clause 4, was conceived as a network-agnostic solution. Its fundamental innovation was the Doubly Attached Node for PRP (DANP), a device equipped with two independent network interfaces connected to two separate, parallel Local Area Networks (LANs) [11]. The protocol operates by having the DANP's redundancy layer, the Link Redundancy Entity (LRE), create and send identical frames over both networks. A key historical design decision was the inclusion of a Redundancy Control Trailer (RCT) appended to each frame, containing a sequence number and LAN identifier, which enabled the receiving node's LRE to identify and discard duplicates. This architecture allowed PRP to be retrofitted onto existing dual-network infrastructures, supporting a gradual migration from non-redundant systems.

Concurrently, HSR was defined in IEC 62439-3 Clause 5 with a more specialized topology in mind: the HSR ring [11]. This design was particularly suited for applications like substation automation, where a physical ring topology could be efficiently deployed. The analogous node in HSR is the Doubly Attached Node implementing HSR (DANH), which connects to both directions of the ring [11]. In this closed-loop system, a DANH sends duplicate frames in both clockwise and counter-clockwise directions. The protocol mandates that all nodes, including intermediate ones, forward frames, ensuring path diversity. A significant historical challenge in HSR's development was managing this constant traffic flow within the ring to prevent broadcast storms while guaranteeing delivery.

Network Integration and the Role of Redundancy Boxes

A critical milestone in the practical adoption of both protocols was the specification for integrating legacy or simpler devices that lacked dual-port redundancy capabilities. The standard addressed this through the concept of Singly Attached Nodes (SANs), which possess only a single network interface [11]. To connect a SAN to a redundant network, a gateway device known as a Redundancy Box (RedBox) is employed. The RedBox serves a dual function:

  • For HSR networks, it attaches to the ring as a DANH, receiving and forwarding frames on behalf of the SANs connected to it [11].
  • For PRP networks, it attaches to both parallel LANs as a DANP [11].

In both cases, the RedBox acts as the redundancy proxy for all traffic for which its attached SANs are the source or destination, effectively allowing non-redundant devices to participate in a high-availability network [11]. This design was historically vital for enabling phased upgrades and protecting investments in existing equipment.

Algorithmic Development and Implementation Challenges

Beyond the frame forwarding rules, a substantial portion of the protocols' development history involved designing the duplicate discard algorithms. The IEC standard mandates that the algorithm must guarantee no duplicates are passed to the upper layers but does not prescribe a specific implementation, allowing for optimization [12]. This led to the development of various methods for managing the Duplicate Discard Table (DDT). Key historical implementation challenges included:

  • Designing efficient hash tables or content-addressable memory (CAM) to store frame identifiers (typically source MAC address, sequence number, and LAN ID) for the duration of a duplicate detection time window.
  • Optimizing the DDT lookup and update process to operate at wire speed for full-duplex Gigabit Ethernet or faster.
  • Developing distinct DDT logic for different data paths: Port-to-Host (for frames destined for the node itself, used in both HSR and PRP) and Port-to-Port (for frames being forwarded, specific to HSR ring operation) [12].
  • Implementing data integrity checks, such as Cyclic Redundancy Check (CRC) verification, during port-to-port forwarding in HSR to prevent error propagation, with complexities arising in cut-through switching modes [12].
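Since the standard leaves the table design open, the following Python sketch shows just one possible shape for a time-windowed DDT. It is an illustration, not a reference design: the class name, the 0.4 second default window, and the explicit `now` argument (which must not go backwards) are all assumptions chosen to keep the example deterministic, and a hardware implementation would use fixed-size memory rather than a growing dictionary.

```python
from collections import OrderedDict


class DuplicateDiscardTable:
    """Remembers (source MAC, sequence number) pairs for a fixed time window."""

    def __init__(self, window_s=0.4):
        self.window_s = window_s
        self._entries = OrderedDict()   # key -> arrival time, oldest first

    def _expire(self, now):
        # Drop entries older than the window, starting from the oldest.
        while self._entries:
            key, arrived = next(iter(self._entries.items()))
            if now - arrived < self.window_s:
                break
            self._entries.popitem(last=False)

    def accept(self, src_mac, seq, now):
        """Return True for a first arrival, False for a duplicate in the window."""
        self._expire(now)
        key = (src_mac, seq)
        if key in self._entries:
            return False
        self._entries[key] = now
        return True


ddt = DuplicateDiscardTable()
print(ddt.accept("00:11:22:33:44:55", 1, now=0.000))   # first copy
print(ddt.accept("00:11:22:33:44:55", 1, now=0.001))   # second copy, discarded
print(ddt.accept("00:11:22:33:44:55", 1, now=0.500))   # window elapsed, treated as new
```

The last call shows why the window matters: once an entry has aged out, a frame reusing that sequence number is treated as new, which is what allows 16-bit sequence numbers to wrap safely.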

Open-Source Integration and Hardware Offload (2010-Present)

A significant evolution in the history of HSR and PRP has been their integration into open-source operating systems, dramatically increasing accessibility and lowering implementation barriers. The Linux kernel began incorporating native support for both protocols, allowing them to function over standard Ethernet ports in software [12]. This demonstrated that the redundancy functionality could be achieved without specialized hardware, though with a computational cost on the main CPU. The subsequent advancement was the development of hardware offload solutions to improve performance and determinism. A prominent example is the use of the Programmable Real-Time Unit and Industrial Communication Subsystem (PRU-ICSS) found on certain System-on-Chip (SoC) processors. Dedicated PRU-ICSS firmware was created to handle the core protocol operations—frame duplication, RCT insertion/removal, DDT management, and forwarding logic—directly in the industrial Ethernet subsystem [12]. The historical advantage of this offload model is substantial:

  • It frees the main CPU and system bandwidth from the continuous processing of redundancy traffic.
  • It provides more deterministic, low-latency forwarding as the protocol operates in dedicated firmware closer to the hardware.
  • It allows the main processor to dedicate its resources to application-level tasks, making the solution scalable for complex nodes [12].

Ongoing Refinement and Future Trajectory

The history of HSR and PRP continues to be written through ongoing refinement. Recent discussions, such as those at the Linux Plumbers Conference 2024, focus on optimizing the kernel implementations, enhancing diagnostics and network management capabilities, and standardizing configuration interfaces [12]. Research areas include improving the scalability of DDTs for networks with very high frame rates, refining time synchronization mechanisms over redundant paths, and exploring convergence with other time-sensitive networking (TSN) standards. The evolution from proprietary, hardware-centric solutions to standardized, open, and offload-accelerated implementations marks their journey from specialized industrial protocols to accessible components of robust, deterministic networked systems.

Description

Seamless Redundancy refers to a class of high-availability network protocols defined within the IEC 62439 standard series, specifically in IEC 62439-3, which are engineered to provide fault-tolerant Ethernet communication with zero recovery time in the event of a single network failure [13][7][14]. The standard outlines two principal protocols that achieve this seamless failover: High-availability Seamless Redundancy (HSR) and Parallel Redundancy Protocol (PRP) [11]. Both protocols operate on the fundamental principle of frame duplication, where each data frame is transmitted simultaneously over two independent, parallel network paths using two separate Ethernet ports on each node [7]. The receiving node is responsible for accepting the first valid frame arrival and discarding the subsequent duplicate, ensuring uninterrupted data flow even if one path is completely severed [13]. This approach to redundancy is protocol-agnostic, operating independently of the upper-layer application protocols, which distinguishes it from many other industrial Ethernet solutions specified in IEC 61158 and IEC 61784 [13].

Protocol Architecture and Standardization

The IEC 62439 standard series, developed by the IEC SC65C working group 15, is dedicated to defining redundancy methods for switched Ethernet networks in industrial automation environments [7]. The third part of this series, IEC 62439-3, is the core document specifying HSR and PRP. The standard has undergone several revisions since its inception; the first edition was published in 2010, followed by a second edition in July 2012, and the most current fourth edition was published in December 2021 [11][10]. These updates have incorporated technical improvements and clarifications to the protocols [11]. The standard's scope covers technologies applicable to a variety of industrial networks, with different solutions offered depending on the required real-time behavior and level of redundancy [7]. The current edition, IEC 62439-3:2021, falls under the International Classification for Standards (ICS) categories 25.040.40 (Industrial automation systems) and 35.100.05 (Multilayer applications) and has a stability date set for 2028 [10].

High-availability Seamless Redundancy (HSR)

HSR is designed for networks configured in a closed ring topology [11]. In an HSR network, every node is a Doubly Attached Node implementing HSR (DANH), meaning it is connected to two adjacent nodes in the ring via its two Ethernet ports [7]. When a DANH originates a frame, it sends a copy out of each port, causing the frame to travel in both clockwise and counter-clockwise directions around the ring. Intermediate nodes forward each copy onward in its direction of travel. A key network element in HSR is the RedBox, which connects singly attached nodes (SANs) or other network types to the HSR ring. RedBoxes can operate in several modes:

  • HSR-SAN mode: Connects standard, non-redundant SAN devices to the HSR ring [11].
  • HSR-PRP mode: Acts as a bridge between an HSR ring and a PRP network [11].
  • HSR-HSR mode: Connects two separate HSR rings using a four-port device known as a QuadBox [11].

HSR DANH nodes support several operational modes defined by the standard, including H, T, U, and N modes, which can be changed at runtime [7]. The implementation also supports cut-through switching, a forwarding technique that reduces latency by beginning to forward a frame before it has been fully received, after the destination address has been read [7].

Parallel Redundancy Protocol (PRP)

In contrast to HSR's ring topology, PRP employs a parallel, duplicated network infrastructure. A PRP node, known as a Doubly Attached Node for PRP (DANP), is attached to two completely independent and parallel Local Area Networks (LANs), designated as LAN A and LAN B [7]. The DANP transmits identical frames simultaneously onto both networks via its Port A and Port B [7]. The networks can be of any topology (star, ring, mesh) as they are standard IEEE 802.1 Ethernet networks, and the redundancy is achieved solely through the duplicated infrastructure and the DANP's behavior. A significant architectural benefit of PRP is that DANP nodes can interoperate seamlessly with standard, non-redundant Singly Attached Nodes (SANs) that are connected to either one of the two parallel networks, allowing for gradual integration into existing infrastructures [11].

Implementation and Duplicate Discard Mechanism

Both HSR and PRP can be implemented in software, such as within the Linux kernel, to function over standard Ethernet ports [7]. However, specialized hardware offload, like that provided by a Programmable Real-Time Unit and Industrial Communication Subsystem (PRU-ICSS) with dedicated HSR/PRP firmware, can significantly enhance performance. This offloading moves protocol processing from the main CPU to firmware, conserving processing bandwidth for application tasks [7]. A critical function of the Link Redundancy Entity (LRE) in both protocols is the management of duplicate frames. The LRE must ensure that duplicate frames are not passed to the upper protocol layers to prevent unnecessary processing overhead [7]. The algorithm for discarding duplicates is not explicitly specified by IEC 62439-3, but the standard mandates that the algorithm must be deterministic and capable of handling out-of-order frame arrivals [7]. This is managed through a Duplicate Discard Table (DDT). The DDT is used on the port-to-host path in both HSR and PRP to filter duplicates delivered to the node's own applications [7]. In HSR, an additional DDT is required on the port-to-port path for frames being forwarded around the ring [7]. Furthermore, during port-to-port forwarding in HSR (except in cut-through mode), the implementation performs a data integrity check using the frame's Cyclic Redundancy Check (CRC) [7].

Technical Specifications and Node Operation

The reference implementation detailed in processor SDK documentation confirms operation in accordance with IEC 62439-3 Edition 2.0, supporting a 100 Mbit/s full-duplex Ethernet interface for both protocols [7]. As per the standard, HSR operates as a DANH (Clause 5) and PRP operates as a DANP (Clause 4) [7]. This delineation within the standard clauses provides the formal behavioral specification for each node type. The zero-recovery-time characteristic, a defining feature of seamless redundancy, is achieved because the redundant frame is already in transit on the alternative path at the moment of a failure, requiring no convergence or re-routing delay [13]. This makes HSR and PRP particularly suitable for mission-critical industrial applications such as substation automation systems, where even millisecond communication interruptions are unacceptable [7].

Significance

The development of High-availability Seamless Redundancy (HSR) and Parallel Redundancy Protocol (PRP) represents a critical advancement in industrial and utility network design, fundamentally shifting the paradigm for achieving fault tolerance in time-sensitive applications. Their significance stems from their ability to provide deterministic, zero-recovery-time redundancy without relying on software-based failover mechanisms, a requirement paramount in sectors where even millisecond interruptions can lead to catastrophic failures, equipment damage, or safety hazards [15][7].

Hardware-Based Integration of Redundancy and Synchronization

A cornerstone of the protocols' significance is their architectural design, which fully integrates seamless redundancy and precise clock synchronization within hardware, typically Field-Programmable Gate Arrays (FPGAs) or specialized firmware processors, without requiring support from the main application processor [7]. This hardware-centric approach is not merely a performance optimization but a foundational requirement for deterministic behavior. The interaction between redundancy and synchronization is particularly profound: redundant synchronization messages, such as those for Precision Time Protocol (PTP), are not discarded as superfluous traffic. Instead, both copies are utilized to enhance clock accuracy and resilience. By processing multiple timing messages arriving via independent paths, a node can apply algorithms to filter out path-dependent delays and jitter, resulting in a more stable and accurate time reference, which is itself essential for coordinated actions across the network [15]. This symbiotic relationship ensures that the very mechanism providing data redundancy also reinforces the temporal integrity of the system.

Enabling Drop-Out Free Fault Tolerance for Critical Infrastructure

HSR and PRP were developed explicitly to fulfill the stringent requirement of drop-out free fault-tolerance in industrial automation, particularly in electrical power systems [7]. Traditional Ethernet redundancy protocols like the Rapid Spanning Tree Protocol (RSTP) involve reconvergence times that are unacceptable for hard real-time systems. In contrast, HSR and PRP operate on the principle of proactive duplication, sending every Ethernet frame simultaneously over two independent network paths [15][7]. This design guarantees that a single fault, whether a broken cable, a failed switch port, or a malfunctioning node, does not interrupt data flow. The redundancy is managed at the Link Redundancy Entity (LRE) within each node, which handles the duplication on egress and the discarding of duplicates on ingress. This capability found its first major application in substation automation governed by the IEC 61850 standard, which explicitly references these protocols for communication where high availability is required for protection and control functions [15]. The protocols thus form the communication backbone for modern digital substations, enabling features like sampled value (SV) and Generic Object Oriented Substation Event (GOOSE) messaging to continue uninterrupted during network disturbances.

Architectural Distinctions and Network Implications

The significance of having two standardized protocols (HSR and PRP) under IEC 62439-3 lies in their complementary approaches to topology, which address different deployment scenarios [15]. HSR is designed primarily for ring topologies, where every node is connected to two neighbors, forming a closed loop. In an HSR ring, each node is a DANH that receives, forwards, and discards frames as required, while devices without HSR support attach through a RedBox (redundancy box). A frame sent from a source node is duplicated and injected into both directions of the ring, ensuring it reaches the destination via two paths. This ring-based approach is highly efficient for closed, dedicated systems like within a single substation bay or controller cabinet, and it can be extended to more complex topologies such as interconnected rings (rings of rings) [7].

PRP, conversely, operates on the principle of parallel, independent networks (Network A and Network B). PRP nodes, or DANPs (Doubly Attached Nodes with PRP), possess two independent network interfaces attached to these separate LANs. A frame is duplicated and sent simultaneously onto both networks. The key architectural advantage of PRP is that the networks themselves are standard, non-redundant Ethernet LANs of any topology (star, mesh, etc.). Furthermore, PRP networks can include Singly Attached Nodes (SANs) that connect to only one network, allowing for gradual integration and interoperability with legacy non-redundant equipment [15]. This flexibility makes PRP ideal for plant-wide or geographically distributed systems where building two separate, parallel network infrastructures is feasible.

Implementation and Performance Determinism

The performance characteristics of HSR and PRP are heavily dependent on their implementation, which is why they are typically realized in dedicated hardware. When implemented in FPGAs or specialized firmware processors like the PRU-ICSS (Programmable Real-Time Unit and Industrial Communication Subsystem), the protocol operations—including frame duplication, duplicate discarding, and cut-through switching—are offloaded from the main CPU [7]. This hardware offloading provides several critical benefits:

  • Deterministic Latency: Processing in hardware eliminates jitter and delays introduced by software stacks and operating system scheduling.
  • High-Throughput, Low-Load: The host processor is largely relieved from the burden of managing redundancy traffic, freeing processing bandwidth for the actual application [7].
  • Efficient Cut-Through Switching (HSR): In HSR rings, nodes can forward frames before fully receiving them (cut-through switching), drastically reducing forwarding latency within the ring compared to store-and-forward methods [7].

The duplicate discard algorithm, though not specified in detail by the standard, is a critical hardware-managed function. The LRE maintains a Duplicate Discard Table (DDT) keyed by a frame signature, typically the source MAC address and a 16-bit sequence number. The sender increments the sequence number once per frame, so both copies of a frame carry the same value, which lets the receiver identify and discard the superfluous copy [7]. The standard mandates that the algorithm must never reject a legitimate frame, while occasionally accepting a duplicate is tolerable, prioritizing data integrity over perfect filtering [7]. In HSR, this discard mechanism is applied not only to frames destined for the host ("Port to Host") but also to frames being forwarded ("Port to Port"), to prevent frames from looping indefinitely in the ring [7].
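Since the standard leaves the discard algorithm open, the following is only one possible sketch of a bounded Duplicate Discard Table, written in Python for readability rather than as hardware logic. The class name, the `max_entries` and `max_age` parameters, and their default values are assumptions. When the table is full it forgets the oldest signature, which may let a late duplicate through but never causes a new frame to be rejected; entries are also aged out so that a reused 16-bit sequence number is not mistaken for a duplicate, provided the sequence space does not wrap within `max_age`.

```python
from collections import OrderedDict


class DuplicateDiscardTable:
    """Illustrative bounded table of (source MAC, sequence number) signatures."""

    def __init__(self, max_entries=256, max_age=0.4):
        self.max_entries = max_entries   # assumed capacity
        self.max_age = max_age           # assumed entry lifetime in seconds
        self._entries = OrderedDict()    # signature -> time first seen

    def _expire(self, now):
        # Entries are kept in insertion order, so the oldest is always first.
        while self._entries:
            _, seen_at = next(iter(self._entries.items()))
            if now - seen_at <= self.max_age:
                break
            self._entries.popitem(last=False)

    def accept(self, src_mac, seq, now):
        """Return True to pass the frame on, False to discard it as a duplicate."""
        self._expire(now)
        signature = (src_mac, seq & 0xFFFF)
        if signature in self._entries:
            return False
        if len(self._entries) >= self.max_entries:
            # Forget the oldest signature instead of refusing the new frame.
            self._entries.popitem(last=False)
        self._entries[signature] = now
        return True


ddt = DuplicateDiscardTable(max_entries=2, max_age=1.0)
print(ddt.accept("aa:bb", 1, now=0.0))   # first copy
print(ddt.accept("aa:bb", 1, now=0.1))   # second copy of the same frame
```

The caller supplies `now` explicitly (assumed to be non-decreasing) so that the behaviour is deterministic and easy to test; a real implementation would use a hardware timer.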

Foundation for Future Resilient Systems

Beyond their immediate application in power systems, the principles embodied by HSR and PRP hold significant importance for the future of industrial IoT (IIoT), automotive networks, and other domains requiring ultra-high reliability. The model of proactive, parallel transmission with hardware-managed redundancy presents a viable path for achieving the "zero downtime" demanded by next-generation autonomous systems. Their standardization under IEC 62439-3 ensures interoperability and provides a stable foundation upon which other time-sensitive networking technologies, such as Time-Sensitive Networking (TSN), can integrate redundancy features. By solving the problem of seamless failover at the Ethernet link layer, HSR and PRP have established a benchmark for communication resilience in critical infrastructure, demonstrating that with appropriate hardware integration, the theoretical goal of zero recovery time is practically achievable.

Significance

The development of seamless redundancy protocols, particularly High-availability Seamless Redundancy (HSR) and Parallel Redundancy Protocol (PRP), represents a critical advancement in industrial and utility network design, enabling fault-tolerant communication with zero recovery time for applications where even brief data interruptions are unacceptable [15]. Their significance stems from a hardware-centric architectural philosophy, their foundational role in modern automation standards, and the sophisticated mechanisms they employ to manage network traffic and ensure deterministic performance.

Hardware-Based Integration and Deterministic Performance

A defining characteristic of HSR and PRP is their implementation primarily in hardware, such as Field-Programmable Gate Arrays (FPGAs) or specialized firmware on processors like the PRU-ICSS (Programmable Real-Time Unit and Industrial Communication Subsystem) [7]. This approach is fundamentally significant because it decouples the stringent, deterministic timing requirements of redundancy and precise clock synchronization from the general-purpose operating system and application processor. By integrating these functions fully in hardware, the protocols achieve several key performance benefits:

  • Deterministic Latency and Zero Recovery Time: Processing redundancy logic in hardware eliminates the variable latency and potential scheduling delays associated with software-based implementations. This ensures the consistent, sub-microsecond switching and frame forwarding required to guarantee zero packet loss during a single fault event, as mandated by the standards [15][7].
  • Processor Offloading: The hardware implementation offloads the computationally intensive tasks of duplicate frame recognition, discarding, and, in HSR, cut-through switching from the main CPU [7]. This preserves valuable processing bandwidth for the actual industrial application, allowing complex control algorithms to run without interference from network management overhead.
  • Enhanced Data Integrity: Hardware allows for continuous Cyclic Redundancy Check (CRC) validation during frame forwarding operations (e.g., port-to-port in HSR), ensuring corrupted data is not propagated through the network [7].

The interaction between redundancy and clock synchronization is particularly noteworthy. In these hardware implementations, the redundant synchronization messages (e.g., for IEEE 1588 Precision Time Protocol) are not treated as network congestion to be discarded. Instead, both copies are utilized by the clock synchronization algorithm to improve the accuracy and robustness of the time reference, leveraging the redundant paths to filter out path-dependent asymmetry and delay variations [15].

Foundation for Critical Infrastructure Automation

The first and most prominent application of HSR is in electrical substation automation, driven by the IEC 61850 standard for power utility communications [15]. IEC 61850 references IEC 62439-3 (which defines HSR and PRP) for applications requiring high availability, such as busbar protection, transformer differential protection, and other teleprotection schemes where communication failure can lead to equipment damage or widespread power outages [7]. The protocol's ability to provide seamless, drop-out free communication fulfills a core requirement in this safety-critical field, enabling the transition from traditional, hard-wired relay logic to networked, interoperable Intelligent Electronic Devices (IEDs) without sacrificing reliability. This has facilitated more flexible, cost-effective, and intelligent grid designs.

Architectural Distinctions and Network Implications

While both HSR and PRP provide seamless redundancy through frame duplication, their differing network architectures carry significant implications for deployment and operation. As noted earlier, PRP's primary advantage is its flexibility in supporting any network topology through simple physical duplication of links and networks [15]. This structural difference leads to distinct operational characteristics:

  • HSR Ring Operation: In an HSR ring, each node acts as a bidirectional switch. Every node, including intermediate ones, receives both copies of a frame, accepts the first valid arrival, and forwards the copies onward [7]. This creates a flooding behavior within the ring, which is why a Duplicate Discard Table (DDT) is also needed on the Port-to-Port path: without it, frames could loop indefinitely [7]. The standard does not specify the exact algorithm but mandates that it must never reject a legitimate frame, while occasionally accepting a duplicate is permissible [7].
  • Frame Identification and Duplicate Management: Handling duplicate frames is a core task. Both protocols use a frame signature, typically the source MAC address and a 16-bit sequence number, to uniquely identify frames [7]. The source increments the sequence number once per frame, so a frame and its duplicate carry the same value. Receiving nodes register signatures in a table and discard subsequent frames with matching signatures destined for the host [7]. This mechanism is crucial for relieving the host processor of duplicate handling.
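The ring behaviour in the list above can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions, not protocol code: the function name `send_on_ring`, the link numbering, and the restriction to a single multicast frame are all invented for the example. It counts how many copies of a frame reach each node, with and without one broken link.

```python
def send_on_ring(n_nodes, source, broken_link=None):
    """Return {node: copies_received} for one multicast frame from `source`.

    Link i connects node i and node (i + 1) % n_nodes. A copy travels in
    each direction until it hits the broken link or returns to its sender.
    """
    received = {node: 0 for node in range(n_nodes) if node != source}
    for step in (+1, -1):                 # clockwise and counter-clockwise
        node = source
        while True:
            link = node if step == +1 else (node - 1) % n_nodes
            if link == broken_link:
                break                     # this copy is lost at the fault
            node = (node + step) % n_nodes
            if node == source:
                break                     # copy came back to its sender: removed
            received[node] += 1           # node sees the copy and forwards it
    return received


healthy = send_on_ring(4, source=0)
faulty = send_on_ring(4, source=0, broken_link=1)
print(healthy)   # every node sees two copies; one is discarded per node
print(faulty)    # every node still sees one copy despite the broken link
```

In the healthy ring each node receives two copies and its duplicate discard logic drops one. With any single link broken, the two directions together still cover every node exactly once, so delivery continues without any reconfiguration step.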

Enabling Complex and Scalable Topologies

Although the basic HSR unit is a ring, its significance extends to supporting more complex and scalable network infrastructures essential for large installations like power plants or factory-wide systems. The protocol can be extended to "rings of rings" and other meshed configurations [7]. This is achieved by interconnecting multiple HSR rings through specialized RedBox (Redundancy Box) or QuadBox nodes, which manage the forwarding and duplicate discard logic between segments. This scalability allows network designers to create hierarchical, fault-tolerant networks that balance performance, cost, and coverage area, moving beyond the limitations of a single ring while maintaining the zero-recovery-time guarantee across the entire network.

In summary, the significance of seamless redundancy protocols lies in their engineered synthesis of hardware-based deterministic performance, rigorous standardization for critical infrastructure, and sophisticated network management mechanisms. They transform standard Ethernet into a medium capable of supporting the most demanding real-time control and protection systems, forming the communication backbone for the modern, automated industrial world.

Applications and Uses

Seamless redundancy protocols, specifically High-availability Seamless Redundancy (HSR) and Parallel Redundancy Protocol (PRP), are engineered to meet the stringent reliability requirements of industrial and utility automation systems where communication interruptions are intolerable. Their primary application domain is characterized by the need for drop-out free fault-tolerance, a requirement that conventional Ethernet redundancy mechanisms like Spanning Tree Protocol cannot fulfill due to their inherent convergence delays [3]. To achieve zero recovery time upon a single network fault, these protocols are typically implemented in dedicated hardware, such as Field-Programmable Gate Arrays (FPGAs) or specialized processing units like Programmable Real-time Units (PRUs), ensuring deterministic performance independent of general-purpose processor load [3].

Integration with Precision Clock Synchronization

A critical advancement in seamless redundancy implementations is the full hardware integration of redundancy mechanisms with precise clock synchronization, particularly the Precision Time Protocol (PTP) as profiled in IEC/IEEE 61850-9-3 for power utility automation [16]. This integration is performed without processor support, creating a deterministic and jitter-free synchronization path. The interaction between redundancy and synchronization is synergistic: redundant synchronization messages are not discarded as network overhead but are actively used to improve clock accuracy. By receiving timing packets over two independent paths, the clock recovery algorithm can filter out path-dependent asymmetries and anomalies, leading to a more stable and accurate time reference. This hardware-level coupling ensures that the fault tolerance of the data path extends equally to the timing distribution, which is paramount for applications like sampled value transmission in electrical substations or coordinated motion control in industrial automation.

Substation Automation and IEC 61850

The seminal and most prominent application of HSR is within electrical substation automation systems governed by the IEC 61850 standard. IEC 61850-9-2 and related profiles define the communication requirements for protection, control, and monitoring equipment, mandating high availability and deterministic performance. Recognizing this need, the IEC 62439-3 standard, which covers industrial communication networks, incorporated HSR as Clause 5, providing a formalized protocol specification for seamless redundancy [17]. Within the substation architecture, HSR is often deployed in ring topologies connecting intelligent electronic devices (IEDs) such as protection relays, bay controllers, and merging units. The protocol's zero-recovery-time characteristic ensures that critical Generic Object Oriented Substation Event (GOOSE) messages and Sampled Value (SV) streams, which are essential for differential protection and real-time control, are delivered without interruption even during a cable cut or node failure. This fulfills the requirement for applications where redundancy is mandated by the standard's reliability objectives [3].

Hardware Implementation and Performance Characteristics

The performance guarantees of HSR and PRP are intrinsically tied to their hardware-based implementation. Key protocol operations are offloaded from the host CPU to dedicated logic, enabling deterministic latency and seamless operation. For instance, frame duplication for transmission is handled in hardware. As noted in implementation documentation, frames sent by the host are duplicated and sent to both network ports nearly simultaneously. In a typical hardware approach, the host places a frame in one queue, and the PRU waits until both physical ports are available before transmitting the frame concurrently on both links, minimizing the delay between the two duplicates [3].

Node table management is another central element implemented in hardware. Although the standard declares node tables optional, they are crucial for efficient operation. The firmware implements these tables within the PRU, handling the registration of incoming supervision and non-supervision frames, as well as the ageing and deletion of stale entries [3]. This hardware-managed table allows for rapid forwarding decisions. Upon reception of a frame, the PRU updates the node table and statistical counters concurrently [3]. For HSR rings, a critical optimization is applied to supervision frames: if a received supervision frame originates from the receiver itself, indicating that it has traversed the entire ring, it is discarded to prevent endless circulation. Otherwise, it is forwarded to the next node [3].

Advanced switching techniques like cut-through forwarding are also implemented at the hardware level to minimize latency. In cut-through mode, the switch firmware bypasses the normal transmit queues, copying data directly from the receive FIFO to the transmit FIFO. Theoretical analysis for HSR suggests that the minimum number of bytes required to make a forwarding decision is 22, comprising 12 bytes of MAC addresses (destination and source), a 4-byte VLAN tag, and the 6-byte HSR tag [3]. This enables the switch to begin forwarding a frame before it is fully received, drastically reducing transit delay through intermediate nodes.
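The 22-byte figure can be checked by parsing exactly that prefix. The sketch below assumes a VLAN-tagged frame and the HSR tag layout as it is commonly described for IEC 62439-3 (EtherType 0x892F, a 4-bit path identifier, a 12-bit LSDU size, and a 16-bit sequence number); the function name, the returned dictionary, and the example addresses are illustrative only and are not taken from the cited firmware.

```python
import struct

VLAN_TPID = 0x8100       # IEEE 802.1Q tag protocol identifier
HSR_ETHERTYPE = 0x892F   # EtherType that introduces the HSR tag
DECISION_BYTES = 6 + 6 + 4 + 6   # dst MAC + src MAC + VLAN tag + HSR tag


def parse_hsr_prefix(prefix: bytes) -> dict:
    """Extract the fields a cut-through node needs from the first 22 bytes."""
    if len(prefix) < DECISION_BYTES:
        raise ValueError("need at least 22 bytes of a VLAN-tagged HSR frame")
    dst, src = prefix[0:6], prefix[6:12]
    tpid, tci = struct.unpack_from("!HH", prefix, 12)
    if tpid != VLAN_TPID:
        raise ValueError("frame is not VLAN-tagged")
    ethertype, path_and_size, seq = struct.unpack_from("!HHH", prefix, 16)
    if ethertype != HSR_ETHERTYPE:
        raise ValueError("frame carries no HSR tag")
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "vlan_id": tci & 0x0FFF,
        "path": path_and_size >> 12,
        "lsdu_size": path_and_size & 0x0FFF,
        "seq": seq,
    }


# Hand-built example prefix; all field values are made up for the demo.
prefix = (
    bytes.fromhex("01154e000101")
    + bytes.fromhex("001122334455")
    + struct.pack("!HH", VLAN_TPID, 5)
    + struct.pack("!HHH", HSR_ETHERTYPE, (1 << 12) | 52, 7)
)
fields = parse_hsr_prefix(prefix)
print(len(prefix), fields)
```

Once these bytes have arrived, the node knows the destination, the source, and the sequence number, which is everything the duplicate and forwarding checks described above require; the rest of the frame can be streamed through.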

Protocol-Specific Mechanisms and Network Optimization

HSR and PRP employ distinct packet formats to facilitate duplicate detection. PRP appends a Redundancy Control Trailer (RCT) to standard Ethernet frames, while HSR inserts an HSR tag into the frame header. Both contain a sequence number, which is the primary datum used by the receiving node to identify and discard duplicate frames [11]. The receiving node is responsible for this duplicate elimination, which keeps data flow uninterrupted.

Network optimization to prevent broadcast storms and conserve bandwidth is also part of the protocol behavior. In HSR rings, a device known as a RedBox (Redundancy Box) is often used to interconnect the HSR ring with a conventional Ethernet network. To avoid loops and use bandwidth effectively, the RedBox does not transmit frames that are already propagating in the same direction on the ring [11]. Loop avoidance for internally originated traffic is managed by specific rules:
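To make the trailer idea concrete, the sketch below appends and strips a 6-byte PRP trailer. The field layout used here (16-bit sequence number, 4-bit LAN identifier with 0xA and 0xB for the two networks, 12-bit LSDU size counted including the trailer, and the 16-bit suffix 0x88FB) follows the usual description of the PRP trailer but should be treated as an assumption and checked against IEC 62439-3 before being relied on. The function names are hypothetical.

```python
import struct

PRP_SUFFIX = 0x88FB        # assumed marker identifying a PRP trailer
LAN_A, LAN_B = 0xA, 0xB    # assumed LAN identifier values
RCT_LEN = 6


def append_rct(payload: bytes, seq: int, lan_id: int) -> bytes:
    """Return the payload followed by a redundancy control trailer."""
    lsdu_size = len(payload) + RCT_LEN   # assumption: size includes the RCT
    trailer = struct.pack(
        "!HHH", seq & 0xFFFF, (lan_id << 12) | (lsdu_size & 0x0FFF), PRP_SUFFIX
    )
    return payload + trailer


def split_rct(data: bytes):
    """Return (payload, trailer_fields), or (data, None) if no trailer is found."""
    if len(data) < RCT_LEN:
        return data, None
    seq, lan_and_size, suffix = struct.unpack("!HHH", data[-RCT_LEN:])
    if suffix != PRP_SUFFIX:
        return data, None
    fields = {
        "seq": seq,
        "lan_id": lan_and_size >> 12,
        "lsdu_size": lan_and_size & 0x0FFF,
    }
    return data[:-RCT_LEN], fields


payload = b"GOOSE-data"
copy_a = append_rct(payload, seq=41, lan_id=LAN_A)   # sent on Network A
copy_b = append_rct(payload, seq=41, lan_id=LAN_B)   # sent on Network B
print(split_rct(copy_a))
print(split_rct(copy_b))
```

The two copies differ only in the LAN identifier and share the same sequence number, which is what allows the receiver to match them. Because the trailer sits after the payload, a node that does not understand PRP simply sees a few extra trailing bytes.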

  • A unicast packet with a destination inside the ring is consumed by the destination node and is not forwarded further.
  • A unicast packet with a destination outside the ring (e.g., beyond the RedBox) is forwarded until it exits the ring via the RedBox [11].
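These rules amount to a small per-node decision function. The following sketch is one way to express them, assuming each node knows its own MAC address, whether it is the RedBox, and which addresses belong to the ring; the `Action` names and the `ring_action` signature are hypothetical. It also includes the case of a node seeing its own frame return, which is removed from the ring.

```python
from enum import Enum


class Action(Enum):
    CONSUME = "consume"    # deliver to the host, do not forward
    FORWARD = "forward"    # pass on to the next node in the ring
    EXIT = "exit"          # leave the ring through the RedBox
    DISCARD = "discard"    # own frame has gone all the way around


def ring_action(node_mac, is_redbox, src, dst, ring_macs):
    """Decide what a ring node does with a received unicast frame."""
    if src == node_mac:
        return Action.DISCARD
    if dst == node_mac:
        return Action.CONSUME
    if dst not in ring_macs and is_redbox:
        return Action.EXIT
    return Action.FORWARD


ring = {"n1", "n2", "n3", "rb"}   # "rb" plays the role of the RedBox
print(ring_action("n2", False, src="n1", dst="n2", ring_macs=ring))
print(ring_action("n2", False, src="n1", dst="pc", ring_macs=ring))
print(ring_action("rb", True, src="n1", dst="pc", ring_macs=ring))
```

The order of the checks matters: a node first removes its own returning frames, then consumes traffic addressed to it, and only then considers forwarding or handing the frame to the attached network.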

Broader Industrial Applications

While substation automation remains the flagship application, the principles of HSR and PRP have found use in other industrial sectors requiring ultra-high availability. These include:

  • Factory automation, particularly in safety-critical systems and highly synchronized motion control networks.
  • Transportation systems, such as railway signaling and onboard train control networks.
  • Oil and gas pipeline control and safety instrumented systems.
  • Any process automation environment where communication loss could lead to hazardous situations or significant economic loss.

The choice between HSR and PRP often depends on network architecture and migration strategy. Because PRP runs over two independent, standard Ethernet LANs, it is suitable for retrofitting redundancy into existing dual-backbone designs. HSR, with its ring topology, is often favored in new, compact deployments such as a single substation bay or a machine cell, where its integrated switching capability reduces the need for external network switches. Both protocols, through their hardware-centric design and rigorous standardization, provide the deterministic, zero-recovery-time redundancy required by the most demanding industrial and utility applications.

References

  1. [1] PRP and HSR version 1 (IEC 62439-3 Ed.2), improvements and a prototype implementation. https://ieeexplore.ieee.org/document/6699845/
  2. [2] A brief introduction to High Availability Seamless Redundancy (HSR) and some of its drawbacks: An insight into the functioning of HSR Protocol. https://ieeexplore.ieee.org/document/8724055/
  3. [3] 4.2. HSR_PRP, Processor SDK Linux Documentation. https://software-dl.ti.com/processor-sdk-linux/esd/AM65X/07_00_01_06/exports/docs/linux/Industrial_Protocols_HSR_PRP.html
  4. [4] 4.2. HSR_PRP, Processor SDK Linux Documentation. https://software-dl.ti.com/processor-sdk-linux/esd/docs/05_03_00_07/linux/Industrial_Protocols_HSR_PRP.html
  5. [5] Performance of a full-hardware PTP implementation for an IEC 62439-3 redundant IEC 61850 substation automation network. https://ieeexplore.ieee.org/document/6336631
  6. [6] The High-Availability Seamless redundancy protocol (HSR): Robust fault-tolerant networking and loop prevention through duplicate discard. https://ieeexplore.ieee.org/document/6242569/
  7. [7] Seamless and low-cost redundancy for substation automation systems (high availability seamless redundancy, HSR). https://ieeexplore.ieee.org/document/6038906
  8. [8] IEC 62439-3:2016 ED3. https://online.standard.no/en/iec-62439-3-2016-ed3
  9. [9] IEC 62439-3:2021/COR1:2023. https://webstore.iec.ch/en/publication/76473
  10. [10] IEC 62439-3:2021. https://webstore.iec.ch/en/publication/64423
  11. [11] High-Availability Seamless Redundancy (HSR) for IE 4000, IE 4010, and IE 5000. https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/hsr/b_hsr_ie4k.html
  12. [12] Linux Plumbers Conference 2024. https://lpc.events/event/18/contributions/1969/
  13. [13] HSR: Zero recovery time and low-cost redundancy for Industrial Ethernet (High availability seamless redundancy, IEC 62439-3). https://ieeexplore.ieee.org/document/5347037
  14. [14] Resilience technologies in Ethernet. https://www.academia.edu/8047095/Resilience_technologies_in_Ethernet
  15. [15] High-availability Seamless Redundancy. https://grokipedia.com/page/High-availability_Seamless_Redundancy
  16. [16] IEC/IEEE 61850-9-3:2016. https://webstore.iec.ch/en/publication/24998
  17. [17] IEC 62439-3:2010. https://webstore.iec.ch/en/publication/20490