Chiplet
A chiplet is a discrete, modular integrated circuit (IC) that, when combined with other chiplets in a single package, forms a larger, more complex system [2]. This design paradigm, a form of heterogeneous integration, represents a fundamental shift in semiconductor manufacturing by partitioning a traditional monolithic system-on-a-chip (SoC) into smaller functional blocks [8]. These individual chiplets are then interconnected using advanced packaging technologies to create a complete processing unit. The approach is central to modern high-performance computing as it overcomes key physical and economic limitations of scaling single, large dies, enabling continued performance gains beyond the constraints of Moore's Law [4].
The core principle of chiplet-based design is to disaggregate system functions into optimized silicon pieces. This allows each chiplet to be fabricated using the semiconductor process node most suitable for its specific task, whether for high-performance logic, dense memory, or analog I/O, improving yield and cost-effectiveness [3]. These chiplets are integrated into a multi-chip module (MCM), an electronic assembly that combines multiple ICs or semiconductor dies on a common substrate to function as a single, larger integrated circuit [6]. Key characteristics include the use of high-density, high-bandwidth interconnects between chiplets, such as silicon interposers or embedded bridges, to facilitate rapid communication that approaches on-die speeds. The methodology directly addresses the reticle limit—the maximum size of a pattern that can be projected onto a silicon wafer by lithography equipment, which is approximately 800 mm²—allowing for the creation of systems that would be impossible to manufacture as a single die [4]. This is exemplified by devices like NVIDIA's Blackwell-architecture GPUs, which integrate 208 billion transistors through advanced packaging of multiple silicon dies [1].
Chiplet technology has become critically important across multiple computing domains, including high-performance computing (HPC), data centers, artificial intelligence accelerators, and advanced client processors. Its significance lies in enabling heterogeneous integration, where different technologies like logic, memory (including stacked designs like Samsung's 24-layer V-NAND), and specialized accelerators can be combined in novel ways [5][8]. This architectural shift supports the development of highly specialized and scalable systems while managing manufacturing complexity and cost. The historical evolution of modular system packaging, seen in earlier mainframe and MCM designs, finds its advanced, miniaturized continuation in the chiplet model [7]. By allowing for design reuse, faster time-to-market for new product variants, and performance scaling beyond monolithic die limits, chiplet-based design is a cornerstone of contemporary and future semiconductor innovation.
Overview
A chiplet is a discrete, functional block of integrated circuitry designed to be combined with other chiplets within a single package to form a complete system-on-chip (SoC) or processor [14]. This design paradigm, known as chiplet-based architecture or heterogeneous integration, represents a fundamental shift from the traditional monolithic semiconductor manufacturing model. In monolithic design, all components of a processor—such as CPU cores, memory controllers, and I/O interfaces—are fabricated as a single, large piece of silicon on one die. Chiplet architectures, by contrast, partition these functions into smaller, specialized dies that are manufactured independently and later integrated using advanced packaging technologies [14]. This approach allows designers to mix and match semiconductor process nodes, materials, and intellectual property (IP) blocks to optimize performance, power efficiency, yield, and cost for each subsystem.
Definition and Core Concept
At its most fundamental level, a chiplet is a physical piece of silicon that contains a specific, well-defined functional unit [14]. It is a reusable, modular component that adheres to standardized physical and electrical interfaces, enabling interoperability within a multi-die system. The chiplet methodology treats these silicon blocks as "Lego-like" building blocks that can be assembled in various configurations to create different end products [14]. This modularity stands in contrast to the application-specific integrated circuit (ASIC) model, where a new, fully custom monolithic die must be designed and fabricated for each major product iteration or market segment. The chiplet model decouples the design and manufacturing of functional blocks, allowing for:
- Process Optimization: Different chiplets can be fabricated on the semiconductor process technology best suited for their function. For example, high-performance compute chiplets can be built on the latest, most dense transistor nodes (e.g., 3nm or 2nm), while analog/RF or I/O chiplets, which benefit less from extreme scaling, can be built on older, more mature, and less expensive nodes (e.g., 16nm or 28nm) [14].
- Improved Yield and Cost: Semiconductor manufacturing yield—the percentage of functional dies on a wafer—decreases significantly as die size increases due to the higher probability of a random defect occurring on the larger area. By partitioning a large monolithic design into several smaller chiplets, each individual die has a much higher yield, reducing overall cost per functional unit [14] (a numerical sketch of this effect follows this list).
- Design Reuse and Accelerated Time-to-Market: Once a chiplet IP block (e.g., a memory controller, a high-speed SerDes transceiver, or a neural processing unit) is designed, validated, and manufactured, it can be reused across multiple product generations and families. This reuse amortizes non-recurring engineering (NRE) costs and drastically reduces the design cycle for new systems [14].
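The yield argument above can be made concrete with a standard defect-density model. The following minimal sketch assumes a simple Poisson yield model and purely illustrative values for defect density and die areas (not taken from any specific foundry or product), and compares a large monolithic die with an equivalent set of smaller chiplets.

```python
import math

def poisson_yield(area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of good dies under a simple Poisson defect model: Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defect_density_per_cm2)

# Illustrative numbers only: 0.1 defects/cm^2, an 800 mm^2 monolithic die
# versus four 200 mm^2 chiplets providing the same total silicon area.
D0 = 0.1
monolithic = poisson_yield(800, D0)
chiplet = poisson_yield(200, D0)

print(f"Monolithic 800 mm^2 die yield: {monolithic:.1%}")   # ~45%
print(f"Single 200 mm^2 chiplet yield: {chiplet:.1%}")      # ~82%
print(f"Good-silicon ratio, small dies vs large die: {chiplet / monolithic:.2f}x")
```

Under these assumptions each small die yields roughly 82% against roughly 45% for the large die, which is the effect described in the bullet above; a full cost comparison would also fold in assembly yield, test cost, and the area overhead of die-to-die interfaces.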
Historical Context and Evolution
The conceptual foundations for modular, multi-chip systems predate the modern term "chiplet." Early mainframe computers from companies like IBM in the 1960s and 1970s often comprised multiple printed circuit boards populated with discrete semiconductor packages, each performing a specific function [13]. This was a form of system-level modularity, albeit at a much larger physical scale and lower level of integration. The evolution towards greater integration led to the monolithic microprocessor, which consolidated these functions onto a single die. However, as the limits of Moore's Law and Dennard scaling became apparent in the 2010s, the industry began to re-explore modular integration to overcome the rising costs and technical challenges of monolithic scaling. The development of advanced packaging techniques, such as 2.5D and 3D integration with silicon interposers and through-silicon vias (TSVs), provided the physical enablers to connect multiple dies at densities and bandwidths approaching on-die interconnect levels, making the chiplet model commercially viable [14].
Enabling Technologies: Packaging and Interconnects
The viability of chiplet-based systems is wholly dependent on advanced packaging and high-density, low-latency die-to-die (D2D) interconnects. These technologies allow chiplets to communicate with each other as if they were different regions on a single monolithic die. Key packaging approaches include:
- 2.5D Integration: Chiplets are placed side-by-side on a passive silicon interposer—a thin slice of silicon containing a dense network of wiring layers. The chiplets connect to the interposer using micro-bumps, and the interposer routes signals between them. This provides very high interconnect density and short wiring paths, enabling high-bandwidth communication [14].
- 3D Integration: Chiplets are stacked vertically and bonded together using techniques like hybrid bonding, where copper pads on the top of one die are directly fused to pads on the bottom of another. This provides the shortest possible interconnect paths, minimal latency, and the highest bandwidth density, ideal for connecting memory stacks to processors or for partitioning logic functions across layers [14].
- Advanced Fan-Out Wafer-Level Packaging (FOWLP): Chiplets are embedded in a molding compound, and a redistribution layer (RDL) is built on top to fan out connections. This can be a cost-effective method for integrating a smaller number of chiplets without a silicon interposer.
The performance of these systems hinges on standardized D2D interconnect protocols. Industry consortia, such as the Universal Chiplet Interconnect Express (UCIe) consortium, have emerged to define open standards for the physical layer, protocol stack, and software model, ensuring interoperability between chiplets from different suppliers [14].
Architectural Implications and System Composition
A chiplet-based processor is an exercise in heterogeneous system architecture. A typical high-performance computing package might integrate several distinct types of chiplets:
- Compute Chiplets: Contain high-performance CPU or GPU cores, often built on the leading-edge process node for maximum transistor density and speed.
- I/O and SerDes Chiplets: Handle external communications (e.g., PCI Express, Ethernet, USB) and are frequently built on a specialized node optimized for analog/mixed-signal performance.
- Memory Chiplets/Base Dies: In 3D-stacked memory like High Bandwidth Memory (HBM), the logic die that manages memory operations and interfaces with the processor can be considered a chiplet. It is stacked beneath the DRAM dies.
- Specialized Accelerator Chiplets: Dedicated blocks for functions like AI inference, cryptography, video encoding, or networking, which can be integrated as needed for target markets.
The system's overall functionality is defined by the selection, number, and arrangement of these chiplets within the package. This disaggregation allows for product segmentation and customization without the need for full monolithic redesigns. Building on the concept mentioned previously, this architectural approach enables the integration of an unprecedented number of transistors by combining multiple optimized silicon dies, rather than attempting to fabricate them all on one impossibly large and low-yielding piece of silicon [14].
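The composition described above can be summarized programmatically. The sketch below is purely illustrative (the chiplet names, node labels, and areas are invented for the example): it tallies the silicon in a hypothetical package and checks the total against the approximate reticle limit that a monolithic design would face.

```python
from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str          # functional role of the die
    process_node: str  # node chosen for that role
    area_mm2: float    # silicon area of the die

# Hypothetical package composition, for illustration only.
package = [
    Chiplet("compute-0", "3nm", 150.0),
    Chiplet("compute-1", "3nm", 150.0),
    Chiplet("io-die",    "16nm", 400.0),
    Chiplet("hbm-base",  "12nm", 125.0),
]

RETICLE_LIMIT_MM2 = 800.0  # approximate single-die limit cited in this article

total_area = sum(c.area_mm2 for c in package)
print(f"Total silicon in package: {total_area:.0f} mm^2")
print(f"Exceeds monolithic reticle limit: {total_area > RETICLE_LIMIT_MM2}")
for c in package:
    print(f"  {c.name:10s} {c.process_node:>5s} {c.area_mm2:6.0f} mm^2")
```

The point of the model is that the package, not any single die, is the unit of integration: the total silicon exceeds what one reticle could expose, while each die stays small enough to yield well and to sit on its preferred process node.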
Historical Development
Early Concepts and Predecessors (1960s–1990s)
The conceptual foundations for chiplets emerged from the broader field of multi-chip modules (MCMs), which date back to the 1960s. These early systems-in-package (SiP) approaches involved placing multiple discrete semiconductor dies, often from different process technologies, onto a common substrate within a single package to create a functional electronic system [14]. While MCMs provided a form of heterogeneous integration, they were typically limited to a small number of relatively large dies and faced significant challenges in interconnect density, power efficiency, and thermal management. The dominant paradigm throughout the late 20th century remained the monolithic system-on-chip (SoC), driven by the relentless progress of Moore's Law, which promised ever-smaller transistors and higher levels of integration on a single piece of silicon. However, as design and manufacturing complexities for monolithic chips escalated, the industry began to explore more modular approaches. Key figures in packaging research, such as R. R. Tummala at Georgia Tech and teams at IBM's packaging research divisions, laid essential groundwork by advancing substrate technologies and high-density interconnect methods that would later become critical for chiplet ecosystems [14].
The Rise of Economic and Technical Drivers (Early 2000s–2010s)
The modern chiplet movement gained substantive momentum in the early 21st century, propelled by converging economic and technical pressures. The skyrocketing cost of designing and fabricating leading-edge monolithic SoCs at successive process nodes (e.g., 28nm, 16/14nm) created a significant barrier for many semiconductor companies [14]. Simultaneously, the physical limitations of semiconductor scaling, often described as the slowdown of Moore's Law, became more pronounced. Performance and power efficiency gains from transistor shrinkage diminished while process variability and defect rates on large dies increased yield challenges. In this environment, a modular design philosophy began to appear more attractive. Pioneering work by researchers and engineers, including contributions from Ramune Nagisetty and teams at Intel, AMD, and DARPA, focused on defining the chiplet as a discrete, reusable IP block hardened into a physical piece of silicon [14]. This concept promised to decouple the design of functional blocks from the manufacturing process node, allowing designers to mix and match chiplets fabricated on the optimal—and often different—technology for each function (e.g., analog, memory, high-speed I/O, compute cores). The economic model shifted from yield on a single, large die to the known-good-die (KGD) yield of smaller, more manufacturable chiplets [14].
Commercialization and Ecosystem Formation (Mid-2010s–2020)
The transition from research concept to commercial reality was led by several key product introductions and industry alliances. A pivotal moment arrived in 2017 with AMD's launch of its first-generation EPYC server processors based on the "Zen" architecture, which packaged four identical compute dies connected by a proprietary high-density interconnect (Infinity Fabric) [14]. The second-generation EPYC processors of 2019 took disaggregation further, pairing several small CPU core dies fabricated on a leading-edge process with a larger I/O die built on an older, more cost-effective process. This approach allowed AMD to compete effectively in the high-performance server market against monolithic competitors, demonstrating the performance and economic viability of disaggregated designs. This success catalyzed broader industry action toward standardization, culminating in 2022 when a consortium of key industry players, including AMD, Arm, Intel, TSMC, and Samsung, formed the Universal Chiplet Interconnect Express (UCIe) consortium. The goal of UCIe was to establish a standardized, open die-to-die interconnect protocol and physical layer specification, which is crucial for creating a vibrant multi-vendor chiplet ecosystem and overcoming proprietary interconnect barriers [14]. Parallel advancements in advanced packaging technologies, such as TSMC's CoWoS (Chip-on-Wafer-on-Substrate) and Intel's Foveros, provided the physical "plumbing" necessary for high-bandwidth, low-latency, and power-efficient communication between chiplets within a single package [14].
Heterogeneous Integration and the AI Acceleration Era (2020–Present)
The historical development of chiplets has culminated in the current era, defined by sophisticated heterogeneous integration for specialized computing, particularly artificial intelligence. The architectural paradigm has evolved from simply disaggregating a CPU to co-packaging diverse silicon elements—processors, accelerators, memory, and I/O—into a unified system. As noted earlier, this is exemplified by devices like NVIDIA's Blackwell-architecture GPUs. The demand for such systems has been explosively driven by AI, creating specific bottlenecks. For instance, the huge demand for computing for AI applications has made High Bandwidth Memory (HBM) a critical and scarce resource, selling for approximately six times the price of conventional DDR5 memory, with production sold out through 2025 in a fast-growing market [15]. This economic and supply-chain pressure further incentivizes chiplet architectures, as they allow for the flexible integration of expensive, specialized components like HBM stacks alongside compute dies. The vision for the semiconductor industry now involves targeted end-products that may become small, specialized chiplets meant to be combined in the same package with a general-purpose processor and many other specialty chiplets [14]. This represents the full realization of the chiplet concept: a Lego-like approach to system design where optimal process technology, functional specialization, and yield economics converge within a single package, moving beyond the limitations of monolithic scaling.
Principles of Operation
The operational principles of chiplets are founded on the architectural disaggregation of a traditional monolithic system-on-chip (SoC) into discrete functional blocks, each fabricated on an optimized semiconductor process and subsequently reintegrated into a single package using advanced packaging technologies. This paradigm shift enables heterogeneous integration, where disparate process nodes, materials, and intellectual property (IP) blocks can be combined to achieve performance, power efficiency, and cost characteristics unattainable with a single-die design [2][4].
Heterogeneous Integration and Modularity
At its core, the chiplet methodology is an application of modular design principles to semiconductor engineering. Instead of integrating all functions—such as central processing cores, memory controllers, input/output (I/O) interfaces, and specialized accelerators—onto one large die, these functions are partitioned into smaller, standalone "chiplets." Each chiplet can be designed, verified, and manufactured independently, often on a process technology best suited to its function. For example, high-performance CPU cores may be fabricated on the latest, most dense FinFET or Gate-All-Around (GAA) node (e.g., 3 nm or 2 nm), while analog I/O or power management chiplets can be produced on a more mature, cost-effective node (e.g., 28 nm or 40 nm) [2][3]. This targeted optimization avoids the prohibitive cost of fabricating an entire, massive die on the leading-edge node, a challenge noted in earlier sections of this article. The modularity extends to the system architecture, where a targeted end-product might become a small, specialized chiplet meant to be combined in the same package with both a general-purpose processor and many other specialty chiplets [2]. This creates a composable system-in-package (SiP) where functionality can be scaled or customized by varying the number and type of chiplets assembled, a concept moving beyond the physical limitations of a single photolithographic reticle [4].
Advanced Packaging and Interconnect Fabric
The physical and electrical integration of chiplets is achieved through advanced packaging techniques that provide high-density, low-latency, and high-bandwidth interconnects between dies. These interconnects form the critical "fabric" that binds the chiplets into a cohesive system, and their performance parameters are fundamental to the overall system operation. Key packaging approaches include:
- 2.5D Integration: Chiplets are placed side-by-side on a silicon interposer—a passive silicon substrate containing a dense network of metallic interconnects (typically copper lines and through-silicon vias, or TSVs). The interposer provides short, high-speed electrical pathways between chiplets. Interconnect pitch on advanced interposers can be less than 2 µm, enabling thousands of connections per square millimeter. Signal propagation delay (tpd) between chiplets on an interposer can be approximated by tpd = l√(LC), where l is the interconnect length, and L and C are the per-unit-length inductance and capacitance, respectively. This delay is typically an order of magnitude lower than for signals traveling off-package.
- 3D Integration: Chiplets are stacked vertically and bonded using techniques like hybrid bonding (direct copper-to-copper bonding) or microbump connections. This provides the shortest possible interconnect lengths, measured in micrometers or tens of micrometers, resulting in ultra-high bandwidth and minimal latency. Power density becomes a critical design constraint in 3D stacks, with heat flux often exceeding 100 W/cm², necessitating sophisticated thermal management solutions like integrated microfluidic channels or thermally conductive through-silicon vias (TTSVs).
- Fan-Out Wafer-Level Packaging (FOWLP): Chiplets are embedded in a mold compound, and a redistribution layer (RDL) is built on top to fan out the connections to a standard ball-grid-array (BGA) pitch. This offers a cost-effective integration platform with moderate interconnect density.
Across all of these approaches, the bandwidth (B) of an inter-chiplet link is governed by B = N × f × b, where N is the number of parallel lanes, f is the signaling (symbol) rate, and b is the number of bits per symbol (e.g., 1 for NRZ, 2 for PAM-4). State-of-the-art die-to-die interfaces, such as AMD's Infinity Fabric, Intel's Advanced Interface Bus (AIB), and the open-standard Universal Chiplet Interconnect Express (UCIe), operate at data rates exceeding 16 Gbps per lane, with aggregate bandwidths between chiplets reaching into the terabit-per-second range [3].
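The two relations quoted in this list can be evaluated directly. The following sketch uses order-of-magnitude values for interposer wiring parasitics and for a wide parallel die-to-die link; all of the specific numbers are assumptions chosen for illustration, not figures from any datasheet or specification.

```python
import math

# Propagation delay of an interposer trace: t_pd = l * sqrt(L * C)
length_m = 2e-3      # 2 mm die-to-die route (assumed)
L_per_m = 4e-7       # ~0.4 uH/m inductance per unit length (assumed)
C_per_m = 1.6e-10    # ~160 pF/m capacitance per unit length (assumed)
t_pd = length_m * math.sqrt(L_per_m * C_per_m)
print(f"Propagation delay over 2 mm: {t_pd * 1e12:.0f} ps")

# Aggregate link bandwidth: B = N * f * b
lanes = 64           # parallel data lanes (assumed)
symbol_rate = 16e9   # 16 Gbaud signaling per lane (assumed)
bits_per_symbol = 1  # NRZ; PAM-4 would carry 2 bits per symbol
bandwidth_bps = lanes * symbol_rate * bits_per_symbol
print(f"Raw link bandwidth: {bandwidth_bps / 1e12:.2f} Tbit/s")
```

With these assumptions the die-to-die hop costs on the order of 16 ps of flight time and a single 64-lane link carries roughly 1 Tbit/s, which is why aggregate bandwidths between chiplets are quoted in the terabit-per-second range.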
Electrical and Physical Design Considerations
The operation of a chiplet-based system introduces unique electrical challenges compared to a monolithic die. Signal integrity, power integrity, and thermal management must be co-designed across the package and die boundaries.
- Power Delivery Network (PDN): Delivering stable, low-voltage power (typically 0.6V to 1.2V for core logic) to multiple chiplets with high, transient current demands (often 100-300 A per high-performance compute chiplet) requires a meticulously designed package-level PDN. The target impedance (Ztarget) of the PDN must be maintained across a broad frequency spectrum (from DC to ~1 GHz) to prevent voltage droop (ΔV), calculated as ΔV = Itransient × ZPDN. This is achieved through a hierarchy of on-die decoupling capacitance (typically 100s of nF/mm²), embedded package capacitors, and discrete capacitors on the substrate. A numerical sketch of these relations follows this list.
- Thermal Management: The power dissipation of the integrated system, which can range from 50W for client devices to over 1000W for datacenter accelerators, must be effectively removed. The thermal resistance network from the junction to the ambient (θJA) includes contributions from the die itself, the die attach material, the package substrate, and the heat sink. In 2.5D and 3D configurations, thermal coupling between stacked chiplets is significant, requiring careful floorplanning to avoid placing high-power-density blocks directly atop each other. Thermal interface materials (TIMs) with conductivities ranging from 3 to 80 W/m·K are critical for efficient heat transfer.
- Clock Distribution and Synchronization: Maintaining precise clock synchronization across multiple chiplets is non-trivial. Systems often employ a source-synchronous clocking scheme or a mesochronous architecture, where a common reference clock is distributed, and local phase-locked loops (PLLs) in each chiplet align their internal clocks. Clock skew between chiplets must be managed to within a few picoseconds to ensure reliable operation of high-speed interconnects.
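As a concrete illustration of the power-delivery relations in the first bullet of this list, the sketch below derives a target impedance from an allowed supply ripple and then the droop produced by a transient load step. All values are representative assumptions for a high-performance compute chiplet, not measurements of any particular design.

```python
# Target impedance: Z_target = (V_dd * ripple_fraction) / I_transient
v_dd = 0.8           # core supply voltage in volts (assumed)
ripple = 0.05        # 5% allowed droop (assumed)
i_transient = 200.0  # transient current step in amps (assumed)

z_target = (v_dd * ripple) / i_transient
print(f"Target PDN impedance: {z_target * 1e6:.0f} micro-ohms")

# Droop produced by a given PDN impedance: dV = I_transient * Z_pdn
z_pdn = 150e-6       # achieved impedance at the frequency of interest (assumed)
droop = i_transient * z_pdn
print(f"Voltage droop at a 200 A step: {droop * 1e3:.0f} mV ({droop / v_dd:.1%} of V_dd)")
```

The resulting target of roughly 200 micro-ohms explains why the decoupling hierarchy described above (on-die capacitance, embedded package capacitors, and substrate capacitors) is needed: no single capacitor technology holds such a low impedance from DC to around 1 GHz.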
System-Level Architecture and Ecosystem
Building on the modular concept, the chiplet model enables novel system architectures. A processor can be constructed from multiple identical compute chiplets connected via a high-bandwidth fabric, allowing core count and cache size to scale linearly with the number of chiplets [3]. This approach, pioneered commercially in the datacenter, allows for the creation of optimized platforms supported by a broad ecosystem of server OEMs and partners [17]. Furthermore, the paradigm enables "disaggregated" or "chiplet-based" SoCs, where chiplets from different vendors, designed to a common interconnect standard, can be mixed and matched. This fosters a specialized chiplet supply chain and requires standardized protocols for physical layer, die-to-die adapter, and software enumeration to ensure interoperability [16]. The operational success of the entire system therefore depends not only on the electrical and physical integration but also on the adherence to these layered interface standards, which manage coherency, memory semantics, and error handling across the chiplet boundary.
Types and Classification
Chiplets can be systematically classified along several dimensions, including their primary function, the nature of their integration, the interconnect standard used, and their role within a broader system architecture. This multi-faceted taxonomy reflects the diverse applications and design philosophies enabled by heterogeneous integration.
By Primary Function and Design Role
A fundamental classification distinguishes chiplets based on their computational or functional purpose within a multi-die system. This approach moves beyond the traditional monolithic system-on-chip (SoC) model by disaggregating functions into specialized silicon dies.
- Compute Chiplets: These are processing units responsible for executing computational workloads. They can be further subdivided:
- General-Purpose Compute Chiplets: Typically central processing unit (CPU) cores or clusters. As noted earlier, AMD's EPYC server processors pioneered this model, employing multiple identical CPU chiplets [17]. The industry trend is moving towards combining these with specialized accelerators [7].
- Specialized Accelerator Chiplets: Dedicated processors for specific tasks like artificial intelligence (AI) inference and training, graphics rendering, or cryptographic operations. These are designed for maximum performance-per-watt on targeted algorithms. For instance, modern AI datacenter platforms integrate dedicated AI accelerator chiplets alongside CPU chiplets for control and data management [7].
- I/O and Interface Chiplets: These dies manage communication between the chiplet package and the external system. Functions include high-speed serial transceivers (e.g., for PCIe, Ethernet), memory controllers (for DDR, HBM), and other physical layer interfaces. A key advantage is that these functions, which often do not benefit from the latest transistor shrinks, can be fabricated on older, more cost-effective process nodes, while compute chiplets use leading-edge nodes.
- Memory Chiplets: While high-bandwidth memory (HBM) stacks are integrated in a 2.5D fashion, the concept extends to cache chiplets or buffers. Emerging architectures explore using chiplets based on different memory technologies—such as mainstream volatile memory and emerging non-volatile memory—for optimized data storage and processing within the package [20].
- Analog/Mixed-Signal Chiplets: These contain sensitive analog circuitry (e.g., power management, clock generation, analog-to-digital converters) that is difficult and expensive to scale with digital logic. Disaggregating analog functions into their own chiplet allows them to be built on optimized analog process nodes, improving performance and yield. The acquisition of analog design IP firms by chiplet-focused companies underscores the strategic importance of this category [19].
By Integration Methodology and Topology
The physical and logical arrangement of chiplets within a package defines another critical classification axis, directly impacting bandwidth, latency, and power efficiency.
- 2D Multi-Chip Modules (MCMs): This is a foundational approach where multiple dies are placed side-by-side on a common substrate (e.g., organic laminate or silicon interposer) and connected through wire bonds or flip-chip bumps. In a classical MCM, the components are typically fully functional chips that could be sold separately in packaged form [21]. Modern chiplet-based MCMs, such as those used in many CPU designs, employ highly optimized, short inter-die interconnects on the package substrate.
- 2.5D Integration with Interposers: This advanced form of MCM employs a silicon interposer—a passive silicon layer with dense wiring—between the chiplets and the package substrate. The interposer provides extremely high-density interconnects, enabling thousands of connections between adjacent chiplets with very short electrical paths. This is the technology enabling high-bandwidth communication between logic dies and HBM memory stacks, as seen in advanced GPUs and AI accelerators.
- 3D Stacking: This involves vertically stacking chiplets and connecting them with through-silicon vias (TSVs), enabling the highest possible interconnect density and shortest vertical paths. Stacks can be homogeneous (e.g., memory-on-memory) or heterogeneous (e.g., logic-on-memory, logic-on-logic). 3D stacking is a key technique for continuing performance scaling, as it allows for novel architectures that overcome data movement bottlenecks. It has also been identified as a strategic method for achieving high performance using mature process nodes when access to leading-edge fabrication is constrained [18].
- Fan-Out Wafer-Level Packaging (FOWLP): In this approach, chiplets are embedded in a molding compound, and a redistribution layer (RDL) is built on top to route connections between them. This allows for a very compact package footprint and is often used for mobile and consumer applications.
By Interconnect Standard and Protocol
The communication interface between chiplets is a defining characteristic, with industry standards emerging to ensure interoperability and ecosystem growth.
- Proprietary Interconnects: Early chiplet implementations often used proprietary, vendor-specific physical layers and protocols optimized for a particular product family. These offer high performance but limit the ability to mix and match chiplets from different designers.
- Open Standard Interconnects: The development of open standards is critical for creating a vibrant chiplet ecosystem. The dominant emerging standard is the Universal Chiplet Interconnect Express™ (UCIe™). UCIe is an open industry standard that establishes a ubiquitous interconnect at the package level, covering the die-to-die I/O physical layer, die-to-die protocols, and software stack [8]. It leverages and extends the well-established PCI Express® (PCIe®) and Compute Express Link™ (CXL™) protocols, enabling compatibility with existing software and infrastructure. The adoption of UCIe by major foundries, IP providers, and chip designers is a primary driver for the future of heterogeneous integration, facilitating the "mix-and-match" vision of chiplets from different vendors [7][19].
By System Architecture and Design Philosophy
Finally, chiplets can be classified by their architectural role in system design, which influences their size, complexity, and reusability.
- Monolithic Disaggregation: This is the decomposition of what would have been a single, large monolithic die into several smaller, functionally partitioned chiplets integrated in one package. The primary goals are to improve yield and enable the use of different process technologies for different functions. Building on the cost challenges discussed previously, this approach directly addresses the economic barriers of monolithic scaling.
- Modular Building Blocks: In this model, chiplets are designed from the outset as reusable, standardized components—often called "IP blocks in silicon form." These chiplets, such as standardized compute subsystems, memory interfaces, or accelerator blocks, can be assembled in various configurations to create different end products. This philosophy is exemplified by initiatives like Arm Total Design, which aims to accelerate AI silicon development through standards-based compute subsystems [7]. The goal is to dramatically reduce design time and cost for complex systems.
- Chiplet-Based Platforms: This classification refers to a pre-defined set of chiplet types (e.g., a specific CPU core complex, a set of I/O dies, an AI accelerator block) that form a compatible family. Designers can then create differentiated products by selecting and integrating different combinations and numbers of these platform chiplets. This provides a balance between standardization and customization.
The classification of chiplets is thus not a single taxonomy but a matrix of functional, physical, interface, and architectural characteristics. This flexibility is the core strength of the chiplet paradigm, enabling tailored solutions for applications ranging from mobile devices to exascale datacenter accelerators, where performance, power, and cost must be optimized simultaneously [1][17][20].
Key Characteristics
The defining characteristics of chiplet-based architectures stem from their fundamental departure from monolithic system-on-chip (SoC) design. These characteristics enable the performance, efficiency, and economic benefits discussed in earlier sections, while introducing unique technical considerations centered on interconnect technology, heterogeneous integration, and standardized interfaces.
Standardized Die-to-Die Interconnects
A core characteristic of modern chiplet systems is the reliance on standardized, high-bandwidth, low-latency die-to-die (D2D) interconnects. These interfaces are the foundational technology that enables disparate chiplets to function as a cohesive system. The physical layer (PHY) of these interconnects is a critical engineering challenge, requiring expertise in high-speed analog and mixed-signal design to manage signal integrity, power efficiency, and timing across separate silicon dies [19]. Industry consortia have emerged to develop open standards for these interfaces, with the Universal Chiplet Interconnect Express (UCIe) establishing a specification that defines the protocol stack, physical layer, and compliance tests to ensure interoperability between chiplets from different designers and manufacturers [14]. Some definitions within the industry posit that only silicon dies employing such a standardized interface truly qualify as chiplets, distinguishing them from proprietary multi-die solutions [21].
These interconnects are engineered for extreme bandwidth density. For example, advanced implementations utilizing parallel, single-ended signaling schemes—sometimes colloquially referred to as "bunch of wires" (BoW) PHYs—can achieve data rates exceeding several gigabits per second per pin, with aggregate bandwidths scaling into the terabit-per-second range for wide interfaces [19]. The performance of the overall multi-die system is directly contingent on the bandwidth and latency of these links, as they must facilitate cache-coherent memory traffic and high-throughput data movement between computational, memory, and I/O chiplets with minimal overhead [10].
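The notion of bandwidth density can be illustrated with a rough calculation of the bandwidth available per millimetre of die edge ("beachfront"). Every parameter below is an assumption chosen for illustration rather than a value from the UCIe, BoW, or any other specification.

```python
# Rough "beachfront" bandwidth density of a parallel die-to-die PHY.
bump_pitch_um = 45.0     # micro-bump pitch (assumed)
bump_rows = 8            # rows of bumps along the die edge (assumed)
data_rate_gbps = 16.0    # per-lane data rate, NRZ (assumed)
signal_fraction = 0.6    # share of bumps carrying data vs power/ground/clock (assumed)

lanes_per_mm = (1000.0 / bump_pitch_um) * bump_rows * signal_fraction
density_gbps_per_mm = lanes_per_mm * data_rate_gbps
print(f"Data lanes per mm of die edge: {lanes_per_mm:.0f}")
print(f"Edge bandwidth density: {density_gbps_per_mm / 8:.0f} GB/s per mm")
```

Even with these modest assumptions the figure lands in the hundreds of gigabytes per second per millimetre of die edge, which is the scale that allows a multi-die package to approach the communication characteristics of a monolithic die.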
Heterogeneous Process Node Integration
Building on the economic rationale noted earlier, a key technical characteristic is the ability to integrate silicon dies fabricated on different semiconductor process nodes within a single package. This allows each functional block to be manufactured on the optimal technology node for its purpose, a strategy proposed to meet divergent requirements for computing power, energy efficiency, and cost simultaneously [20]. For instance, high-performance CPU cores may be fabricated on the latest 3nm or 2nm FinFET process for maximum speed and transistor density, while analog I/O interfaces, power management circuits, or certain memory controllers might be better suited to older, more cost-effective nodes like 16nm or 22nm, where analog design characteristics are more mature and predictable [20][10]. This targeted optimization stands in contrast to the monolithic approach, where the entire die is constrained to a single process node, often forcing compromises for analog or high-voltage components.
Advanced Packaging and Vertical Integration
Chiplet architectures are intrinsically linked to innovations in advanced packaging technologies, which provide the physical and electrical infrastructure for multi-die systems. Two primary packaging paradigms enable chiplet integration:
- 2.5D Packaging: In this approach, chiplets are placed side-by-side on a silicon interposer—a passive silicon substrate that contains a dense network of interconnecting wires. The interposer provides extremely fine-pitch routing between chiplets, far exceeding the density possible on a traditional organic package substrate. Through-silicon vias (TSVs), metallic vertical conduits that pass through the interposer, connect its top-side wiring to the package substrate below, which in turn carries signals to the package's ball grid array (BGA) and onward to the printed circuit board [9]. This structure provides the required connectivity for high-bandwidth communication between adjacent chiplets [9].
- 3D Packaging: This approach stacks chiplets vertically using direct, ultra-high-density vertical interconnects, such as micro-bumps or hybrid bonding. A prominent example is Intel's Foveros technology, which enables logic-on-logic stacking. In such a 3D configuration, a base chiplet fabricated on a mature node acts as a passive interposer or active bridge, while performance chiplets are stacked on top, connected by thousands of vertical interconnects per square millimeter [23]. This dramatically reduces interconnect length and power consumption for communication between stacked dies, enabling new architectures where, for example, a compute chiplet is stacked directly on top of a high-bandwidth memory (HBM) chiplet.
Emergence of Optical I/O Chiplets
To address the growing bottleneck of electrical I/O power and reach at extreme data rates, a developing characteristic of leading-edge systems is the integration of optical I/O chiplets. As data transfer rates between systems—and eventually between chiplets within a system—approach and exceed terabits per second, the power required for electrical serialization/deserialization (SerDes) becomes prohibitive. Optical I/O chiplets, such as those based on the TeraPHY™ technology, integrate silicon photonics directly into a chiplet form factor [22]. These chiplets convert electrical signals from compute dies into optical signals for transmission over fiber optics, offering orders-of-magnitude higher bandwidth density and lower power per bit over longer distances compared to electrical interfaces [22]. This characteristic is particularly critical for scaling up AI cluster performance and profitability, where inter-node communication can dominate power budgets and limit scalability [22].
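The power argument behind optical I/O reduces to energy per bit multiplied by aggregate bandwidth. The sketch below uses rough, order-of-magnitude energy-per-bit figures (assumptions for illustration, not vendor specifications) to show why electrical I/O power becomes dominant at multi-terabit scales.

```python
# I/O power = energy per bit * aggregate bandwidth.
# Energy-per-bit figures below are rough, illustrative assumptions.
aggregate_tbps = 10.0   # total off-package bandwidth, Tbit/s (assumed)
bits_per_s = aggregate_tbps * 1e12

energy_pj_per_bit = {
    "long-reach electrical SerDes": 5.0,  # assumed
    "optical I/O chiplet":          1.0,  # assumed
}

for link, pj in energy_pj_per_bit.items():
    power_w = bits_per_s * pj * 1e-12
    print(f"{link:30s}: {power_w:5.1f} W at {aggregate_tbps:.0f} Tbit/s")
```

At ten terabits per second, every additional picojoule per bit costs ten watts of I/O power, which is why links with lower energy per bit and longer reach become attractive for scale-up AI systems.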
Modular and Reusable Design Paradigm
A fundamental shift enabled by chiplet characteristics is the move toward a modular design philosophy. Chiplets are designed to be reusable, IP-like blocks that can be combined in various configurations to create different end products [10]. A single, validated compute chiplet, for instance, could be integrated into products ranging from a client laptop processor to a multi-socket datacenter CPU by combining it with different I/O, memory, or accelerator chiplets on appropriate package substrates. This reusability amortizes the high non-recurring engineering (NRE) cost of designing a complex die over multiple products and generations, accelerating design cycles. This modularity is a key enabler for custom silicon solutions, allowing companies to assemble optimized systems from best-in-class components sourced from multiple semiconductor vendors, as highlighted by collaborations within ecosystems like Arm Total Design, including work involving Samsung Foundry, ADTechnology, and Rebellions.
System-Level Performance and Power Management
While offering significant advantages, chiplet systems introduce system-level design complexities. Performance is no longer determined by a single clock domain or uniform process characteristics; instead, it depends on the coordinated operation of multiple independent dies with potentially different voltage and frequency operating points. Sophisticated system-level power management and fabric controllers are required to orchestrate workload distribution, manage thermal hotspots that may form on specific chiplets, and ensure coherent data flow across the D2D links [10]. The aggregate power dissipation of the integrated system, which can be substantial, necessitates advanced thermal management solutions that consider the non-uniform power map across the package, building on the thermal challenges noted previously.
Applications
Chiplet architectures have transitioned from a conceptual alternative to a foundational design paradigm, enabling performance scaling and functional specialization across diverse computing domains where traditional monolithic system-on-chip (SoC) approaches face significant barriers [15]. This shift is driven by the convergence of several critical factors: the physical and performance limits of traditional interconnect technologies as data rates climb [12], the divergent innovation paths for digital logic and analog I/O circuitry [24], and the need for a cohesive process to align various semiconductor technologies [25]. The applications of chiplets are characterized by their ability to disaggregate system functions into optimized silicon dies, which are then integrated using advanced packaging to form a cohesive, high-performance system.
Enabling Next-Generation High-Performance Computing and Artificial Intelligence
The relentless demands of high-performance computing (HPC) and artificial intelligence (AI) workloads for greater computational density and energy efficiency are primary drivers for chiplet adoption. As process nodes have shrunk below 7nm, physical limitations and exponentially rising manufacturing costs have necessitated alternative approaches to continue performance scaling [26]. Chiplets address this by allowing critical compute units, such as CPU cores or AI accelerator tiles, to be fabricated on the most advanced process nodes—like the 2nm node incorporating Gate-All-Around (GAA) transistor structures optimized for HPC and AI—while other system components reside on more cost-effective or functionally superior older nodes [24]. This targeted optimization, building on the cost challenges noted earlier, is essential for managing the complexity and expense of designing massive, monolithic dies. The architecture facilitates the construction of processors with extreme core counts, as exemplified by server CPUs integrating numerous core-die chiplets to deliver high thread counts per socket [27]. For AI accelerators, chiplets enable the integration of vast numbers of specialized tensor cores and high-bandwidth memory interfaces, creating systems capable of processing trillion-parameter models, though effective thermal management of these high-power systems remains a critical engineering challenge.
Overcoming Interconnect and I/O Performance Bottlenecks
A significant application domain for chiplets lies in mitigating interconnect bottlenecks, particularly for memory and high-speed I/O. Traditional bump-based interconnect technologies are reaching their physical limits as data rates for interfaces like High Bandwidth Memory (HBM) and Peripheral Component Interconnect Express (PCIe) continue to climb [12]. Chiplet design allows the physical layer (PHY) circuitry for these high-speed interfaces to be partitioned onto separate dies. These I/O chiplets can be fabricated using process technologies specifically optimized for analog/mixed-signal performance, which often differ from the pure digital CMOS processes best suited for compute logic [24]. This separation enables independent optimization and innovation for each functional block. Furthermore, standardized die-to-die interconnect protocols, such as Universal Chiplet Interconnect Express (UCIe), are being developed to ensure interoperability between chiplets from different vendors. These specifications include detailed architectural attributes to define system setups and registers for use in test plans and compliance testing, which is crucial for establishing a robust ecosystem of compatible components [15][14].
Fostering Heterogeneous Integration and Supply Chain Innovation
Beyond performance, chiplet architectures enable a new model of heterogeneous integration and supply chain collaboration. They allow system designers to mix and match best-in-class components—such as compute dies from one foundry, memory stacks from another, and photonic I/O engines from a third—into a single package [25]. This modularity reduces time-to-market and design risk by allowing proven intellectual property (IP) blocks to be reused as hardened chiplets across multiple product generations. The industry challenge is to align the various packaging, testing, and interoperability standards into a cohesive process that drives innovation forward across the entire supply chain, from EDA tool vendors to OSATs (Outsourced Semiconductor Assembly and Test providers) [25]. This collaborative model is essential for realizing the full potential of chiplets, as it moves beyond single-company vertical integration to a more flexible, horizontal ecosystem.
Specific Implementation Domains
The practical applications of chiplets manifest across several key market segments:
- Data Center Processors: Modern server CPUs extensively utilize chiplet designs to combine high core-count compute dies with centralized I/O dies. This approach, building on the architectural shift discussed previously, allows for scalable core configurations and the integration of specialized accelerators for cryptography, compression, or AI inference within the same package [27].
- Advanced GPUs and AI Accelerators: The largest graphics processing units and dedicated AI training chips employ chiplet architectures to surpass the reticle limit of photolithography steppers. By connecting multiple identical or complementary compute dies via proprietary high-bandwidth die-to-die interconnects such as NVIDIA's NVLink-C2C, these systems achieve aggregate transistor counts in the hundreds of billions, necessary for cutting-edge AI model training and scientific simulation.
- Network and Edge Devices: In networking switches, routers, and edge AI appliances, chiplets enable the integration of programmable packet processors, traffic managers, and switch fabric interfaces on different silicon optimized for their respective tasks. This allows for greater feature flexibility and power efficiency compared to a monolithic approach.
- Automotive and Aerospace: For applications requiring high reliability or radiation tolerance, chiplets permit the use of legacy, proven process nodes for critical safety functions while still incorporating advanced computing capabilities from newer nodes, all within a single system-in-package.
The evolution of chiplet applications continues to be shaped by ongoing developments in interconnect density, packaging technology (such as 3D integration), and ecosystem standardization. As the industry addresses challenges related to testing, security, and thermal design power (TDP) management for these complex systems, the scope of chiplet-based designs is expected to expand further into mainstream consumer electronics and other cost-sensitive applications [15][14].
Design Considerations
The architectural shift from monolithic system-on-chips (SoCs) to multi-die chiplet systems introduces a distinct set of engineering challenges that must be addressed to realize their performance, yield, and cost benefits. These considerations span the physical implementation of high-density interconnects, the management of power delivery and heat dissipation, the establishment of standardized communication protocols, and the complexities of system-level testing and validation.
Interconnect Technology and Signal Integrity
The performance of a chiplet-based system is fundamentally constrained by the bandwidth, latency, and energy efficiency of the connections between dies. As data rates climb, traditional bump technologies—long relied upon as the primary interconnect method for flip-chip packages—are reaching their physical and performance limits [3]. For advanced applications, the industry is transitioning to finer-pitch, higher-density solutions. Two primary approaches dominate:
- Direct Bond Interconnect (DBI): This technique uses hybrid bonding to create direct copper-to-copper connections between dies at a sub-10-micron pitch, enabling massive interconnect densities exceeding 10,000 connections per square millimeter. This facilitates high-bandwidth, energy-efficient communication with latencies approaching those of on-die wiring [1].
- Redistribution Layer (RDL) Fan-Out: In this packaging-led approach, chiplets are embedded in a mold compound, and a high-density RDL is fabricated on top to route signals between them. This allows for interconnect pitches in the 2-micron range and is particularly suited for integrating heterogeneous dies, such as logic with high-bandwidth memory (HBM) [2].
Signal integrity becomes paramount at these densities and speeds. Engineers must model and mitigate crosstalk, insertion loss, and impedance discontinuities across the entire channel, from the transmitter on one die to the receiver on another. This often requires advanced equalization techniques like decision feedback equalization (DFE) and feed-forward equalization (FFE) operating at data rates of 32 Gbps per lane and beyond [3].
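Feed-forward equalization, mentioned above, is essentially a short FIR filter applied to the transmitted symbol stream so that the launched waveform pre-compensates the channel's loss. The sketch below is a minimal illustration; the three tap weights are arbitrary values chosen for the example, not coefficients from any real PHY.

```python
# Minimal transmit FFE: each output level is a weighted sum of the next,
# current, and previous symbols (pre-cursor, main, and post-cursor taps).
def ffe(symbols, taps=(-0.1, 0.8, -0.1)):
    pre, main, post = taps              # illustrative tap weights (assumed)
    padded = [0] + list(symbols) + [0]  # zero-pad the ends of the stream
    out = []
    for i in range(1, len(padded) - 1):
        out.append(pre * padded[i + 1] + main * padded[i] + post * padded[i - 1])
    return out

# NRZ bits mapped to +/-1 levels, then shaped by the equalizer.
bits = [1, 1, 0, 1, 0, 0, 1]
symbols = [1 if b else -1 for b in bits]
print(ffe(symbols))
```

Decision feedback equalization plays the complementary role on the receive side, subtracting the estimated contribution of previously decided symbols; together the two techniques support the 32 Gbps-class signaling described above over short, dense die-to-die channels.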
Power Delivery Network (PDN) and Thermal Co-Design
Delivering stable, high-current power to multiple high-performance chiplets within a single package is a significant challenge. The power delivery network must manage:
- Current Demand: Modern compute chiplets can exhibit transient current demands in the hundreds of amps, and a multi-chiplet package can draw well over 500 amps in aggregate, requiring an extremely low-impedance path from the package substrate to the silicon [1].
- Voltage Droop: Simultaneous switching activity across multiple dies can cause localized voltage droop (ΔV), potentially leading to timing violations or functional failure. This necessitates a dense network of on-package decoupling capacitors and careful co-design of the chiplet's internal PDN with the package-level power planes [2].
- Thermal Dissipation: As noted earlier, total system power can exceed 1000W. This heat must be conducted away from multiple hot spots (often the compute chiplets) through the package lid and into a heatsink. Thermal interface materials (TIMs) with high conductivity (>5 W/m·K) are critical, and the package substrate must be designed to minimize thermal resistance between dies and the cooling solution. Computational fluid dynamics (CFD) simulations are essential to model airflow and prevent thermal throttling [1].
The interdependency of power and thermal behavior requires co-design. A chiplet's placement affects both the electrical length of power delivery paths and the thermal coupling between dies, where a hot logic chiplet can elevate the temperature of an adjacent memory die, degrading its performance and reliability.
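The co-design problem can be framed with a simple series thermal-resistance model for one hot spot. The resistance and power values below are illustrative assumptions, not characterized data for any package.

```python
# Junction temperature from a series thermal-resistance stack:
# T_j = T_ambient + P * (theta_jc + theta_tim + theta_hs)
power_w = 300.0   # dissipation attributed to one compute chiplet (assumed)
t_ambient = 35.0  # inlet air temperature in deg C (assumed)

theta_jc = 0.05   # junction-to-case resistance, K/W (assumed)
theta_tim = 0.02  # thermal interface material, K/W (assumed)
theta_hs = 0.10   # heatsink-to-ambient, K/W (assumed)

t_junction = t_ambient + power_w * (theta_jc + theta_tim + theta_hs)
print(f"Estimated junction temperature: {t_junction:.0f} deg C")
```

With these numbers the junction sits near 86 °C, and the calculation makes the co-design point explicit: lowering any single term, whether through a better TIM, improved heatsinking, or floorplanning that separates hot chiplets, directly buys thermal headroom that can be spent on higher sustained power.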
Standardized Die-to-Die Interfaces
For chiplets from different vendors to interoperate, standardized physical and protocol-layer interfaces are required. An open standard facilitates a multi-vendor ecosystem, whereas proprietary interfaces can offer optimized performance for a specific vendor's product stack. The Universal Chiplet Interconnect Express (UCIe) consortium has established one such open standard, defining a complete stack from the physical layer (PHY) to the protocol layer. UCIe 1.0, released in 2022, specified bump pitches of 100-130 microns for standard packages and 25-55 microns for advanced packages, with raw bit rates up to 32 GT/s per lane using NRZ modulation [1]. However, achieving robust interoperability under UCIe 1.0 presented challenges related to testing methodologies, compatibility of forward error correction (FEC) schemes, and managing skew across links [1]. The subsequent UCIe 2.0 specification, released in 2024, addressed many of these gaps: it added a UCIe-3D profile targeting hybrid-bonded vertical stacking at bump pitches below 10 microns, substantially increasing achievable bandwidth density, and it expanded manageability, testing, and compliance provisions to improve interoperability [1]. Despite these advances, integrating and validating a multi-vendor chiplet system using UCIe remains complex, involving challenges in link training, protocol negotiation, and system-level bring-up [1].
System Architecture and Partitioning
Deciding how to partition a system into chiplets is a high-level design choice with cascading implications. Key factors include:
- Functional Cohesion: Grouping tightly coupled functions (e.g., a CPU core complex with its cache hierarchy) on the same die minimizes latency and power consumption of cross-chiplet communication.
- Process Node Suitability: Functions like analog/RF, high-voltage I/O, or dense memory do not always benefit from the latest CMOS logic node and can be fabricated on older, more cost-effective nodes, as noted in earlier strategic discussions.
- Reticle Limit: Partitioning is necessary to build systems whose total silicon area exceeds the maximum reticle size of approximately 850 mm² for deep ultraviolet (DUV) lithography scanners.
- Yield and Cost: The yield of a smaller die is substantially higher than that of a large monolithic die, because yield falls off roughly exponentially with die area. The system cost becomes a function of the individual chiplet yields and the assembly yield, which must be modeled carefully; a cost sketch follows below.
The system architecture must also define the network-on-package (NoP) that connects the chiplets. This can range from a simple point-to-point mesh to a more complex packet-switched fabric, requiring trade-offs between latency, bandwidth, and architectural complexity.
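The cost sketch referenced in the yield bullet above extends the earlier yield model with dies per wafer, wafer cost, and assembly yield. All inputs are illustrative assumptions, not actual foundry or packaging pricing.

```python
import math

def die_yield(area_mm2, d0_per_cm2=0.1):
    """Poisson defect model, as in the earlier yield sketch (assumed D0)."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

def dies_per_wafer(area_mm2, wafer_diameter_mm=300.0):
    """Crude gross-die estimate that ignores edge loss and scribe lines."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area / area_mm2

WAFER_COST = 15000.0  # assumed wafer cost in dollars, for illustration only

def cost_per_good_die(area_mm2):
    return WAFER_COST / (dies_per_wafer(area_mm2) * die_yield(area_mm2))

# Monolithic 800 mm^2 system versus four 200 mm^2 chiplets plus assembly.
monolithic_cost = cost_per_good_die(800)
assembly_yield = 0.95  # assumed yield of multi-die assembly and test
chiplet_cost = 4 * cost_per_good_die(200) / assembly_yield

print(f"Cost per good monolithic die: ${monolithic_cost:,.0f}")
print(f"Cost per good 4-chiplet set:  ${chiplet_cost:,.0f}")
```

Even after charging the chiplet option for imperfect assembly yield, the partitioned design wins by a wide margin under these assumptions; the gap narrows as defect density falls or as assembly and packaging costs rise, which is exactly the trade-off the partitioning decision must model.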
Test, Validation, and Reliability
Testing a chiplet-based system is inherently more complex than testing a monolithic die. The strategy typically involves multiple stages:
- Known Good Die (KGD): Each chiplet must be tested at wafer-level and post-singulation to a very high fault coverage before assembly, as repairing or replacing an individual die after packaging is difficult or impossible.
- Post-Assembly Structural Test: After packaging, tests validate the integrity of the inter-chiplet interconnects for defects like opens or shorts.
- System-Level Functional Test: The fully assembled system must be tested for correct interoperability and performance at speed. This requires access to test modes and built-in self-test (BIST) engines designed into each chiplet.
Reliability considerations are also magnified. The coefficient of thermal expansion (CTE) mismatch between different materials (silicon, organic substrate, mold compound) induces mechanical stress during power cycling, which can fatigue fine-pitch interconnects. System-level reliability metrics, such as mean time between failures (MTBF), must account for the failure rates of all constituent chiplets and their interconnections.
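The reliability roll-up described above is commonly expressed in FIT (failures in time, i.e., failures per 10^9 device-hours), which add for a series system. The sketch below uses invented FIT values purely to show the arithmetic.

```python
# System reliability roll-up: FIT rates add for a series system,
# and MTBF = 1e9 / total_FIT hours. All FIT values are illustrative assumptions.
fit_rates = {
    "compute chiplet 0":  50.0,
    "compute chiplet 1":  50.0,
    "io die":             30.0,
    "hbm stack":          40.0,
    "d2d interconnects":  20.0,
}

total_fit = sum(fit_rates.values())
mtbf_hours = 1e9 / total_fit
print(f"Total system failure rate: {total_fit:.0f} FIT")
print(f"Estimated MTBF: {mtbf_hours:,.0f} hours (~{mtbf_hours / 8760:.0f} years)")
```

Because every chiplet and every interconnect contributes its own failure rate, a multi-die package can only match the reliability of a monolithic part if the per-component rates, particularly those of the fine-pitch interconnects stressed by CTE mismatch, are held correspondingly low.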