In 2012, Elpida, then one of the world's leading DRAM manufacturers, formally declared bankruptcy. Once the benchmark of Japan's semiconductor industry, Elpida had consolidated the core DRAM technologies of three giants (NEC, Hitachi, and Mitsubishi), yet even government intervention could not reverse its decline. Crushed by roughly 430 billion yen of debt, the company filed for bankruptcy protection and was eventually acquired by Micron Technology for about 200 billion yen; after the integration, the Elpida name disappeared from the industry entirely. Looking back over the industry's history, Intel, Texas Instruments, and Motorola all entered the DRAM market and later withdrew, and Japan's semiconductor memory industry went from its peak to collapse in less than twenty years. South Korean companies rose next: Samsung and SK Hynix, backed by government subsidies and aggressive price wars, swept the global market and squeezed out nearly every competitor. Micron Technology emerged as the ultimate survivor and remains the only company in the United States capable of large-scale production of advanced memory chips. Headquartered in Boise, Idaho, it long sat in the shadow of NVIDIA and TSMC, involved neither in GPU design nor in logic-chip manufacturing. With the explosive growth in AI computing demand, however, a decades-old physical bottleneck has become impossible to ignore: computing units now spend more time waiting for data to arrive than performing computation. This pain point cannot be solved by software optimization; it can only be addressed by breakthroughs in memory hardware, precisely the field Micron has cultivated for forty years.

AI Computing's Core Constraint: The Memory Wall Becomes an Industry-Wide Problem

Under the von Neumann architecture, GPUs, TPUs, and main memory are physically separate. Computing units carry small on-chip SRAM caches, while large-model weights and input data reside mainly in off-chip DRAM, so data must shuttle between the two as electrical signals across a bus or interposer. Take a large language model with 70 billion parameters: at FP16 precision, the weights alone occupy roughly 140 GB. Mainstream high-end AI accelerator cards today carry between 80 GB and 192 GB of memory, so large models must be split across multiple cards working in concert. Over the past decade, chip computing power has grown exponentially, but memory bandwidth, constrained by pin count, signal frequency, and thermal limits, has lagged far behind. When compute capability outruns memory supply, computing units sit idle waiting for data and hardware utilization drops sharply (the back-of-envelope sketch below makes the arithmetic concrete).

AI splits into two core scenarios, training and inference, with very different underlying logic. Training is dominated by large-scale parallel processing; the same data is reused repeatedly from the compute cores' caches, so arithmetic intensity is high, the bottleneck is compute speed rather than memory, and NVIDIA's raw-compute advantage is fully exploited. It is the textbook compute-bound workload.
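To make the memory wall concrete, here is a minimal back-of-envelope sketch in Python. The 140 GB weight figure comes from the text above; the 2 TB/s per-card bandwidth and the two-card setup are illustrative assumptions rather than any vendor's specification. The point is only that, for single-stream decoding, the weight traffic per generated token, not the FLOPS, sets the throughput ceiling.

```python
# Back-of-envelope memory-wall arithmetic for a 70B-parameter model.
# Bandwidth and card count are assumed, illustrative values.

params = 70e9              # model parameters
bytes_per_param = 2        # FP16

weight_bytes = params * bytes_per_param
print(f"FP16 weight footprint: {weight_bytes / 1e9:.0f} GB")         # ~140 GB

# Batch-1 autoregressive decoding must stream essentially every weight
# from memory once per generated token, so bandwidth, not FLOPS, caps it.
hbm_bandwidth_per_card = 2e12   # bytes/s, assumed ~2 TB/s per accelerator
num_cards = 2

ceiling = num_cards * hbm_bandwidth_per_card / weight_bytes
print(f"Bandwidth-bound decode ceiling: ~{ceiling:.0f} tokens/s")     # ~29 tokens/s
```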
The inference phase follows entirely different logic. Large language models generate text token by token through an autoregressive mechanism, and to avoid recomputing attention scores over the full history at every step, the system keeps a KV cache in GPU memory. With a context length of 4096, a single user request occupies roughly 1.34 GB of GPU memory for this cache. On two A100 cards (160 GB combined), only about 20 GB remains for the KV cache once the model weights are loaded, enough for at most around 14 concurrent requests (a short code sketch just before the DRAM section below works through these numbers). The arithmetic intensity of inference is extremely low and performance is limited almost entirely by memory bandwidth, making it a memory-bound task: the physical transfer rate of HBM directly sets the ceiling on serving throughput.

From an energy perspective, reading one bit of data from off-chip HBM costs roughly 10 to 20 pJ, while a single FP16 floating-point operation costs only about 0.1 pJ; moving data is therefore 100 to 200 times more expensive than computing on it. In large-scale inference, if memory-access patterns are not optimized, a large share of data-center power is burned on bus transfers rather than on actual logic operations. This is the core force driving Micron's sustained investment in HBM technology.

Micron Technology Fundamentals and AI Supply Chain Positioning

Micron is a classic IDM (Integrated Device Manufacturer), handling chip design, wafer fabrication, and packaging and test entirely in-house. Its fabs are dedicated to memory rather than CPUs or GPUs, concentrating on DRAM and flash products. In revenue terms, DRAM contributes over 70%, NAND flash 20% to 30%, and NOR flash a comparatively small share. DRAM is the core medium of general-purpose memory modules, NAND is the key medium for solid-state drives, and NOR flash is used mostly in automotive electronics and industrial equipment for fast execution of boot code, a niche with irreplaceable value. Organizationally, Micron groups its business into four segments: compute and networking for data centers and servers, mobile for smartphones, storage for enterprise solid-state drives, and embedded for the automotive and industrial sectors.

In the AI supply chain, NVIDIA designs the GPUs and TSMC fabricates them. Micron participates in neither stage, yet it is an indispensable component supplier for AI accelerator cards: a GPU logic die alone cannot run a large model, and since the inference bottleneck sits in memory bandwidth, NVIDIA GPUs must be tightly coupled with HBM high-bandwidth memory. Micron, SK Hynix, and Samsung are the core HBM suppliers, and their stacks are integrated with GPUs into complete AI computing modules through TSMC's CoWoS advanced packaging. If the GPU is the brain of AI computing, HBM is its high-speed data channel; neither works without the other. In competitive terms, NVIDIA builds its moat on architecture and ecosystem, while Micron builds its barriers by continuously iterating process nodes and stacked-packaging technology. Each generation of HBM bandwidth gains depends on finer TSV (Through-Silicon Via) interconnects and higher stack counts, which keeps the technical barrier to entry extremely high.
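Returning to the KV-cache arithmetic quoted in the memory-wall discussion, here is a minimal sketch. The model configuration below (80 layers, 8 grouped-query KV heads, head dimension 128, FP16) is an assumption chosen because it reproduces the article's roughly 1.34 GB per request and 14 concurrent requests; it is not taken from any vendor datasheet.

```python
# KV-cache sizing sketch for a 70B-class model.
# Layer count, KV-head count, and head dimension are assumed values.

layers = 80
kv_heads = 8          # grouped-query attention
head_dim = 128
dtype_bytes = 2       # FP16
context_len = 4096

# Per token, every layer stores one K and one V vector for each KV head.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
kv_bytes_per_request = kv_bytes_per_token * context_len
print(f"KV cache per 4K-context request: {kv_bytes_per_request / 1e9:.2f} GB")  # ~1.34 GB

# Two 80 GB cards minus ~140 GB of FP16 weights leaves ~20 GB for KV cache.
free_for_kv = 2 * 80e9 - 140e9
print(f"Max concurrent requests: {int(free_for_kv // kv_bytes_per_request)}")   # 14
```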
DRAM: The Underlying Infrastructure Behind AI Computing Power

In traditional computer architecture, DRAM serves as main memory, bridging the speed gap between large, slow hard drives and the CPU with its small, fast caches. When a program runs, the system loads data from disk into DRAM, and the CPU reads and writes it with nanosecond-level latency and very high bandwidth; the operating system kernel and background processes reside in DRAM the whole time. DRAM is volatile, losing its contents when power is cut, and its capacitors leak charge naturally, so the cells must be refreshed continuously to retain data. A basic cell consists of one transistor paired with one capacitor.

In the AI era, the form and demand logic of DRAM have been completely reshaped. The computing core has shifted from CPU to GPU, and DRAM is no longer confined to DDR modules on a motherboard; it now appears as HBM, stacks of DRAM dies vertically interconnected with TSVs (Through-Silicon Vias) and co-packaged with the GPU on a silicon interposer. DRAM's value in this setting concentrates in two places. The first is loading model weights: a 70-billion-parameter model in FP16 needs 140 GB, all of which must sit in HBM before inference can begin. The second is the dynamic KV cache: generating text requires caching the historical context, and the longer the context, the more memory it consumes, which caps the concurrency of a single high-end server. Training consumes far more GPU memory still; besides the parameters, many layers of intermediate activations must be kept and the optimizer adds its own state, pushing memory demand to three to four times that of inference. Meanwhile, GPU compute growth has far outpaced memory bandwidth iteration, so GPUs frequently sit idle during inference. Upgrading HBM bandwidth therefore directly determines the throughput ceiling of an AI inference server, which is the underlying logic behind Micron's growing investment in HBM R&D.

The global DRAM market is an oligopoly of three players, Samsung, SK Hynix, and Micron, together holding roughly 95% of the market, each with its own strengths. In process-node iteration Micron leads the industry, consistently the first to mass-produce each generation of high-density DRAM from 1-alpha and 1-beta to 1-gamma. That means more chips per wafer, lower cost per bit, and a meaningful gross-margin advantage. Samsung has hit yield bottlenecks below 14 nm, slowing its cadence, while SK Hynix's process progress sits in the same tier as Micron's.

The HBM (High Bandwidth Memory) market looks very different. SK Hynix holds the industry lead with over 50% share and has been the dominant supplier for NVIDIA's high-end GPUs, using its MR-MUF packaging to build clear advantages in multi-layer stacking, heat dissipation, and yield control. Micron, a latecomer, skipped HBM3 and went straight to HBM3E, leveraging its energy-efficiency advantage to break into NVIDIA's supply chain.
It uses the TC-NCF packaging process, which is harder to manufacture at scale, leaving it with lower capacity and market share than SK Hynix. Samsung failed NVIDIA's qualification tests at both the HBM3 and HBM3E stages over heat and power-consumption issues, missing the window of the AI memory boom, and is now betting on HBM4 to try to overtake its rivals.

Energy efficiency has become Micron's differentiator. At the same bandwidth, Micron's HBM draws 20% to 30% less power than competing parts. The difference per card looks small, but across a data center with tens of thousands of cards it translates into significant savings on electricity and cooling. Meanwhile, its 1-gamma LPDDR5X reaches 9.6 Gbps at 30% lower power consumption, well matched to the battery-life requirements of on-device AI models in smartphones. On capacity, Samsung retains the lead by sheer scale, which lets it steer the market through price wars; Micron, with the smallest capacity, avoids commoditized price competition and pursues a technology-premium strategy, defending its position through leading process nodes and energy efficiency.

Beyond DRAM and HBM, NAND and NOR flash form Micron's second growth curve. In NAND, Micron ranks fourth or fifth globally with a 10% to 15% share. In NOR flash, it has abandoned the low-end consumer market to focus on automotive-grade and industrial-grade applications: it leads the Octal xSPI high-speed interface standard, its products meet ASIL-D, the highest automotive functional-safety level, and, backed by its own fabs, it commits to supplying parts for more than ten years. That locks in core automotive and industrial customers, keeps it out of the price-war red ocean, and earns a premium on reliability and performance.

Micron's Valuation Logic and Peer Comparisons

Micron's stock currently trades around $600, with a P/E ratio of 21.44 and a market capitalization of roughly $650 billion. Wall Street banks' 12-month price targets range from $400 to $675, with the average close to $500, indicating an overall undervaluation. Historically, memory chips have been a deeply cyclical business: booms drive capacity expansion, expansion leads to oversupply and price crashes, and the market has generally awarded the sector a P/E of only 8 to 10. Micron's valuation has re-rated recently mainly because HBM has restructured its revenue mix. Traditional DDR memory is highly exposed to spot supply and demand, whereas HBM follows a build-to-contract model, with long-term, non-cancellable supply agreements signed with leading customers such as NVIDIA before production even begins; HBM capacity through 2026 is already fully sold out, shifting that revenue from cyclical swings to stable, contracted income. The market has started to price Micron as an AI infrastructure supplier, and its multiple has expanded accordingly. Policy and capital tailwinds add to this: as the only advanced memory manufacturer in the United States, Micron benefits from the CHIPS Act and the push to localize supply chains, drawing steady institutional inflows and a liquidity premium.

Compared with its peers, SK Hynix trades at a P/E of only 12.17.
Although it holds more than half of the HBM market and is tied into NVIDIA's high-end supply chain, its shareholder dividends and buyback ratios are low, a consequence of South Korean conglomerate governance. Moreover, nearly 40% of its conventional DRAM capacity sits on its Wuxi, China production line, which is constrained by export controls on advanced equipment; the line cannot migrate to leading-edge processes and faces potential capacity-relocation and asset-impairment risks, so the valuation stays suppressed. Samsung Electronics' P/E of 34.18 is not a valuation premium but the result of a collapsed earnings denominator. Samsung spans memory, foundry, smartphones, and display panels; its foundry arm has invested heavily in chasing leading-edge processes but still suffers low yields and continuing losses that drag down group net profit, while the share price is held up by domestic funds, which mechanically inflates the P/E.

The core logic behind institutional optimism on Micron is clear: a rising HBM share of revenue lifts gross margin; long-term supply agreements lock in revenue certainty; shifting capacity toward HBM tightens conventional DRAM supply and supports prices across the whole product line; and once the 1-gamma process ramps, its capital expenditure enters the payback period and free cash flow keeps improving. It is worth remembering that the memory cycle has not disappeared; it has only been smoothed by long-term HBM contracts. If AI infrastructure spending slows, or if Samsung's HBM4 manages a technological leapfrog, the industry's supply-demand balance could be redrawn.
HBM Core Evaluation Criteria and Next-Generation Interconnect Technology CXL
Each manufacturer in the industry emphasizes the advantages of its own HBM products. The core of evaluating HBM quality lies in three key parameters:
First, the pin data rate, which determines transmission bandwidth. HBM connects to the GPU through thousands of micro-bumps, and the pin rate is the data rate each pin sustains per second. The industry-standard bus width is fixed at 1024 pins per stack, so total bandwidth follows a fixed conversion: per-pin rate multiplied by 1024 pins, divided by 8 bits per byte. Micron's HBM3E is rated at 9.2 Gbps per pin, for a single-stack bandwidth of roughly 1.2 TB/s, ahead of the mainstream 8.0 to 8.5 Gbps of competing products.
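The conversion is simple enough to check directly. In this sketch the helper name is ours; the 1024-pin bus width and the quoted pin rates come from the text above.

```python
# Single-stack HBM bandwidth from the per-pin data rate.

def hbm_stack_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
    """Return one stack's bandwidth in GB/s: pins x rate, converted from bits to bytes."""
    return pin_rate_gbps * bus_width_bits / 8

print(hbm_stack_bandwidth_gbs(9.2))   # ~1177.6 GB/s, i.e. roughly 1.2 TB/s
print(hbm_stack_bandwidth_gbs(8.0))   # ~1024 GB/s for a mainstream competitor
```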
However, higher speeds bring higher power consumption and a greater risk of signal distortion: frequent voltage transitions generate heat, and pushing the data rate too far corrupts signals and undermines transmission stability.

Second is the energy-efficiency metric, measured in pJ/bit; the lower the value, the better the power is controlled. Because HBM is co-packaged with the GPU, excess power worsens the thermal budget and can force the GPU to throttle its clocks and lose compute. Micron, relying on its 1-beta process and low-voltage design, achieves approximately 30% better energy efficiency than competitors, significantly reducing data-center electricity and cooling costs.

Third come thermal resistance and packaging technology, which is precisely where SK Hynix's core advantage lies. Temperature rise is set jointly by power consumption and thermal resistance. HBM's multi-layer stack conducts heat poorly, and the filler material between layers directly determines thermal resistance. The industry splits into two mainstream processes, TC-NCF and MR-MUF. Micron and Samsung use TC-NCF, which is prone to trapped voids during bonding and carries relatively high thermal resistance; SK Hynix's MR-MUF fills the gaps with a liquid molding compound, eliminating voids and lowering thermal resistance. High thermal resistance sets off a chain reaction: higher temperatures accelerate DRAM capacitor leakage, forcing the memory controller to refresh more often and eating into effective bandwidth. The packaging process also caps how many layers can be stacked; the more layers, the harder it is to manage mechanical stress and thermal expansion, and the steeper the pressure on yield control. When reading a vendor's HBM datasheet, three things matter: the test voltage behind the headline speed, the stack height and per-die capacity, and who the core end customers are. Customer qualification and acceptance remain the ultimate verification of technical strength.

CXL: The Next Battleground for AI Cluster Memory Pooling

HBM addresses the bandwidth bottleneck inside a single GPU, but as AI clusters scale to thousands of GPUs, inefficient memory allocation and inconsistent caches across devices become the new pain points, and CXL emerged to address them. Traditional data-center memory is physically bound to a single server and cannot be shared across machines, so some nodes run out of room for KV caches while others sit on idle memory; by industry estimates, 20% to 30% of capacity can end up stranded this way, a significant waste of capital. At the same time, CPU and GPU caches fall out of sync, and traditional software-based synchronization carries high latency, costs significant performance, requires manual code changes, and tolerates faults poorly. The root cause of both pain points is the PCIe protocol, which suits bulk data transfers but has no cache-coherence mechanism. CXL keeps the PCIe physical layer while rebuilding the protocol layers above it, adding memory semantics and cache coherence: hardware maintains the coherence state automatically, completing synchronization in nanoseconds with no OS or application involvement. It uses a fixed FLIT transfer format that simplifies parsing and cuts remote memory-access latency to roughly 170 to 250 nanoseconds. Through a CXL switch, idle memory from many servers can be combined into a shared pool and allocated dynamically in microseconds, breaking the physical binding to a single server and largely eliminating stranded memory.
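To make the stranding problem concrete, here is a toy sketch; the node count and free-memory figures are invented purely for illustration, and the "pool" is just a sum rather than a model of any real CXL fabric.

```python
# Toy illustration of stranded memory and what a shared pool recovers.
# All figures below are invented for illustration.

nodes_free_gb = [10, 250, 40, 300, 20, 180, 15, 220]   # idle DRAM per server
request_gb = 400                                        # one large in-memory job / KV cache

# Without pooling, the job must fit inside a single server's free memory.
fits_locally = any(free >= request_gb for free in nodes_free_gb)

# With CXL-style pooling, idle memory across servers can be aggregated.
pooled_free_gb = sum(nodes_free_gb)

print(f"Fits on any single node:  {fits_locally}")                   # False
print(f"Idle memory across nodes: {pooled_free_gb} GB")              # 1035 GB stranded
print(f"Fits in the shared pool:  {pooled_free_gb >= request_gb}")   # True
```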
Micron has launched CXL Type 3 memory-expansion modules built on its own DDR5 DRAM, forming a high-low pairing with HBM: HBM covers the ultra-high-bandwidth, low-latency needs of a single card, while CXL targets large-capacity expansion across nodes and supports TB-scale memory pooling. In practice, hot data stays in local HBM while long-context cold data is offloaded to the CXL memory pool, with prefetching used to hide the transfer latency, making it practical to serve ultra-long-context models with millions of tokens. In terms of market structure, HBM competition is intensifying while CXL memory expansion is still in its early stages and the landscape is undecided. Micron, as a pure memory vendor, carries no legacy baggage here, and because the CXL modules use standard DDR5 rather than complex stacking and packaging, yield and capacity pressures are manageable, giving it a potential first-mover advantage.

The Industry's Underlying Economics and Technological Bottlenecks

An advanced DRAM fab costs $15 billion to $20 billion, a single EUV lithography machine costs over $200 million, and on top of that come power and cooling infrastructure; against a five-year depreciation schedule, the daily amortization is enormous. Equipment utilization has to stay above 95% to keep manufacturing costs down, so when demand falls, manufacturers find it very hard to cut output and instead endure the pressure and fight price wars. This is the underlying cause of the memory industry's strong cyclicality.

HBM's high cost also stems from physical constraints. Multiple DRAM dies are stacked vertically, a defect in any layer scraps the whole module, and yield falls roughly exponentially with the number of layers. Even with a per-layer die yield of 95% and an interlayer bonding yield of 99%, the overall yield of 8-layer HBM3E works out to only about 61%, and 12-layer HBM4 falls below 50% (the short sketch below reproduces this arithmetic). SK Hynix's liquid molded-underfill packaging and Micron's process-yield ramp are, at bottom, both attempts to raise overall yield and cut unit cost. But yield learning and capacity expansion cannot be rushed, which means HBM prices are unlikely to fall significantly in the short term.

In-memory computing (processing-in-memory, PIM), proposed two decades ago, has still not reached large-scale commercialization, mainly because of a contradiction in the physical process. DRAM transistors need low leakage and a high threshold voltage to hold charge, which makes them switch slowly; CPU and GPU logic chips chase low threshold voltages and high switching frequencies, at the cost of higher leakage. The two sets of process requirements are inherently at odds. Forcing compute units into DRAM would deliver far less compute than a GPU, and the heat generated during computation would accelerate capacitor leakage and threaten data reliability. The industry's current compromise is to integrate lightweight AI compute into the base die at the bottom of the HBM stack, built on TSMC's advanced logic process to sidestep the DRAM process constraints, but this is still a long way from true in-memory computing.
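Returning to the stacking-yield numbers above, a minimal sketch: the per-die and bonding yields are the ones quoted in the text, while the simple multiplicative model (every die and every bond must be defect-free) is our illustrative assumption, not a foundry yield model.

```python
# Stacked-yield arithmetic for HBM.

def stack_yield(layers: int, die_yield: float = 0.95, bond_yield: float = 0.99) -> float:
    """Each DRAM die and its bond onto the stack must both be defect-free."""
    return (die_yield * bond_yield) ** layers

print(f"8-layer HBM3E: {stack_yield(8):.0%}")    # ~61%
print(f"12-layer HBM4: {stack_yield(12):.0%}")   # ~48%, below 50%
```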
In the long run, Micron's competitive logic is clear: use the 1-gamma process to cut per-bit cost, capture pricing power through high-margin HBM, and smooth the industry cycle with long-term supply contracts. The industry still faces structural bottlenecks, however: planar DRAM scaling is approaching its physical limits, 3D-stacking yield losses grow with the number of layers, and in-memory computing has no commercially viable path in the short term. Future competition will no longer turn on a single node advantage but on the combined strength of yield engineering, packaging processes, and system integration, the deep moat the memory giants have built over decades of accumulation. A recurring pattern runs through the chip industry: when compute falls short, chips grow larger, which hurts yield; splitting designs into interconnected dies restores yield but introduces data-transfer latency; stacking chips solves the interconnect problem but creates heat-dissipation issues that drag yield back down. In the end, the industry's final contest returns to materials science, and photonic interconnects, two-dimensional semiconductor materials, and disruptive computing architectures may become the key directions for breaking through today's physical constraints.