The document summarizes a full-day forum on CXL hosted by the CXL Consortium and MemVerge at Flash Memory Summit. The morning agenda includes presentations on CXL from representatives of Google, Intel, PCI-SIG, Marvell, Samsung, SK hynix, and Micron. The afternoon agenda includes talks and panels on CXL usage models from VMware, AMD, MemVerge, Meta, and OCP. A keynote presentation provides an update on the CXL Consortium and the recently released CXL 3.0 specification, including its expanded fabric capabilities and management features. The specification aims to enable new usage models for memory sharing and expansion, addressing the industry trend toward ever-increasing data-processing demands.
1. CXL™: Getting Ready for Take-Off
Full-Day Forum at Flash Memory Summit
Hosted by The CXL Consortium and MemVerge
Slides and video now available at https://memverge.com/cxl-forum/
2. Morning Agenda
| Start | End   | Name             | Title                                                                      | Organization            |
|-------|-------|------------------|----------------------------------------------------------------------------|-------------------------|
| 8:35  | 8:50  | Siamak Tavallaei | President, CXL Consortium; Chief Systems Architect, Google Infrastructure  | Google / CXL Consortium |
| 8:50  | 9:10  | Willie Nelson    | Technology Enabling Architect                                              | Intel                   |
| 9:10  | 9:30  | Steve Glaser     | Principal Engineer; PCI-SIG Board Member                                   | NVIDIA / PCI-SIG        |
| 9:30  | 9:50  | Shalesh Thusoo   | VP, CXL Product Development                                                | Marvell                 |
| 9:50  | 10:10 | Jonathan Prout   | Sr. Manager, Memory Product Planning                                       | Samsung                 |
| 10:10 | 10:30 | Uksong Kang      | Vice President, DRAM Product Planning                                      | SK hynix                |
| 10:30 | 10:50 | Ryan Baxter      | Sr. Director of Marketing                                                  | Micron                  |

Session SPOS-101-1 on the FMS program
3. Afternoon Agenda
| Start | End  | Name             | Title                                                                                      | Organization            |
|-------|------|------------------|--------------------------------------------------------------------------------------------|-------------------------|
| 3:25  | 3:45 | Arvind Jagannath | Cloud Platform Product Management                                                          | VMware                  |
| 3:45  | 4:05 | Mahesh Wagh      | Senior Fellow                                                                              | AMD                     |
| 4:05  | 4:25 | Charles Fan      | CEO & Co-founder                                                                           | MemVerge                |
| 4:25  | 4:45 | Manoj Wadekar    | SW-Defined Memory Workstream Lead, OCP; Storage Architect, Meta                            | Meta / OCP              |
| 4:45  | 5:10 | Siamak Tavallaei | Panel Moderator; President, CXL Consortium; Chief Systems Architect, Google Infrastructure | Google / CXL Consortium |
| 5:10  | 5:35 | Chris Mellor     | Panel Moderator; Editor                                                                    | Blocks and Files        |

Session SPOS-102-1 on the FMS program
4. Update from the CXL Consortium
Siamak Tavallaei
President, CXL Consortium
Chief Systems Architect, Google Systems Infrastructure
40. AGENDA
Cache Coherence for Accelerators
Expansion Memory for CPUs
Flexible Tiered Memory Configurations
Security
41. CPU-GPU CACHE COHERENCE
Unified programming model across CPU architectures
CPU-GPU coherence provides programmability benefits: ease of porting applications to GPU, and rapid development for new applications
Grace + Hopper Superchip introduces cache-coherent programming to GPUs
CXL enables the same programming benefits for our GPUs in systems based on 3rd-party CPUs
[Diagram: Grace CPU paired with Hopper GPU over coherent NVLink C2C; x86/Arm CPU paired with NVIDIA GPU over a coherent CXL link]
42. PROGRAMMABILITY BENEFITS
CXL CPU-GPU cache coherence lowers the barrier to entry
Without Shared Virtual Memory (SVM) + coherence, nothing works until everything works
Enables a single allocator for all types of memory: host, host-accelerator coherent, accelerator-only
Eases porting complicated pipelines in stages; many SW layers exist between frameworks and drivers
Example: start with malloc, and keep using malloc until you choose otherwise
Vendor-provided allocators remain fully supported and functional
Workloads are pushing scaling boundaries; fine-grained synchronization is on the rise
Synchronization latency matters: avoid setup latency and synchronize in memory when possible
Host/device synchronization can live in the device's memory
Concurrent algorithms and data structures become available
Example: full C++ atomics support across host and device (see the sketch after this slide)
Locks: any suballocation can be used for synchronization, regardless of placement
[Chart: application performance vs. programming effort, comparing a v1 port with SVM + coherence (working sooner, tuned incrementally) against a v1 port without SVM or coherence]
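The "full C++ atomics across host and device" point can be illustrated with a short sketch. This is not vendor code: the hardware coherence CXL provides is what lets plain `std::atomic` operations work on memory visible to both sides, and the "device" below is simulated with a second host thread so the example is runnable anywhere; on a real system the `SharedRegion` would be placed in coherent shared (e.g., CXL) memory.

```cpp
// Minimal sketch: lock-free handoff between a host thread and a device-side
// agent, relying on hardware cache coherence so ordinary C++ atomics work.
#include <atomic>
#include <cstdio>
#include <thread>

struct SharedRegion {
    std::atomic<bool> ready{false};  // synchronization flag in shared memory
    int payload = 0;                 // data handed from producer to consumer
};

int main() {
    SharedRegion region;  // assumption: allocated from coherent CXL memory

    std::thread device_side([&] {
        // Consumer: spin until the producer publishes, then read payload.
        while (!region.ready.load(std::memory_order_acquire)) { /* spin */ }
        std::printf("device saw payload = %d\n", region.payload);
    });

    region.payload = 42;                                  // write the data
    region.ready.store(true, std::memory_order_release);  // publish it

    device_side.join();
    return 0;
}
```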
43. CXL FOR CPU MEMORY EXPANSION
SOC DDR channel count is becoming constrained
CXL-enabled PCIe ports can be used for additional memory capacity
Flexibility in the underlying media choice, trading off capacity/latency/persistence:
DRAM
DRAM + cache
Storage-class memory
DDR/SCM + NVMe
[Diagram: two host SOCs, each with multiple DDR channels plus several CXL memory devices attached through CXL-enabled PCIe ports]
44. CXL FOR MEMORY DISAGGREGATION
Currently, data center servers are often over-provisioned with memory
All hosts must have enough DRAM to handle the demands of worst-case workloads
Under less memory-intensive workloads, that DRAM sits unused and wasted
DRAM is very expensive at data center scale
Large banks of CXL memory can be distributed among several hosts
Memory pools may be attached to hosts via CXL switches, or directly attached using multi-port memory devices
Pooling: each host is allocated a portion of the disaggregated memory; pools can be reallocated as needed, reducing over-provisioning on each host while retaining flexibility for workloads with differing memory demands
Sharing: address ranges that may be accessed by multiple hosts simultaneously; coherence may be provided in hardware by the CXL device or may be software-managed (a sketch of the software-managed case follows below)
[Diagram: three hosts attached through a CXL switch fabric to two CXL memory pools]
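For the software-managed coherence case mentioned above, one common pattern is for the writing host to explicitly write back its cache lines before signaling peers. The sketch below shows that idea with x86 flush intrinsics; it is an illustration, not the CXL specification's mechanism, and it assumes `base` already points into a shared CXL region mapped elsewhere.

```cpp
// Sketch of software-managed coherence for a shared CXL range on x86: the
// writer flushes its cached lines so a peer host (outside this host's
// hardware coherence domain) can observe the data.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;

void publish_range(void* base, std::size_t bytes) {
    auto addr = reinterpret_cast<std::uintptr_t>(base);
    // Flush every cache line covering [base, base + bytes).
    for (std::uintptr_t p = addr & ~std::uintptr_t{kCacheLine - 1};
         p < addr + bytes; p += kCacheLine) {
        _mm_clflush(reinterpret_cast<void*>(p));  // write back + invalidate
    }
    _mm_sfence();  // order the flushes before any subsequent "ready" signal
}
```

A "ready" flag (itself flushed, or exchanged over another channel) would follow the flush, which is exactly the bookkeeping that hardware-coherent sharing removes.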
45. CXL FOR GPU EXPANSION MEMORY
Tackling AI with very large memory capacity demands
Accelerator workloads with large memory footprints are currently challenged: constrained by the bandwidth available to the host over PCIe, and contending with host SW for memory bandwidth
CXL memory expanders may be directly attached to accelerators for private use
Tiered memory for GPUs: HBM and CXL trade off bandwidth, capacity, cost, and flexibility
[Diagram: GPU with HBM, a coherent CPU-GPU CXL link to the host, and private GPU-memory CXL link(s) to a CXL memory expander]
46. CXL FOR GPU MEMORY POOLING
Streamlined accelerator data sharing
Memory pools may provide the flexibility to apportion memory to individual GPUs as needed
Provides a solution for workloads where capacity is important and bandwidth is secondary
Large data sets can be stored in CXL memory and shared as needed among accelerators, without burdening the interface to the host
[Diagram: three hosts and three HBM-equipped GPUs attached through a CXL switch fabric to two CXL memory pools]
47. SHARED EXPANSION MEMORY
CPU-GPU shared memory pools
CXL enables sharing of expansion memory between host and GPU
Future capabilities may allow expansion memory to be shared simultaneously among hosts, and between hosts and accelerators
Flexibility in provisioning under varying demands; ease of programming model
The CXL switch could be a local physical switch, or a virtual switch over another physical transport enabling remote disaggregated memory
[Diagrams: a host and GPU sharing CXL memory through a CXL switch; multiple hosts and GPUs sharing several CXL memory pools]
48. CXL FOR CONFIDENTIAL COMPUTING
Vision for secure accelerated computing
Confidential computing components will be partitionable and assignable to Trusted Execution Environment Virtual Machines (TVMs)
TVMs can create their own secure virtual environments including host resources, an accelerator partition, and shared memory partitions
Data transfers are encrypted and integrity protected
Components are securely authenticated
Partitions are secure from accesses by untrusted entities: other VMs/TVMs, firmware, and the VMM
All CXL capabilities are enabled in secure domains
[Diagram: a confidential-compute host and confidential-compute GPU connected over CXL, with TVMs each owning a GPU partition and memory partitions drawn from shared memory pools]
63. Jonathan Prout
Senior Manager, Memory Product Planning
Samsung Electronics

Uksong Kang
Vice President, DRAM Product Planning
SK Hynix

Ryan Baxter
Sr. Director of Marketing
Micron
65. Agenda
Industry Trends and Challenges
CXL™ (Compute Express Link) Introduction
CXL™ Memory Use Cases
Samsung's CXL™-based Memory Expander and SMDK (Scalable Memory Development Kit)
66. Industry Trends and Challenges
Massive demand for data-centric technologies and applications: artificial intelligence, big data, edge, cloud, 5G
Memory bandwidth and density are not keeping up with increasing CPU core count
A next-generation interconnect is needed for heterogeneous computing and server disaggregation
67. Industry Trends and Challenges
[Chart: normalized growth rate, 2012-2021, of CPU core count vs. memory channel bandwidth per core; the widening gap shows that a new memory scaling solution is needed]
68. CXL™ Introduction
CXL is a high-performance, low-latency protocol that leverages the PCIe physical layer: a CXL card uses the same processor, PCIe connector, and PCIe channel as a PCIe card
CXL is an open industry standard with broad industry support

| Device class | Type 1: Caching Devices / Accelerators | Type 2: Accelerators with Memory | Type 3: Memory Buffers |
|---|---|---|---|
| Protocols | CXL.io, CXL.cache | CXL.io, CXL.cache, CXL.memory | CXL.io, CXL.memory |
| Device | Accelerator / NIC with cache | Accelerator with cache and HBM | Memory buffer with DDR memory |
| Usages | PGAS NIC, NIC atomics | GP GPU, dense computation | Memory BW expansion, memory capacity expansion, storage-class memory |
69. CXL™ Type 3 Device
[Diagram: a host/CPU with home agent and DDR channels connected over CXL.io + CXL.mem to a CXL memory expander containing a memory controller and device memory]
CXL is a cache-coherent standard, meaning the host and the CXL device see the same data seamlessly (a minimal allocation sketch follows below)
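To make the Type 3 model concrete: on Linux, a CXL memory expander is typically surfaced as a CPU-less NUMA node, so existing NUMA tooling can place data on it. The sketch below (not from the slides) uses libnuma (link with `-lnuma`) and assumes the expander is the highest-numbered node, which is system-specific.

```cpp
// Sketch: allocate a buffer explicitly on a CXL-backed NUMA node via libnuma.
#include <numa.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    int cxl_node = numa_max_node();   // assumption: expander is the last node
    std::size_t len = 1ull << 30;     // 1 GiB
    void* buf = numa_alloc_onnode(len, cxl_node);  // backed by CXL memory
    if (!buf) return 1;
    std::memset(buf, 0, len);         // touch pages so they are actually placed
    numa_free(buf, len);
    return 0;
}
```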
70. CXL™ Type 3 Device - Memory Expansion
CXL enables systems to significantly scale memory capacity and bandwidth
DDR only: 8 channels at 2 DIMMs per channel (2DPC) of 512 GB DDRx gives a maximum of 8 TB for one CPU
DDR + CXL: the same 8x 2DPC configuration plus four 1 TB CXL memory expanders raises the maximum to 12 TB for one CPU
71. Current Use Cases: Capacity / Bandwidth Expansion
Capacity expansion - TCO reduction: in-memory database (IMDB) servers with x TB DRAM per CPU are consolidated into fewer servers with y TB DRAM plus z TB of CXL-attached memory per CPU
Bandwidth expansion - performance improvement: in-memory computing (IMC) servers with x GB DRAM per CPU are consolidated into fewer servers with y GB DRAM plus z GB of CXL-attached memory per CPU
[Diagrams: before/after IMDB and IMC server configurations with CXL memory added]
73. Future Use Cases: Tiering and Pooling
Memory tiering* - efficient expansion: IMC servers combine x TB DRAM per CPU with z TB of CXL-attached memory
Memory pooling - increased utilization: multiple IMC servers share z TB of CXL memory housed in an external MEMORY BOX
*Hot data on DRAM; warm data on cost-optimized, CXL-attached media
[Diagrams: tiered IMC servers with local CXL memory, and pooled IMC servers sharing an external memory box]
74. Samsung CXL™ Proof of Concept
Supporting ecosystem growth with a CXL-based memory functional sample
Product features:
Form factor - EDSFF (E3.S) / AIC
Media - DDR4
Module capacity - 128 GB
CXL link width - x16
Specification - CXL 2.0
Ecosystem enablement success:
Shipped 100+ samples since availability in 3Q '21
Successfully tested with a broad range of server, system, and software providers across the industry
75. Samsung CXL™ Solution
Leading the industry toward mainstream adoption of CXL-based memory
Product features:
Form factor - EDSFF (E3.S)
Media - DDR5
Module capacity - 512 GB
CXL link width - x8
Maximum CXL bandwidth - 32 GB/s
Specification - CXL 2.0
Other features - RAS, interleaving, diagnostics, and more
Availability - Q3 '22 for evaluation/testing
76. SMDK - Scalable Memory Development Kit
[Diagram: SMDK stack for data center to edge applications (IMDB, DLRM, ML/AI, etc.): a compatible API and an optimization API sit above an intelligent tiering engine and memory pool management; a CXL-aware kernel exposes a normal zone and a CXL.mem zone; the CXL allocator spans CPU DRAM on the server main board and the CXL memory expander]
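The two API paths in the diagram can be sketched as follows. This is illustrative only: the "compatible path" in SMDK's model means unmodified `malloc` keeps working while the tiering engine steers placement, and the "optimization path" means explicit placement. The `MemTier`/`s_malloc` names below are hypothetical stand-ins, not SMDK's actual symbols; the real API lives in the SMDK repository.

```cpp
// Illustrative sketch of SMDK's two allocation paths (names hypothetical).
#include <cstddef>
#include <cstdlib>

enum MemTier { MEM_NORMAL, MEM_CXL };  // hypothetical tier tags

void* s_malloc(MemTier /*tier*/, std::size_t n) {
    // Stub: a real implementation would steer the allocation to the
    // requested zone (normal vs. CXL.mem) through the CXL allocator.
    return std::malloc(n);
}

int main() {
    // Compatible path: existing code keeps calling malloc unchanged; the
    // intelligent tiering engine decides whether pages land on DRAM or CXL.
    void* a = std::malloc(4096);

    // Optimization path: explicitly place a large, colder structure in the
    // CXL.mem zone.
    void* b = s_malloc(MEM_CXL, std::size_t{1} << 20);

    std::free(a);
    std::free(b);  // stub counterpart; the real API would pair its own free
    return 0;
}
```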
77. Application Benchmark Test
Test scenario: a single Redis node with 32 GB DDR5 plus 64 GB of CXL-attached memory (scale-up) is compared against a Redis cluster of DDR5-only nodes (32 GB each) connected over Ethernet (scale-out). Clients Set and Get 60 GB of data.

Test result, throughput in MB/s:

| Configuration | Set 128B | Set 4KB | Set 1MB | Get 128B | Get 4KB | Get 1MB |
|---|---|---|---|---|---|---|
| Single node (DRAM + CXL) | 30 | 455 | 699 | 27 | 496 | 659 |
| Cluster (DRAM only) | 49 | 172 | 186 | 66 | 173 | 189 |

Scale-up performance is 2.7x better than scale-out at 4 KB chunk size.
78. Key Takeaways
Memory capacity and bandwidth per core are lagging industry demand, and conventional scaling technologies are unable to meet the challenge
CXL is the most promising technology to address the gap: capacity/bandwidth expansion, tiering, and pooling use cases
Samsung is leading the advancement of CXL-based memory solutions: PoC, ASIC-based module, and SMDK
The PoC has been tested with a broad range of partners for more than a year
Samsung enthusiastically welcomes further collaboration with the industry; visit the Samsung booth to learn more about Samsung's Memory Expander and SMDK
80. Adding New Value to Memory Subsystems through CXL
Uksong Kang
VP, DRAM Product Planning
August 2, 2022
98. Data centers = memory centers
Memory and storage growth will never be as slow as before, and possibly never as fast as now.
Hyperscale adoption of AI roughly doubles (2x) from CY20 to CY25, driving memory and storage growth
An AI-optimized server carries roughly 6x the memory and 7x the storage of a compute-optimized server (DDR memory, HBM memory, NAND storage)
Micron's global data center market is projected to grow at a 16% CAGR through CY30 (chart scale $20B-$180B)
Sources: 1. Hyperscale AI adoption: internal Bain research. 2. Server content referencing two published AWS EC2 hardware configs (AWS instance types, 3/1/22): standard server = 256 GB DRAM, 0 GB HBM, 1.2 TB SSD storage; AI server = 1152 GB DRAM + 320 GB HBM, 8 TB SSD storage. 3. Global data center market: Micron MI market model.
99. Memory-centric innovations in the data center
Applying the power of software-defined infrastructure
[Diagram: a memory/storage hierarchy - in-package and direct-attach near memory for hot data, CXL-attached far memory for warm data, then fast, capacity, and archival storage for cold data; latency and capacity increase going down the hierarchy, cost and bandwidth increase going up. Disaggregated compute/memory/storage building blocks make the data center modular, composable, performant, and efficient]
100. CXL Use Cases
Alternative to stacking: stacking drives non-linear cost/bit
Provide ultra-high capacity: expansion beyond 4H 3DS TSV
Add memory bandwidth: CXL enables additional memory attach points
Balance memory capacity/BW: DRAM capacity/BW on demand; balances GB/core and BW
Reduce system complexity: fewer memory channels; thermally optimized solutions
Enablement after 2DPC: future 50% slot reduction
102. Micron's "data centric" portfolio
A complete portfolio built on silicon technology, world-class manufacturing, and a diversified supply chain
Segments: hyperscale, enterprise & government, communication, edge
Workloads: compute, storage, networking, acceleration
Products: HBM, GDDR, LPDDR, DDR, TLC NAND, QLC NAND
Foundations: silicon technology, emerging memory, advanced packaging, tech node leadership
Deep customer relationships, ecosystem enablement, and standards body leadership
103. Micron is committed to partnering with the industry, ultimately serving and delighting our customers
Strategic ecosystem partnerships: define, develop, and prove technologies (DDR, LP, & GDDR; GPU Direct Storage); enable differentiated solutions; extend the ecosystem
Industry organizations: provide leadership in industry organizations to enable scalable advancement
106. Afternoon Agenda
| Start | End  | Name             | Title                                                                                      | Organization            |
|-------|------|------------------|--------------------------------------------------------------------------------------------|-------------------------|
| 3:25  | 3:45 | Arvind Jagannath | Cloud Platform Product Management                                                          | VMware                  |
| 3:45  | 4:05 | Mahesh Wagh      | Senior Fellow                                                                              | AMD                     |
| 4:05  | 4:25 | Charles Fan      | CEO & Co-founder                                                                           | MemVerge                |
| 4:25  | 4:45 | Manoj Wadekar    | SW-Defined Memory Workstream Lead, OCP; Storage Architect, Meta                            | Meta / OCP              |
| 4:45  | 5:10 | Siamak Tavallaei | Panel Moderator; President, CXL Consortium; Chief Systems Architect, Google Infrastructure | Google / CXL Consortium |
| 5:10  | 5:35 | Chris Mellor     | Panel Moderator; Editor                                                                    | Blocks and Files        |

Session SPOS-102-1 on the FMS program
122. AGENDA
◢ Paradigm Shift and Memory Composability Progression
◢ Runtime Memory Management
◢ Tiered Memory
◢ NUMA domains and Page Migration
◢ Runtime Memory Pooling
123. PARADIGM SHIFT
◢ Scalable, high-speed CXL™ interconnect and PIM (Processing in Memory) contribute to the paradigm shift in memory-intensive computations
◢ Efficiency boost for the next-generation data center
◢ Management of the host/accelerator subsystems combined with terabytes of fabric-attached memory
◢ Reduced complexity of the SW stack combined with direct access to multiple memory technologies
124. MEMORY COMPOSABILITY PROGRESSION
[Diagram: progression from direct-attach memory, to memory scale-out through a buffer behind the host root port, to memory pooling & disaggregation with fabric-attached end points]
• Addresses the cost and underutilization of memory
• Multi-domain pooled memory - memory in the pool is allocated/released when required
• Workloads/applications benefit from memory capacity
• Design optimization for {BW/$, memory capacity/$, BW/core}
125. RUNTIME MEMORY MANAGEMENT
126. TIERED MEMORY
NUMA Domains
Page Migration
127. TIERED MEMORY: NUMA DOMAINS
• Exposed to the HV, guest OS, and apps
• OS-assisted optimization of the memory subsystem
• Based on ACPI objects - SRAT/SLIT/HMAT (see the sketch below)
128. TIERED MEMORY: PAGE MIGRATION
[Diagrams: processor CCDs/IODs with near memory (DRAM, shorter latency) and CXL-attached far memory (longer latency), shown both as separate NUMA domains for memory expansion and as a HW-managed near-memory cache where a near-mem miss is redirected to far mem]
SW-assisted page migration:
‒ Active page migration between far and near memories
‒ HV/guest migrates hot pages into near mem and retires cold pages into far mem
‒ Focused DMA transfers required datasets from far to near mem
DRAM-as-a-cache optimization:
‒ HW-managed hot dataset
‒ Near-mem miss redirected to the far mem
‒ App/HV unawareness
129. TIERED MEMORY: SW-ASSISTED PAGE MIGRATION
Combined HW/SW tracking of memory page activity ("hotness")
Detecting page(s) that are candidates for migration
Requesting HV/guest permission to migrate
HV/guest API to the Security Processor to migrate the page(s)
Migration: stalling accesses to the specific pages and copying the data
Key elements: page "hotness" determined by combined HW and SW tracking; HV/guest authorization of the migration; the Security Processor as a root of trust for performing the migration (a minimal migration sketch follows below)
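As a point of reference for the migration step itself: on stock Linux, software-assisted movement of a hot page from a far (CXL) node to a near (DRAM) node can be done with the `move_pages(2)` syscall, transparently to the application. This is a generic kernel facility, not AMD's HV/Security-Processor flow; node ids are system-specific assumptions. Link with `-lnuma`.

```cpp
// Sketch: migrate one identified-hot page to the near (DRAM) NUMA node.
#include <numaif.h>
#include <cstdio>

bool migrate_hot_page(void* page_addr, int near_node) {
    void* pages[1] = {page_addr};
    int nodes[1] = {near_node};  // destination: near (DRAM) node
    int status[1] = {-1};
    // pid 0 = current process; MPOL_MF_MOVE moves only pages it owns.
    long rc = move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE);
    if (rc != 0 || status[0] != near_node) {
        std::fprintf(stderr, "migration failed (status %d)\n", status[0]);
        return false;
    }
    return true;
}
```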
130. RUNTIME MEMORY ALLOCATION/POOLING: FABRIC ATTACHED MEMORY
[Diagram: two hosts attached to tier-2 memory through a multi-headed CXL controller]
Multiple structures serve for fabric-level memory pooling
Combination of private (dedicated to a specific host) and shareable memory ranges
Protection of the memory regions from unauthorized guests and the hypervisor
Allocation/pooling of memory ranges between hosts is regulated by a fabric-aware SW layer (i.e., the Fabric Manager)
131. RUNTIME MEMORY ALLOCATION/POOLING: FABRIC ATTACHED MEMORY
Memory Allocation Layer - communicates new memory allocations per host based on system/application needs
Fabric Manager - adjusts the fabric settings and communicates new memory allocations to the host SW
Host SW - invokes the hot-add/hot-removal method to increase/reduce (or offline) the amount of memory allocated to the host (see the sketch below); in some instances, host SW can directly invoke the SP to adjust the memory size allocated to the host
On-die Security Processor (root of trust) - involved in securing exclusive access to the memory range
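For the "Host SW" step above, Linux exposes hot-plugged memory blocks through sysfs, and onlining/offlining them is a plain write to the block's `state` file (standard kernel memory-hotplug ABI; requires root, and block numbering is system-specific). A minimal sketch, not tied to any particular fabric manager:

```cpp
// Sketch: online or offline one hot-plugged memory block via sysfs.
#include <fstream>
#include <string>

bool set_memory_block_state(int block, const std::string& state) {
    // state is "online" or "offline" per the kernel memory-hotplug ABI.
    std::ofstream f("/sys/devices/system/memory/memory" +
                    std::to_string(block) + "/state");
    if (!f) return false;
    f << state;
    return static_cast<bool>(f);
}

// Usage after the fabric manager grants a new range:
//   set_memory_block_state(42, "online");
```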
132. SUMMARY
Composable disaggregated memory is the key approach to addressing the cost and underutilization of system memory
Further investment in runtime management of composable, multi-type memory structures is required to maximize system-level performance across multiple use cases
Application transparency is another goal of efficient runtime management, achieved by abstracting away the underlying fabric/memory infrastructure
134. CXL: The Dawn of Big Memory
Charles Fan
Co-founder & CEO
MemVerge
135. The Rise of Modern Data-Centric Applications
EDA simulation, AI/ML, video rendering, geophysical, genomics, risk analysis, CFD, financial analytics
136. Opening the Door to the Era of Big Memory
Abundant - Composable - Available
137. What happened 30+ years ago
Just a bunch of disks → Storage Area Network (SAN) via Fibre Channel → advanced storage services via storage data services
138. Where We Are Going…
Storage then: just a bunch of disks → Storage Area Network (SAN) via Fibre Channel → advanced storage services via storage data services
Memory now: new memory → pooled memory via CXL → Memory-as-a-Service via CXL memory data services
140. Use Case #1: Dynamic Memory Expansion Reduces Stranded Memory
Before CXL [diagram: per-server used vs. unused memory]
Azure paper*:
• Up to 50% of server costs is from DRAM alone
• Up to 25% of memory is stranded
• 50% of all VMs never touch 50% of their rented memory
* H. Li et al. First-generation Memory Disaggregation for Cloud Platforms. arXiv:2203.00241v2 [cs.OS], March 5, 2022
141. Use Case #1: Dynamic Memory Expansion Reduces Stranded Memory
After CXL [diagram: pooled memory shared across servers, shrinking the unused share]
Memory disaggregation can save billions of dollars per year.
142. Use Case #2: Memory Auto-Healing with Transparent Migration
1. A memory module is going bad: its error rate is rising
2. Provision a new memory region from the pool
3. Transparently migrate the memory data
4. Memory auto-healing is complete
144. Use Case #3: Shared Memory Read
After CXL [diagram: multiple hosts reading a shared CXL memory region; a minimal mapping sketch follows below]
S. Chen et al. Optimizing Performance and Computing Resource Management of in-memory Big Data Analytics with Disaggregated Persistent Memory. CCGRID '19
Project Splash is open source: https://github.com/MemVerge/splash
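One way the shared-read use case can look from a single host, sketched below: each host maps the same CXL region read-only and consumes the dataset in place rather than copying it over the network. The device path is an assumption (a DAX character device is one common way such a region is exposed), and coherence/versioning of the shared data is assumed to be software-managed, e.g. published once and then treated as immutable.

```cpp
// Sketch: map a shared CXL region read-only and consume it in place.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* path = "/dev/dax0.0";  // system-specific assumption
    const size_t len = 1ull << 30;     // 1 GiB window
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }
    void* base = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }
    // ... run analytics directly over the shared, read-only dataset ...
    munmap(base, len);
    close(fd);
    return 0;
}
```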
145. Key Software Components
[Diagram: computing hosts run apps over a Transparent Memory Service (memory snapshot, memory tiering, memory sharing, resource management) in the operating system; a CXL switch connects them to a pool server running the Memory Machine Pool Manager (memory provisioning & sharing, capacity optimization, data protection, security, global insights), alongside Memory Viewer, app profiler, and hardware API integration]
147. Announcing Memory Machine™ Cloud Edition
Memory capacity expansion:
• Software-defined memory pool with intelligent auto-tiering
• No application change required
Accelerate time-to-discovery:
• Transparent checkpointing
• Roll back, restore, and clone anywhere, any time
Reduce cloud cost by up to 70%:
• Enables long-running applications to use low-cost Spot instances
• Integration with cloud automation and schedulers to auto-recover from CSP preemptions
[Diagram: Memory Machine™ stack - Memory Snapshot, Memory Tiering, and System & Cloud Orchestration services over a Transparent Memory Service on Linux, spanning compute (CPU/GPU/xPU), memory (HBM/DDR/CXL), and storage (SSD/HDD), serving genomics, EDA, geophysics, risk analysis, video rendering, and other workloads]
148. Early Results Running Memory Machine on CXL
Test platform: a next-gen server with 64 GB of DDR5 DRAM and a 64 GB CXL DRAM expander card (Montage Technologies), running Memory Machine™
Workloads: MLC (Memory Latency Checker) and Stream microbenchmarks, plus Redis as the application
149. Early Results Running Memory Machine on CXL
[Charts: throughput (GB/s) for MLC workloads (all reads; 3:1, 2:1, and 1:1 reads-writes; stream-triad-like) and Stream workloads (copy, scale, add, triad), comparing DDR5 only, CXL only, and DDR+CXL with Memory Machine auto-tiering]
154. Software Partner to the CXL Ecosystem
Founded in 2017 to develop Big Memory software
[Diagram: MemVerge Big Memory software (Transparent Memory Service and Memory Machine Pool Manager components) at the center of an ecosystem of processors, servers, switches, memory systems, clouds, Big Memory apps, and standards bodies]
157. Agenda
• SDM Workstream within OCP
• Hyperscale Infra - Needs
• Memory Hierarchy to address the needs
• SDM Use cases
• SDM Activities and Status
158. SDM Team Charter
SDM (Software Defined Memory) is a workstream within Future Technology Initiatives within OCP
Charter:
- Identify key applications driving adoption of hierarchical/hybrid memory solutions
- Establish architecture and nomenclature for such systems
- Offer benchmarks that enable validation of novel ideas for HW/SW solutions for such systems
159. Hyperscale Infrastructure
• Application performance and growth depend on
⎻ DC, system, and component performance and growth
⎻ Compute, memory, storage, network…
• Focusing here on the memory discussion
[Diagram: hyperscale services - ads, FE web, database/cache, inference, data warehouse, storage, training]
160. Memory Challenges
Bandwidth and capacity:
• The gap between bandwidth and capacity is widening
• Applications are ready to trade between bandwidth and capacity
Power:
• DIMMs consume a significant share of rack power (DDR5 exacerbates this)
• Applications co-design to achieve higher capacity at optimized power
TCO:
• Cost impact of minimum capacity increments and die/ECC overheads
• Applications can trade performance/capacity to achieve optimal TCO
162. Use Case Examples
• Caching (e.g., Memcache/Memtier (CacheLib), Redis, etc.)
⎻ Need to achieve higher QPS while satisfying "retention time"; higher memory capacity needed
⎻ Current solutions include "tiered memory" with DRAM+NAND, but need load/store
• Databases (e.g., RocksDB/MongoDB, etc.)
⎻ Need to achieve efficient storage capacity per node and deliver the QPS SLA
⎻ More memory enables more storage per node
• Inference (e.g., DLRM)
⎻ Petaflops and parameter counts are increasing rapidly; AI models are scaling faster than the underlying memory technology
⎻ Current solutions include "tiered memory" with DRAM+NAND, but need load/store
163. AI at Meta
● Across many applications/services and at scale → driving a portion of our overall infrastructure (both HW and SW)
● From data centers to the edge
[Examples: keypoint segmentation; augmented reality with smart camera]
164. Problem Statement: AI Workloads Scale Rapidly
● Compute, memory BW, and memory capacity all scale for frontier models
○ Workload scaling is typically faster than technology scaling
● This rapid scaling requires more vertical integration from SW requirements to HW design
165. DLRM Memory Requirements
● Bandwidth
1. A considerable portion of capacity needs high-BW accelerator memory
2. Inference has a bigger portion of its capacity at low bandwidth, more so than training
● Latency
3. Inference has a tight latency requirement, even on the low-BW end
166. System Implications of DLRM Requirements
● A tier of memory beyond HBM and DRAM can be leveraged, particularly for inference
○ Higher latency than main memory, but still a tight latency profile (e.g., TLC NAND flash does not work)
○ Trades off performance for density
○ This does not negate the capacity and BW demand for HBM and DRAM
167. "Tiered Memory" Pyramid with CXL
[Diagram: memory pyramid from bandwidth-driven to capacity-driven tiers - HBM (GP compute, training), DRAM cache, CXL-attached BW memory (inference, caching), CXL-attached capacity memory (databases, caching), NAND SSD]
CXL properties: load/store interface; cache-line reads/writes; scalable; heterogeneous; standard interfaces
169. OCP SDM Activity and Progress
SDM - enabling memory solutions from emerging memory technologies
• SDM's focus: apply emerging memory technologies in the development of use cases
• The OCP SDM group has three real-world memory focus areas:
⎻ Databases/caching
⎻ AI/ML & HPC
⎻ Virtualized servers
• SDM team members: AMD, Arm, Intel, Meta, Micron, Microsoft, Omdia, Samsung, VMware
• Vendors are demonstrating CXL-capable CPUs and devices
• Meta and others are investigating solutions to real-world memory problems
170. Composable Memory Panel
Siamak Tavallaei - Panel Moderator; President, CXL Consortium
Ben Bolles - Executive Director, Product Management, Liqid
Gerry Fan - Founder & CEO, Xconn Technologies
George Apostol - CEO, Elastics.cloud
Christopher Cox - VP Technology, Montage
171. Big Memory App Panel
Chris Mellor - Panel Moderator; Editor, Blocks and Files
Manoj Wadekar - SW-Defined Memory Workstream Lead, OCP; Storage Architect, Meta
Richard Solomon - Tech Mktg Mgr., PCIe/CXL, Synopsys
Bernie Wu - VP Strategic Alliances, MemVerge
James Cuff - Distinguished Engineer, Harvard University (retired); Industry Expert, HPC & AI