CXL™: Getting
Ready for Take-Off
Full-Day Forum at Flash Memory Summit
Hosted by The CXL Consortium and MemVerge
Slides and video now available at
https://memverge.com/cxl-forum/
Morning Agenda
2
Start-End    Name: Title, Organization
8:35-8:50    Siamak Tavallaei: President, CXL Consortium; Chief System Architect, Google Infrastructure
8:50-9:10    Willie Nelson: Technology Enabling Architect, Intel
9:10-9:30    Steve Glaser: Principal Engineer, PCI-SIG Board Member, NVIDIA
9:30-9:50    Shalesh Thusoo: VP, CXL Product Development, Marvell
9:50-10:10   Jonathan Prout: Sr. Manager, Memory Product Planning, Samsung
10:10-10:30  Uksong Kang: Vice President, DRAM Product Planning, SK hynix
10:30-10:50  Ryan Baxter: Sr. Director of Marketing, Micron
Session SPOS-101-1 on the FMS program
Afternoon Agenda
3
Start-End    Name: Title, Organization
3:25-3:45    Arvind Jagannath: Cloud Platform Product Management, VMware
3:45-4:05    Mahesh Wagh: Senior Fellow, AMD
4:05-4:25    Charles Fan: CEO & Co-founder, MemVerge
4:25-4:45    Manoj Wadekar: SW-Defined Memory Workstream Lead, OCP; Storage Architect, Meta
4:45-5:10    Siamak Tavallaei (Panel Moderator): President, CXL Consortium; Chief System Architect, Google Infrastructure
5:10-5:35    Chris Mellor (Panel Moderator): Editor
Session SPOS-102-1 on the FMS program
Update from the CXL Consortium
4
Siamak Tavallaei
CXL President
Chief Systems Architect at
Google Systems Infrastructure
5 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL™ Consortium Update
Siamak Tavallaei, CXL President
6 | ©2022 Flash Memory Summit. All Rights Reserved.
Introducing the CXL Consortium
CXL Board of Directors
200+ Member Companies
Industry Open Standard for High-Speed Communications
7 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL Specification Release Timeline
March 2019 – CXL 1.0 Specification Released
September 2019 – CXL Consortium Officially Incorporates; CXL 1.1 Specification Released
November 2020 – CXL 2.0 Specification Released
August 2022 – CXL 3.0 Specification Released
Membership growth along the way: 15+ → 130+ → 200+ member companies
Press Release, August 2, 2022, Flash Memory Summit: CXL Consortium releases Compute Express Link 3.0 specification to expand fabric capabilities and management
8 | ©2022 Flash Memory Summit. All Rights Reserved.
Compute Express Link ™ (CXL™) Overview
9 | ©2022 Flash Memory Summit. All Rights Reserved.
Industry Landscape
• Proliferation of Cloud Computing
• Growth of AI & Analytics
• Cloudification of the Network & Edge
10 | ©2022 Flash Memory Summit. All Rights Reserved.
Data Center: Expanding Scope of CXL
CXL 2.0: multiple nodes inside a rack/chassis, supporting pooling of resources
Future – CXL 3.0: fabric growth for disaggregation/pooling/accelerators
11 | ©2022 Flash Memory Summit. All Rights Reserved.
Growing Industry Momentum
• CXL Consortium showcased first public demonstrations of CXL
technology at SC’21
• View virtual and live demos from CXL Consortium members here:
https://www.computeexpresslink.org/videos
• Demos showcase CXL usages, including memory development, memory
expansion and memory disaggregation
12 | ©2022 Flash Memory Summit. All Rights Reserved.
Industry Focal Point
CXL is emerging as the industry focal point
for coherent IO
• CXL Consortium and OpenCAPI Consortium sign letter of intent to transfer the OpenCAPI specification and assets to the CXL Consortium
• In February 2022, the CXL Consortium and Gen-Z Consortium signed an agreement to transfer the Gen-Z specification and assets to the CXL Consortium
August 1, 2022, Flash Memory Summit
CXL Consortium and OpenCAPI Consortium
Sign Letter of Intent to Transfer OpenCAPI
Assets to CXL
13 | ©2022 Flash Memory Summit. All Rights Reserved.
Unveiling the CXL 3.0 specification
Press Release
August 2, 2022, Flash Memory Summit
CXL Consortium releases Compute
Express Link 3.0 specification to
expand fabric capabilities and
management
14 | ©2022 Flash Memory Summit. All Rights Reserved.
Industry trends
• Use cases driving need for higher
bandwidth include: high performance
accelerators, system memory, SmartNIC
and leading edge networking
• CPU efficiency is declining due to
reduced memory capacity and bandwidth
per core
• Efficient peer-to-peer resource sharing
across multiple domains
• Memory bottlenecks due to CPU pin and
thermal constraints
CXL 3.0 introduces
• Fabric capabilities
• Multi-headed and fabric attached devices
• Enhanced fabric management
• Composable disaggregated infrastructure
• Improved capability for better scalability
and resource utilization
• Enhanced memory pooling
• Multi-level switching
• New enhanced coherency capabilities
• Improved software capabilities
• Double the bandwidth
• Zero added latency over CXL 2.0
• Full backward compatibility with CXL 2.0,
CXL 1.1, and CXL 1.0
CXL 3.0 Specification
15 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL 3.0 Specification Feature Summary
16 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL 3.0: Expanding CXL Use Cases
• Enabling new usage models
• Memory sharing between hosts and peer devices
• Support for multi-headed devices
• Expanded support for Type-1 and Type-2 devices
• GFAM (Global Fabric Attached Memory) provides expansion capabilities for current and future memory
Download the CXL 3.0 specification on www.ComputeExpressLink.org
17 | ©2022 Flash Memory Summit. All Rights Reserved.
Call to Action
• Join the CXL Consortium, visit www.computeexpresslink.org/join
• Attend CXL Consortium presentations at the Systems Architecture
Track on Wednesday, August 3 for a deep-dive into the CXL 3.0
specification
• Engage with us on social media
@ComputeExLink www.linkedin.com/company/cxl-consortium/ CXL Consortium Channel
18 | ©2022 Flash Memory Summit. All Rights Reserved.
Thank you!
19 | ©2022 Flash Memory Summit. All Rights Reserved.
Backup
20 | ©2022 Flash Memory Summit. All Rights Reserved.
Multiple Devices of all Types per Root Port
Each host’s root port can
connect to more than one
device type
21 | ©2022 Flash Memory Summit. All Rights Reserved.
Fabrics Overview
CXL 3.0 enables non-tree
architectures
• Each node can be a CXL
Host, CXL device or PCIe
device
22 | ©2022 Flash Memory Summit. All Rights Reserved.
Switch Cascade/Fanout
Supporting vast array of switch topologies
Multiple switch
levels (aka cascade)
• Supports fanout
of all device types
23 | ©2022 Flash Memory Summit. All Rights Reserved.
Device to Device Comms
CXL 3.0 enables peer-to-
peer communication
(P2P) within a virtual
hierarchy of devices
• Virtual hierarchies are associations of devices that maintain a coherency domain
24 | ©2022 Flash Memory Summit. All Rights Reserved.
Coherent Memory Sharing
Device memory can be
shared by all hosts to
increase data flow efficiency
and improve memory utilization
Hosts can have a coherent copy of the shared region, or portions of the shared region, in their caches
CXL 3.0 defines mechanisms to enforce hardware cache coherency between copies
25 | ©2022 Flash Memory Summit. All Rights Reserved.
Memory Pooling and Sharing
Expanded use case
showing memory sharing
and pooling
A CXL Fabric Manager is available to set up, deploy, and modify the environment
26
Willie Nelson
Architect
Intel
Steve Glaser
Principal Architect, PCI-SIG Board Member
NVIDIA
Shalesh Thusoo
CXL Business Unit
Marvell
27 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL – Industry Enablement
Willie Nelson
Technology Enabling Architect - Intel
August 2022
28 | ©2022 Flash Memory Summit. All Rights Reserved.
Introducing the CXL Consortium
CXL Board of Directors
200+ Member Companies
Industry Open Standard for High-Speed Communications
29 | ©2022 Flash Memory Summit. All Rights Reserved.
Growing Industry Momentum
• CXL Consortium showcased first public demonstrations of CXL
technology at SC’21
• View virtual and live demos from CXL Consortium members here:
https://www.computeexpresslink.org/videos
• Demos showcase CXL usages, including memory development, memory
expansion and memory disaggregation
30 | ©2022 Flash Memory Summit. All Rights Reserved.
Industry Focal Point
CXL is emerging as the industry focal point
for coherent IO
• CXL Consortium and OpenCAPI Consortium sign letter of intent to transfer the OpenCAPI specification and assets to the CXL Consortium
• In February 2022, the CXL Consortium and Gen-Z Consortium signed an agreement to transfer the Gen-Z specification and assets to the CXL Consortium
August 1, 2022, Flash Memory Summit
CXL Consortium and OpenCAPI Consortium
Sign Letter of Intent to Transfer OpenCAPI
Assets to CXL
31 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL Specification Release Timeline
March 2019 – CXL 1.0 Specification Released
September 2019 – CXL Consortium Officially Incorporates; CXL 1.1 Specification Released
November 2020 – CXL 2.0 Specification Released
August 2022 – CXL 3.0 Specification Released
Membership growth along the way: 15+ → 130+ → 200+ member companies
Press Release, August 2, 2022, Flash Memory Summit: CXL Consortium releases Compute Express Link 3.0 specification to expand fabric capabilities and management
32 | ©2022 Flash Memory Summit. All Rights Reserved.
New Technology Enabling – Key Contributors
Revolutionary New Technology
Successful new technology enabling requires ALL contributors to be viable for industry adoption:
• HW silicon / controller vendors
• Si IP providers (incl. pre-Si simulation)
• HW development tools (analyzers, etc.)
• Hardware production product vendors
• SW development tools (testing, debug, performance, etc.)
• Device / use-case OS drivers
• Operating system support
• Use-case applications (tangible benefits w/ new tech)
• Standards / consortiums / etc.
→ Industry adoption
33 | ©2022 Flash Memory Summit. All Rights Reserved.
Intel CXL Memory Enablement & Validation
Comparing CXL memory validation to DDR/PCIe efforts:

POR Platform Configurations
• DDR/PCIe: large matrix of POR configurations; "open socket" (extensive variety of technology and use cases)
• CXL memory: plans to validate specific POR configurations of CXL memory per platform, with several vendors and modules – not exhaustive

Engagement Model
• DDR: direct engagement and collaboration with Tier-1 suppliers
• PCIe: SIG-based engagement with PCIe IHVs
• CXL memory: targeted engagement with numerous CXL memory device & module IHVs, as well as key customers, plus multiple Consortium-based compliance workshops and various interactions

Validation Model
• DDR: early and exhaustive host-based validation spanning electrical, protocol, functional
• PCIe: SIG-led compliance workshops & plugfests; host PCIe validation focus on PCIe channel, protocol features/function; limited platform validation with PCIe products
• CXL memory: host validation focus on CXL channel, features & function of CXL memory as part of the platform's memory subsystem; CXL memory device & module IHV validation focus on device+media channel, function/features; long-term plan: Consortium-led compliance testing

Approach for CXL memory expected to evolve over generations to be PCIe-like
34 | ©2022 Flash Memory Summit. All Rights Reserved.
Industry CXL Memory HW Enabling & Validation
CPU vendor focus:
• Work with device & module vendors to enable key features
• Provided CXL vendors an open bridge architecture reference document as an initial guide, covering bridge/module operation and feature recommendations
• Device/module platform integration (focused configs); for initial AIC CEM modules, focused validation of the media interface
• Validation: host-side CXL functions; memory features (RAS, etc.); CXL channel; specific configs and vendors (# of ports, capacity, etc.)
• SW enabling: Intel providing reference system FW/BIOS; part of the industry effort to develop an open-source driver; SW guide for Type 3 devices

Controller/module vendor focus (bridge or module):
• Memory media interface, channel electricals, media training/MRC
• CXL compliance and interoperability testing

OEM/system provider focus:
• Device/module platform integration
• Configuration testing
• In-rack level testing
• Usage model testing/debug
• System validation: SW integration including system FW/BIOS, OS, generic driver; generate integrator list

[Diagram: CPU/host with CXL IP connected over the host CXL channel to a CXL memory bridge (aka controller, buffer) containing CXL IP and media IP, which drives DRAM devices over the bridge-media channel on a CXL memory module; validated across Config-1 … Config-N]

*Standardization of CXL memory module form factors – EDSFF E3.S & E1.S, PCI CEM and mezzanine – in process

A Massive Coordinated Industry Effort
35 | ©2022 Flash Memory Summit. All Rights Reserved.
Q & A
Willie Nelson
Technology Enabling Architect - Intel
August 2022
36 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL Delivers the Right Features & Architecture
CXL: an open, industry-supported cache-coherent interconnect for processors, memory expansion and accelerators
• Coherent Interface: leverages PCIe with 3 mix-and-match protocols
• Low Latency: .Cache and .Memory targeted at near-CPU cache-coherent latency
• Asymmetric Complexity: eases burdens of cache-coherent interface designs

Challenges:
• Industry trends driving demand for faster data processing and next-gen data center performance
• Increasing demand for heterogeneous computing and server disaggregation
• Need for increased memory capacity and bandwidth
• Lack of an open industry standard to address next-gen interconnect challenges

https://www.computeexpresslink.org/resource-library
37 | ©2022 Flash Memory Summit. All Rights Reserved.
Representative CXL Usages
Type 1 – Caching Devices / Accelerators (accelerator or NIC with cache, attached to a processor with DDR)
• Protocols: CXL.io, CXL.cache
• Usages: PGAS NIC, NIC atomics

Type 2 – Accelerators with Memory (accelerator with cache and HBM, attached to a processor with DDR)
• Protocols: CXL.io, CXL.cache, CXL.memory
• Usages: GP GPU, dense computation

Type 3 – Memory Buffers (memory buffer with attached memory, connected to a processor with DDR)
• Protocols: CXL.io, CXL.memory
• Usages: memory BW expansion, memory capacity expansion, storage-class memory
38 | ©2022 Flash Memory Summit. All Rights Reserved.
CXL Memory Overview

Usages: local bandwidth or capacity expansion (main memory expansion, two-tier memory) and memory pooling

Value Prop
• Local expansion: scale performance or enable use of higher core counts via added bandwidth and/or capacity
• Memory pooling: flexible memory assignment, enabling lower total memory cost, platform SKU reduction & OpEx efficiency

CXL Memory Attributes
• Main memory expansion: bandwidth and features similar to direct-attach DDR
• Two-tier memory: lower bandwidth, higher latency vs. direct-attach DDR
• Memory pooling: bandwidth and features similar to direct-attach DDR; latency similar to remote-socket access

Software Considerations (OS version must support CXL memory in all cases)
• Main memory expansion: CXL memory visible either in the same region as direct-attach DDR5 or as a separate region
• Two-tier memory: SW-visible as persistent next-tier memory
• Memory pooling: additional software layer for orchestration of pooled memory and the multi-port controller

[Diagram: CPU with direct-attach DDR5 plus CXL memory in EDSFF E3 or E1 and PCI CEM/custom board form factors, and a pooled memory controller fronting a memory pool]
39
CXL: BEYOND JUST ANOTHER INTERCONNECT
PROTOCOL
STEVE GLASER
40
AGENDA
 Cache Coherence for Accelerators
 Expansion Memory for CPUs
 Flexible Tiered Memory Configurations
 Security
41
CPU-GPU CACHE COHERENCE
Unified programming model across CPU architectures
 CPU-GPU coherence provides programmability
benefits
 Ease of porting applications to GPU
 Rapid development for new applications
 Grace + Hopper Superchip introduces cache-
coherent programming to GPUs
 CXL enables the same programming benefits
for our GPUs in systems based on 3rd-party
CPUs
Grace
CPU
Hopper
GPU
Coherent
NVLink C2C
x86/ Arm
CPU
NVIDIA
GPU
Coherent
CXL Link
42
PROGRAMMABILITY BENEFITS
CXL CPU-GPU cache coherence reduces barrier to entry
 Without Shared Virtual Memory (SVM) + coherence, nothing
works until everything works
 Enables single allocator for all types of memory: Host, Host-
accelerator coherent, accelerator-only
 Eases porting complicated pipelines in stages
 Many SW layers exist between frameworks and drivers
 Example: start with malloc, keep using malloc until you choose
otherwise
 Vendor-provided allocators remain fully supported and functional
 Workloads are pushing scaling boundaries, so fine-grained synchronization is on the rise
 Synchronization latency matters
 Avoid setup latency, do it in-memory when possible
 Host/device synchronization in device’s memory
 Concurrent algorithms and data structures become available
 Example: full C++ atomics support across host and device
 Locks
 Any suballocation can be used for synchronization, regardless of
placement
[Chart: application performance vs. programming effort – a v1 port with SVM + coherence reaches useful performance with far less effort than a v1 port without SVM or coherence]
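The slide above notes that coherence enables full C++ atomics across host and device. A minimal sketch of that idea, assuming a cache-coherent shared region; the "device" side is simulated here by a second CPU thread, so this shows the pattern rather than any vendor API:

// Illustrative only: host/"device" synchronization through ordinary C++ atomics
// on a cache-coherent shared region. The "device" is simulated by a second
// thread; with a CXL-coherent mapping the same pattern applies to an
// accelerator-side agent touching the same memory.
#include <atomic>
#include <cstdio>
#include <thread>

struct SharedRegion {
    std::atomic<int> ready{0};  // synchronization flag usable from either side
    int payload = 0;            // data handed from producer to consumer
};

int main() {
    SharedRegion region;  // stand-in for a coherently shared CXL allocation

    std::thread device([&] {  // simulated accelerator side
        while (region.ready.load(std::memory_order_acquire) == 0) {
            // spin until the host publishes the payload
        }
        std::printf("device saw payload=%d\n", region.payload);
    });

    region.payload = 42;                               // plain store
    region.ready.store(1, std::memory_order_release);  // publish to the device

    device.join();
    return 0;
}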
43
CXL FOR CPU MEMORY EXPANSION
 SOC DDR channel count is becoming
constrained
 CXL-enabled PCIe ports can be used for
additional memory capacity
 Flexibility in underlying media choice, trading
off capacity/latency/persistence
 DRAM
 DRAM + cache
 Storage-class memory
 DDR/SCM + NVMe
[Diagram: two host SOCs, each with many direct-attach DDR channels plus CXL Mem expansion devices on CXL-enabled PCIe ports]
44
CXL FOR MEMORY DISAGGREGATION
 Currently, data center servers are often over-provisioned with
memory
 All Hosts must have enough DRAM to handle the demands of worst-case workloads
 Under less memory-intensive workloads, DRAM is unused and wasted
 DRAM is very expensive at data center scale
 Large banks of CXL memory can be distributed among several Hosts
 Memory pools may be attached to Hosts via CXL switches, or directly
attached using multi-port memory devices
 Pooling
 Each Host is allocated a portion of the disaggregated memory
 Memory pools can be reallocated as needed
 Reduces memory over-provisioning on each Host while allowing
flexibility to handle a range of workloads with differing memory
demands
 Sharing
 Address ranges which may be accessed by multiple Hosts simultaneously
 Coherence may be provided in hardware by the CXL Device or may be
software-managed
[Diagram: hosts connected through a CXL switch fabric to CXL memory pools]
45
CXL FOR GPU EXPANSION MEMORY
Tackling AI with very large memory capacity demands
 Accelerator workloads with large memory footprints are currently
challenged
 Constrained by bandwidth available to Host over PCIe
 Contention with Host SW for memory bandwidth
 CXL memory expanders may be directly attached to accelerators for
private use
 Tiered memory for GPUs: HBM and CXL tradeoffs
 Bandwidth
 Capacity
 Cost
 Flexibility
[Diagram: GPU with HBM, a coherent CPU-GPU CXL link to the host, and private GPU-memory CXL link(s) to CXL expansion memory]
46
CXL FOR GPU MEMORY POOLING
Streamlined Accelerator Data Sharing
 Memory pools may provide flexibility to apportion
memory to individual GPUs as needed
 Provides solution to workloads where capacity is
important and bandwidth is secondary
 Large data sets can be stored in CXL memory and
shared as needed among accelerators, without
burdening interface to Host
[Diagram: GPUs with HBM and hosts connected through a CXL switch fabric to CXL memory pools]
47
SHARED EXPANSION MEMORY
CPU-GPU Shared Memory Pools
 CXL enables sharing of expansion memory
between Host and GPU
 Future capabilities may allow expansion memory
to simultaneously be shared
 Among Hosts
 Between Hosts and Accelerators
 Flexibility in provisioning under varying demands
 Ease of programming model
 CXL Switch could be local physical switch or
virtual switch over other physical transport
enabling remote disaggregated memory
[Diagrams: a host and GPU sharing CXL expansion memory through a CXL switch; multiple hosts and GPUs sharing CXL memory pools through a CXL switch]
48
CXL FOR CONFIDENTIAL COMPUTING
Vision for secure accelerated computing
 Confidential computing components will be
 Partitionable and assignable to Trusted
Execution Environment Virtual Machines (TVM)
 TVMs can create their own secure virtual
environments including
 Host resources
 Accelerator partition
 Shared memory partitions
 Data transfers encrypted and integrity
protected
 Components are securely authenticated
 Partitions are secure from accesses by
untrusted entities
 Other VMs/TVMs
 Firmware
 VMM
 All CXL capabilities are enabled in secure
domains
[Diagram: a confidential-compute host and confidential-compute GPU connected over CXL to memory pools, with TVMs each owning a GPU partition and memory partitions]
49
Transforming Cloud
Data Centers with CXL
Shalesh Thusoo
VP, CXL Product Development
July 2022
© 2022 Marvell. All rights reserved. 51
Cloud data center memory challenges
CXL is poised to address these issues
• Bandwidth per core declining: [chart – normalized growth rate, 2012-2020: CPU core count rising while memory channel BW per core falls; source: Meta, OCP Summit presentation, Nov 2021] – the increasing gap degrades efficiency
• No near-memory compute: DRAM DIMMs limit performance
• Memory tied down to xPUs: DRAM attached to CPUs, GPUs, and DPUs cannot be shared
© 2022 Marvell. All rights reserved. 52
Cloud data center memory challenges
CXL is poised to address these issues:
• Bandwidth per core declining → CXL expanders
• No near-memory compute → CXL accelerators
• Memory tied down to xPUs → CXL pooling over a CXL switch
© 2022 Marvell. All rights reserved. 53
Addressing memory expansion

DIMM challenges:
• Limited scalability
• Not serviceable
• No telemetry

CXL solution (CXL expander controller, CXL expander module, standard form factors):
• Scalable
• Pluggable
• Telemetry
• Improved thermals
• Mix-and-match DRAM
• Config flexibility
© 2022 Marvell. All rights reserved. 54
CXL memory expanders improve performance
Same capacity with greater bandwidth and utilization
• Today: 2 DIMMs per channel (2DPC) per xPU, 128GB → 256GB
• 1DPC + CXL expanders: 1DPC gives the same bandwidth as 2DPC, and PCI Express opens up additional bandwidth, taking 128GB → 256GB
© 2022 Marvell. All rights reserved. 55
Sharing memory with CXL

Direct CXL memory pool:
• Pool memory across multiple xPUs
• Rescue under-utilized DRAM
• Scale memory independent of xPUs
[Diagram: 56-core xPU 0 … xPU N attached directly to a CXL memory pool]

CXL pooling via a CXL switch:
• Flexible to connect resources into the fabric
• Scalable, serviceable
• Enables fully composable infrastructure
[Diagram: 56-core xPU 0 … xPU N connected through a CXL switch to memory expanders and memory accelerators]
© 2022 Marvell. All rights reserved. 56
Accelerating with CXL
CXL accelerator (compute engines attached to the xPU):
• Coherent, efficient
• Accelerates analytics, ML, search, etc.
• Improves efficiency and TCO
CXL I/O acceleration (DPU/NIC, SSD, …):
• Accelerates protocol processing
• Composable I/O devices
© 2022 Marvell. All rights reserved. 57
CXL solves data center memory challenges
• Bandwidth per core declining → more bandwidth per core with xPU-attached memory expansion
• No near-memory compute → near-memory computation with CXL accelerators for ultimate performance
• Memory tied down to xPUs → disaggregated memory and fully composable compute, memory, and storage for optimized efficiency
CXL is disrupting cloud data center architectures
© 2022 Marvell. All rights reserved. 58
Comprehensive end-to-end CXL solutions
CXL opportunities (a multi-billion-dollar opportunity):
 Expanders
 Pooling
 Switch
 Accelerators
 Custom compute
 DPUs / SmartNICs
 Electro-optics
 Re-timers
 SSD controllers
[Diagram: in the box – xPU 0 … xPU N with re-timers, DPUs, CXL accelerators, CXL expanders, and a CXL switch; out of the box – optics, CXL pooling, DPUs, CXL expanders, and SSD controllers]
© 2022 Marvell. All rights reserved. 59
Summary
1 CXL is disrupting cloud data center architectures
2 Uniquely positioned to enable end-to-end CXL in data center
3 CXL is driving the next multi-billion-dollar opportunity
4 CXL memory pooling demo at FMS Marvell Booth #607
© 2022 Marvell. All rights reserved. 60
Memory pooling demo chassis
• Server: Intel Archer City Sapphire Rapids hosts
• Memory appliance: up to 6 memory devices (3 installed); up to 2 E3.S memory cards
Thank You
63
Jonathan Prout
Senior Manager, Memory
Product Planning
Samsung Electronics
Uksong Kang
Vice President,
DRAM Product Planning
SK Hynix
Ryan Baxter
Sr. Director Marketing
Micron
Expanding Beyond Limits
With CXL™-based Memory
August 2nd 2022
Jonathan Prout
Memory New Business Planning Team
Industry Trends and Challenges
CXL™ (Compute Express Link) Introduction
CXL™ Memory Use Cases
Samsung’s CXL™ -based Memory Expander and SMDK (Scalable Memory
Development Kit)
Agenda
Industry Trends and Challenges
Artificial
Intelligence
Big Data
Edge
Cloud
5G
Massive demand for data-centric
technologies and applications
Memory bandwidth and density not
keeping up with increasing CPU core
count
Need a next gen interconnect for
heterogeneous computing and
server disaggregation
Industry Trends and Challenges
[Chart: normalized growth rate, 2012-2021 – CPU core count vs. memory channel BW per core]
A new memory scaling solution is needed
CXL™ Introduction
CXL is a high-performance, low-latency protocol that leverages the PCIe physical layer. CXL is an open industry standard with broad industry support.
[Diagram: processor with a PCIe connector; a PCIe card and a CXL card share the same PCIe channel]

Type 1 – Caching Devices / Accelerators (accelerator or NIC with cache)
• Protocols: CXL.io, CXL.cache
• Usages: PGAS NIC, NIC atomics

Type 2 – Accelerators with Memory (accelerator with cache and HBM)
• Protocols: CXL.io, CXL.cache, CXL.memory
• Usages: GP GPU, dense computation

Type 3 – Memory Buffers (memory buffer with attached memory)
• Protocols: CXL.io, CXL.memory
• Usages: memory BW expansion, memory capacity expansion, storage-class memory
CXL™ Type 3 Device
[Diagram: Host/CPU with a home agent and DDR memory controllers, connected over CXL.io and CXL.mem to a CXL memory expander with its own memory controller and device memory]
CXL is a cache-coherent standard, meaning the host and the CXL device see the same data seamlessly
CXL™ Type 3 Device - Memory Expansion
CXL enables systems to significantly scale memory capacity and bandwidth
• Today: 8 channels x 2DPC (DIMMs per channel) of 512GB DDRx per CPU → max. 8TB per CPU
• With CXL: the same 8 x 2DPC DDRx plus four 1TB CXL memory expanders → max. 12TB per CPU
Current Use Cases: Capacity / Bandwidth Expansion
[Diagrams: IMDB and IMC server configurations with per-CPU DRAM (x/y TB or GB) augmented by CXL-attached memory (z TB or GB)]
• Capacity expansion → TCO reduction (e.g., in-memory database servers)
• Bandwidth expansion → performance improvement (e.g., in-memory computing servers)
CXL™ Memory Switching & Pooling
CXL supports switching to enable memory expansion: a host (with direct-attach DDR) connects through a CXL switch to multiple CXL memory expanders
CXL supports pooling for increased system efficiency: multiple hosts connect through a CXL switch to a shared set of CXL memory expanders
Future Use Cases: Tiering and Pooling
[Diagrams: IMC servers with per-CPU DRAM plus CXL-attached memory, and multiple servers sharing a CXL memory box appliance]
• Memory tiering* → efficient expansion
• Memory pooling → increased utilization
*Hot data on DRAM; warm data on cost-optimized, CXL-attached media
Samsung CXL™ Proof of Concept
Supporting ecosystem growth with a CXL-based memory functional sample

Product Features
• Form Factor – EDSFF (E3.S) / AIC
• Media – DDR4
• Module Capacity – 128 GB
• CXL Link Width – x16
• Specification – CXL 2.0

Ecosystem enablement success
• Shipped 100+ samples since availability in 3Q '21
• Successfully tested with a broad range of server, system, and software providers across the industry
Samsung CXL™ Solution
Leading the industry toward mainstream adoption of CXL-based memory

Product Features
• Form Factor – EDSFF (E3.S)
• Media – DDR5
• Module Capacity – 512 GB
• CXL Link Width – x8
• Maximum CXL Bandwidth – 32 GB/s
• Specification – CXL 2.0
• Other Features – RAS, Interleaving, Diagnostics, and more
• Availability – Q3'22 for evaluation/testing
SMDK – Scalable Memory Development Kit
[Diagram: data center to edge applications (IMDB, DLRM, ML/AI, etc.) running on the SMDK stack – a Compatible API and an Optimization API over an intelligent tiering engine, memory pool management (Normal ZONE / CXL.mem ZONE), a CXL allocator, and a CXL kernel – on a server main board with CPU, DRAM, and a CXL memory expander]
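A minimal sketch of the zone idea that SMDK exposes (this is not the SMDK API; it only illustrates steering allocations between DRAM and a CXL expander that Linux surfaces as a CPU-less NUMA node; the node ids are assumptions):

// Illustrative sketch (not the SMDK API): with CXL expander memory exposed as
// a CPU-less NUMA node, an allocator can steer "normal" allocations to DRAM
// and capacity-hungry allocations to the CXL node, mirroring the
// Normal ZONE / CXL ZONE split. Node ids below are assumptions for the example.
#include <cstddef>
#include <cstdio>
#include <numa.h>   // libnuma; link with -lnuma

constexpr int kDramNode = 0;  // assumed: local DRAM node
constexpr int kCxlNode  = 1;  // assumed: CXL memory expander node

void* alloc_on_tier(std::size_t bytes, bool prefer_cxl) {
    int node = prefer_cxl ? kCxlNode : kDramNode;
    return numa_alloc_onnode(bytes, node);   // pages faulted in on that node
}

int main() {
    if (numa_available() < 0) { std::puts("NUMA not available"); return 1; }

    void* hot  = alloc_on_tier(1 << 20, /*prefer_cxl=*/false);  // latency-sensitive
    void* warm = alloc_on_tier(1 << 30, /*prefer_cxl=*/true);   // capacity-driven

    // ... application uses both buffers through ordinary load/store ...

    numa_free(hot,  1 << 20);
    numa_free(warm, 1 << 30);
    return 0;
}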
Application Benchmark Test
Test scenario: a single Redis node with 32GB DDR5 plus 64GB of CXL-attached memory (scale-up over the CXL link) vs. a Redis cluster of 32GB-DDR5 nodes spread across two systems (scale-out over Ethernet); a client runs 60GB SET and 60GB GET workloads at 128B, 4KB, and 1MB chunk sizes.
Test result [chart: SET/GET throughput in MB/s for the single node (DRAM + CXL) vs. the 2-node cluster (DRAM) at each chunk size]: scale-up performance is 2.7x better than scale-out at 4KB chunk size.
 Memory capacity and bandwidth per core is lagging industry demand
 Conventional scaling technologies unable to meet the challenge
 CXL is the most promising technology to address the gap
 Capacity/bandwidth expansion, tiering, pooling use cases
 Samsung is leading the advancement of CXL-based memory solutions
 PoC, ASIC-based module, and SMDK
 Tested PoC with a broad range of partners for more than 1 year
 Samsung enthusiastically welcomes further collaboration with the industry
 Visit Samsung booth to learn more about Samsung’s Memory Expander and SMDK
Key Takeaways
Uksong Kang
VP, DRAM Product Planning
August 2, 2022
Adding New Value to Memory
Subsystems through CXL
© SK hynix Inc.
CXL Creating New Gateway for Increased Efficiency
• CXL is opening up a new gateway toward efficient use of computing, acceleration and
memory resources resulting in overall TCO reduction in data centers
ASIC (NPU/DPU)
GPU FPGA
CPU
CXL Memory
Computing
Acceleration
Memory Pools
Servers in
Remote Rack
81/16
© SK hynix Inc.
New Values in Memory through CXL
• CXL creates many new additional opportunities in memory subsystems beyond what
is possible today in existing server platforms
#1: Memory bandwidth
and capacity expansion
#4: Memory-as-a-
Service (MaaS)
#2: Memory media
differentiation
#3: Controller
differentiation
82/16
© SK hynix Inc.
Memory Bandwidth and Capacity Gap
• Increase in SoC core counts requires continued increase in memory bandwidth and
capacity, but the gap between such requirements and platform provisioning
capability is growing
[Charts: memory capacity and bandwidth requirements per socket (A.U.), 2020-2030 – capacity requirements for general-purpose and memory-intensive workloads vs. DIMM-based provisioning (64GB-256GB per DIMM, 8 → 12 → 16 channels per socket, 2DPC → 1DPC), and bandwidth requirements for compute-intensive, general-purpose, and memory-intensive workloads vs. expected DIMM-based provisioning capability (6.4-11.2 Gbps/IO, 8-12 channels); the gap keeps growing]
83/16
© SK hynix Inc.
#1: Memory Bandwidth and Capacity Expansion
• CXL memories allow continued scale-out in memory bandwidth and capacity beyond
physical limitations of traditional server platforms
Motivation for capacity expansion with CXL:
• The number of channels per CPU socket does not scale out due to form-factor limits
• The number of DIMMs per channel has decreased from 2 to 1 due to I/O signal-integrity issues
• DIMMs are reaching their power and thermal limits
→ Add CXL solutions (PCIe Gen5-6) alongside local DDRx L/RDIMMs

Motivation for bandwidth expansion with CXL:
[Chart: memory bandwidth (A.U.) for 8 DDR5 channels at 4800/5600/6400, with 1R1W and 2R1W traffic, as 1-4 CXL-BME devices are added – bandwidth expands with each CXL device]
84/16
© SK hynix Inc.
#2: Memory Media Differentiation
• CXL is a memory agnostic non-deterministic protocol allowing differentiation in memories fulfilling the
demands of various server workloads for the future
• Different memory media can provide different performance, capacity, and power design trade-offs
[Diagram: CPU with DDR interfaces (64b, memory-aware and deterministic) to 1st-tier DRAM, and CXL interfaces (memory-agnostic and non-deterministic, allowing decoupling of memory media from the CPU) to 2nd-tier standard or custom memory media]
85/16
© SK hynix Inc.
#3: Controller Differentiation
• Enhanced RAS (ECC, PPR, ECS), security, lower power, computation, processing,
acceleration features can be included inside the CXL controller for added values
[Diagram: CPU with JEDEC DIMMs plus a CXL (PCIe) link to a CXL controller containing a computational memory solution (CMS) engine in front of memory media]
Value-add features in the CXL controller: enhanced RAS (ECC, PPR, ECS), security, low power, processing, computation, acceleration
86/16
© SK hynix Inc.
#4: Memory-as-a-Service (MaaS)
• CXL allows building composable scalable rack-scale memory pool appliances which
can be populated with different types of memory media
• Variable memory capacity can be effectively allocated within the memory pool to
different xPUs through memory virtualization
[Diagram: HPC servers (CPUs, GPUs, NPUs, xPUs with DDR/HBM, DPUs, CPU and accelerator interconnects such as NVLink, Ethernet for services) connected over a CXL switch and CXL fabric to a disaggregated memory pool (DRAM/SCM) and disaggregated storage pools (NAND/SCM SSDs), with a fabric network reaching other HPC servers]
Memory pool appliance:
• Populate with different memory media based on the user's choice
• Allocate variable memory capacity through memory virtualization
87/16
© SK hynix Inc.
SK hynix’s Future Paths in CXL for Increased Values
• Start with CXL memory market enabling, followed by market expansion, and value
addition
• First product is planned with standard DDR DRAMs followed by products with new
value-added features to be defined further through close collaboration with eco-
system partners and customers
STAGE 1: Ecosystem Enabling
• Role: evaluate pilot devices and launch the first CXL memory for TTM
• Form factor: AIC riser card, EDSFF E3.S
• Memory media: standard DRAM (DDR5, DDR4)
• Host I/F: CXL 2.0 on PCIe Gen5 x8/x16

STAGE 2: Market Expansion
• Role: introduce value-added CXL memory for expansion & basic pooling
• Form factor: AIC riser card, EDSFF E3.S, E1.S
• Memory media: optimized memory for better TCO
• Host I/F: CXL 2.0 on PCIe Gen5 x8 or CXL 3.0 on PCIe Gen6 x4

STAGE 3: Value Addition
• Role: expand value-add with stronger memory RAS, security, near-memory processing, fabric attach, etc.
• Form factor: AIC riser card, EDSFF + many more
• Memory media: optimized memory for better TCO
• Host I/F: CXL 3.0 on PCIe Gen6 x4/x8
88/16
© SK hynix Inc.
SK hynix’s Vision toward Future CXL Memory Solutions
• Envisioning four different types of CXL memory solutions for different use cases
• First CXL memory will be bandwidth memory expansion based on DDR5 DRAM media
followed by capacity expansion, memory pooling, and computational memory
solution
CXL-BME (Bandwidth Memory Expansion): DDR5-class BW & energy, alleviates loaded latency; memory expansion w/o tiering; capacity: low
CXL-CME (Capacity Memory Expansion): higher capacity, lower W/GB, more advanced RAS than DDR5; memory expansion w/ tiering; capacity: mid/high
CXL-MPS (Memory Pooling Solution): memory media and module FF optimized for pooling; memory pooling; capacity: high
CXL-CMS (Computational Memory Solution): near-memory processing for AI and data analytics; new value for the heterogeneous computing era; capacity: TBD
89/16
© SK hynix Inc.
SK hynix’s First CXL Memory now Ready for Take-off
• CXL-BME is a 96GB bandwidth memory expansion module integrated with cost-
effective single-die packaged DRAMs
• DDR5-class bandwidth, DDR5-class latency within 1 NUMA hop, outperforming in
BW/$ and BW/power
EDSFF E3.S 2T Product:
- CXL2.0 on PCIe gen5 x8
- 96GB (2Rx4-like), 1CH 80-bit DDR5
- SDP x4 PKG with 24Gb DDR5 die
- 30GB/s+ random BW
[Module photos, front and back: EDSFF E3.S 2T board populated with rows of DDR5 SDP DRAM packages, the CXL controller, PMICs, etc.]
90/16
© SK hynix Inc.
SK hynix’s First CXL Memory in E3.S Form Factor
91/16
• 96GB E3.S module with cost-
effective single-die packages
• Based on DDR5 24Gb DRAM
with most advanced 1anm
process technology
SK hynix Newsroom, 08/01/22
© SK hynix Inc.
HMSDK for Increased Memory Performance
• Performance improved with CXL memory expansion + HMSDK (SW solution) on
high-bandwidth workloads
• HMSDK supports memory use ratio configuration, optimizing page interleaving to
be more BW-aware
Performance of CXL-BME w/ system memory BW & Allocated Memory Size
HMSDK (Heterogeneous Memory Software Development Kit)
92/16
© SK hynix Inc.
CXL Memory System Demo with HMSDK
• HMSDK’s library stores data to both DRAM and CXL memory based on user-defined
ratio configuration
• Check out the SK hynix booth (#509) for the video demo presentation
• Also, at the booth check out the live demo on research studies regarding dynamic, elastic CXL memory
allocation
Advanced Research Project: Elastic CXL Memory Solution
HMSDK Example: 2:1 Ratio in DRAM:CXL Memory
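A toy sketch of the ratio idea behind the demo (this is not HMSDK's actual interface): successive allocations are spread across the DRAM and CXL NUMA nodes in a user-defined ratio such as 2:1; the node ids are assumptions.

// Illustrative only (not HMSDK): a ratio-aware placement helper that spreads
// allocations across DRAM and CXL NUMA nodes in a user-defined ratio (e.g. 2:1),
// so aggregate bandwidth comes from both tiers. Node ids are assumptions.
#include <cstddef>
#include <numa.h>   // libnuma; link with -lnuma

class RatioPlacer {
public:
    RatioPlacer(int dram_node, int cxl_node, int dram_share, int cxl_share)
        : dram_node_(dram_node), cxl_node_(cxl_node),
          dram_share_(dram_share), cxl_share_(cxl_share) {}

    // Out of every (dram_share + cxl_share) allocations, cxl_share land on CXL.
    void* alloc(std::size_t bytes) {
        int slot = counter_++ % (dram_share_ + cxl_share_);
        int node = (slot < dram_share_) ? dram_node_ : cxl_node_;
        return numa_alloc_onnode(bytes, node);
    }

private:
    int dram_node_, cxl_node_, dram_share_, cxl_share_;
    unsigned counter_ = 0;
};

int main() {
    if (numa_available() < 0) return 1;
    RatioPlacer placer(/*dram_node=*/0, /*cxl_node=*/1, /*dram_share=*/2, /*cxl_share=*/1);
    void* a = placer.alloc(64 << 20);  // DRAM
    void* b = placer.alloc(64 << 20);  // DRAM
    void* c = placer.alloc(64 << 20);  // CXL
    numa_free(a, 64 << 20); numa_free(b, 64 << 20); numa_free(c, 64 << 20);
    return 0;
}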
93/16
© SK hynix Inc.
Building Strong CXL Eco-system with Industry Partners
• Close collaboration with all CXL eco-system partners across the entire system
hierarchy is essential for successful launch of future CXL products
• SK hynix is committed to be a key player in building such eco-system by delivering
differentiated value-added memory products to the industry
Software
HW Platform
Memory
Controller
xPU
Storage
IP
94/16
© SK hynix Inc.
Summary
• CXL is creating new values through memory bandwidth and capacity expansion,
memory differentiation, controller differentiation, and Memory-as-a-Service
• SK hynix is excited and committed to contribute to the entire CXL eco-system by
providing many efficient scalable CXL memory solutions with differentiated value-
added memory products from memory expansion to memory pooling, such as BME,
CME, MPS, and CMS
• SK hynix is pleased to be able to announce its first cost-effective 24Gb 1anm DDR5
based 96GB CXL memory in E3.S form factor, which is just the beginning toward
providing more valuable scalable memory solutions to the entire industry in the future
“Check out the SK hynix FMS demo on CXL memory with HMSDK, showcasing performance
improvements of CXL memory with optimized BW-aware SW solutions”
95/16
THANK YOU
© 2022 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are
subject to change without notice. Micron, the Micron logo, and all other Micron trademarks are the
property of Micron Technology, Inc. All other trademarks are the property of their respective owners.
97
Ryan Baxter– Senior Director of Marketing, Data Center
Flash Memory Summit | August, 2022
CXL: Enabling New Pliability
in the Modern Data Center
Micron Confidential
Micron Confidential
Data centers = memory centers
[Charts: memory and storage content of compute-optimized vs. AI-optimized servers (DDR memory, HBM memory, NAND storage – roughly 6x the memory and 7x the storage per AI server); hyperscale adoption of AI roughly doubling (2x) from CY20 to CY25, driving memory & storage growth; Micron's global data center market, CY20-CY30 ($20B-$180B axis, 16% CAGR)]
Memory and storage growth will never be as slow as before, and possibly never as fast as now.
Sources: 1. Hyperscale AI Adoption: Internal Bain research
2. Server Content referencing two published AWS EC2 hardware configs:
AWS Instance types 3/1/22
Standard server Config = 256GB DRAM , 0GB HBM, 1.2TB SSD Storage
AI Server Config = 1152 DRAM+ 320GB HBM, 8TB SSD Storage
3. Global Data Center Market = Micron MI Market Model
Micron Confidential
Micron Confidential
Memory-centric innovations in the data center
Applying the power of software-defined infrastructure
[Diagram: memory/storage hierarchy – cost and bandwidth increase toward the top, latency and capacity toward the bottom: in-package and direct-attach near memory for hot data, CXL-attached far memory for warm data, and fast, capacity, and archival storage for cold data]
[Diagram: fixed servers of compute, memory, and storage recomposed into modular, composable pools that are both performant and efficient]
Micron Confidential
Micron Confidential
CXL Use Cases
100
• Alternative to stacking: stacking drives non-linear cost/bit
• Provide ultra-high capacity: expansion beyond 4H 3DS TSV
• Add memory bandwidth: CXL enables more memory attach points
• Balance memory capacity/BW: DRAM capacity/BW on demand; balances GB/core and BW
• Reduce system complexity: fewer memory channels; thermally optimized solutions
• Enablement after 2DPC: future 50% slot reduction
Micron Confidential
Micron Confidential
101
Memory and Storage Hierarchy – the industry's fully composable, scalable vision
[Diagram: hierarchy from hot to cold data – cache and near memory (ultra-wide memory bus), bandwidth memory, CXL-attached memory expansion, fast storage (TLC SSD), capacity storage (QLC SSD, SATA/Ethernet), and HDD/archival storage (Ethernet); cost and bandwidth increase toward the top, latency and capacity increase toward the bottom]
Micron Confidential
Micron Confidential
Micron’s “data centric” portfolio
102
A complete portfolio built on silicon technology, world-class manufacturing, and a diversified supply chain:
• Segments: compute, storage, networking, acceleration; hyperscale, enterprise & government, communications, edge
• Products: HBM, GDDR, LPDDR, DDR, TLC NAND, QLC NAND
• Foundations: silicon technology, emerging memory, advanced packaging, tech node leadership
• Engagement: deep customer relationships, ecosystem enablement, standards body leadership
Micron Confidential
Micron Confidential
Micron is committed to partnering with the industry;
ultimately serving and delighting our customers
103
Strategic ecosystem
partnerships
 Define, develop and prove
technologies
 DDR, LP, & GDDR
 GPU Direct Storage
 Enable differentiated solutions
 Extend the ecosystem
Industry organizations
 Provide leadership in industry
organizations to enable scalable
advancement
104
Arvind Jagannath
Product Management
VMware
Charles Fan
CEO & Co-founder
MemVerge
Manoj Wadekar
SW-Defined Memory
Workstream Lead, OCP,
Storage Architect, Meta
Afternoon Agenda
106
Start-End    Name: Title, Organization
3:25-3:45    Arvind Jagannath: Cloud Platform Product Management, VMware
3:45-4:05    Mahesh Wagh: Senior Fellow, AMD
4:05-4:25    Charles Fan: CEO & Co-founder, MemVerge
4:25-4:45    Manoj Wadekar: SW-Defined Memory Workstream Lead, OCP; Storage Architect, Meta
4:45-5:10    Siamak Tavallaei (Panel Moderator): President, CXL Consortium; Chief System Architect, Google Infrastructure
5:10-5:35    Chris Mellor (Panel Moderator): Editor
Session SPOS-102-1 on the FMS program
Virtual CXL presentations now available on the
MemVerge YouTube channel
107
Confidential │ © VMware, Inc. 108
Towards a CXL future with
VMware
Confidential │ © VMware, Inc. 109
• This presentation may contain product features or functionality that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally
available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any
kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new features/functionality/technology discussed or presented, have not been
determined.
Disclaimer
Confidential │ © VMware, Inc. 110
VMware Competencies
• SmartNICs and accelerators
• Virtualization ideal for transparent tiering
• Cluster-wide DRS helps load balance and mitigate risks
• Strong ecosystem of partners
• Passthrough devices, GPUs, sharing, and assignable hardware
Confidential │ © VMware, Inc. 111
Digital Transformation of Businesses
Explosive growth in data
1 NetworkWorld. “IDC: Expect 175 zettabytes of data worldwide by 2025.” December 2018. networkworld.com/article/3325397/idc-expect-175-zettabytes-of-data-worldwide-by-2025.html.
2 IBM. “3D NAND Technology – Implications to Enterprise Storage Applications.” 2015 Flash Memory Summit. 2014. flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150811_FM12_Yoon.pdf.
By 2025, IDC predicts 30% of global data will be real time!
Low Latency for
Mission-critical
Transactions
Need to Deliver
Business Value
in Real-time
Highly Parallel
Processing
on Very Large
Set Of Data
Deliver Risk and
Opportunity
for Future
175 ZB by
2025, With
26% CAGR1,2
Artificial
Intelligence
Business
Intelligence
Real-time
Analytics
Machine
Learning
Big Data
Analytics
Transactional
Processing
Predictive
Analytics
Edge
Processing
Time Series
Virtualization
Hybrid
Cloud
Confidential │ © VMware, Inc. 112
Trends vs. Customer Needs (Digital Transformation)

Trends:
• Explosive growth of data
• Desire to get more out of the data
• More data needs to be processed in real time
• Software has led the innovation in the cloud; hardware is catching up

Customer needs:
• Need to scale infrastructure to address data growth
• More in-memory computing to process faster
• Need enterprise-class monitoring and remediation
• DRAM is expensive and lacks high densities
Confidential │ © VMware, Inc. 113
VMware’s Big Memory Vision
starts with Software Tiering
Confidential │ © VMware, Inc. 114
[Diagram: vCenter managing ESXi hosts with tiered memory – DDR plus CXL-attached, remote, or slower memory (over CXL or RDMA over Ethernet) and NVMe / pooled NVMe – serving VMs and containers (CRX)]
Software Tiering:
• Higher density, more capacity; lower TCO; negligible performance degradation
• Transparent – single volatile memory address space; no guest or application changes; run any operating system
• ESXi internally handles page placement; DRS and vMotion mitigate risks; tiering heuristics are fed to DRS
• Ensures fairness across workloads; consistent performance
• Zero configuration changes; no special tiering settings
• Processor-specific monitoring; vMMR monitors at both VM and host levels
Confidential │ © VMware, Inc. 115
Phase-1: Host-local memory tiering
Software Tiering: How Does it Work?
[Diagram: ESXi kernel with tiering-aware memory management, DRS, and scheduler running Windows/Linux VMs and containers (CRX) on memory hardware with DDR as the 1st tier and lower-cost memory as the 2nd tier]
Confidential │ © VMware, Inc. 116
Future
Software Tiering: How Does it Work?
[Diagram: ESXi kernel with tiering-aware host memory management and a cluster-level DRS scheduler; VMs and containers (CRX) over DDR, CXL-attached DRAM, CXL-attached PMem, NVMe, and pooled NVMe as lower-cost memory]
Confidential │ © VMware, Inc. 117
Future
Various Tiering Approaches (ESX kernel managing DRAM as the first tier with different second tiers)
1. DRAM + lower-cost/slower memory
2. DRAM + NVMe
3. DRAM + lower-cost/slower memory
4. DRAM + remote memory / host sharing
5. DRAM + CXL-attached device/pool
6. DRAM + NVMe-oF
Confidential │ © VMware, Inc. 118
Future
Software Tiering with CXL 2.0
[Diagram: ESX host presenting a single uniform memory address space across DRAM and CXL / NVMe-oF / shared DRAM / remote or slower memory, with a CXL switch providing shared and pooled memory; prototyping with CXL 1.1]
Confidential │ © VMware, Inc. 119
How it all fits together?
Managed as part of the end-to-end vSphere workflow
[Diagram: within a host, VM pages are placed on Tier 1 or Tier 2 – page placement chooses the tier for each page and tier sizing chooses the size of each tier; across the cluster, DRS chooses the host for each VM; VM, host, and DRS monitors track tier bandwidth per VM and per host for the administrator]
120
Mahesh Wagh
Senior Fellow
| AMD | Data Center Group| 2022
[Public]
AGENDA
◢ Paradigm Shift and Memory Composability Progression
◢ Runtime Memory Management
◢ Tiered Memory
◢ NUMA domains and Page Migration
◢ Runtime Memory Pooling
| AMD | Data Center Group| 2022
[Public]
PARADIGM SHIFT
◢ Scalable, high-speed CXL™ Interconnect and
PIM (Processing in Memory) contribute to the
paradigm shift in memory intensive computations
◢ Efficiency Boost of the next generation data
center
◢ Management of the Host/Accelerator
subsystems combined with the terabytes of the
Fabric Attached Memory
◢ Reduced complexity of the SW stack combined
with direct access to multiple memory
technologies
| AMD | Data Center Group| 2022
[Public]
MEMORY COMPOSABILITY PROGRESSION
[Diagram: progression from direct-attach memory (host root port to end point) through memory scale-out (host root port to a buffer) to memory pooling & disaggregation]
Memory pooling & disaggregation:
• Addresses the cost and underutilization of memory
• Multi-domain pooled memory – memory in the pool is allocated/released when required
• Workloads/applications benefit from memory capacity
• Design optimization for {BW/$, memory capacity/$, BW/core}
| AMD | Data Center Group| 2022
[Public]
RUNTIME MEMORY MANAGEMENT
| AMD | Data Center Group| 2022
[Public]
TIERED MEMORY
NUMA Domains
Page Migration
| AMD | Data Center Group| 2022
[Public]
TIERED MEMORY
NUMA DOMAINS
• Exposed to the HV, guest OS, and apps
• OS-assisted optimization of the memory subsystem
• Based on ACPI objects – SRAT/SLIT/HMAT
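A minimal sketch of how software can see these NUMA domains, assuming Linux with libnuma: SLIT-derived distances distinguish near from far (e.g., CXL-attached) memory. The local node id and the distance cutoff below are assumptions for the example.

// Illustrative sketch: the ACPI SRAT/SLIT information surfaces as NUMA nodes
// and distances; software can treat low-distance nodes as "near" memory and
// high-distance, CPU-less nodes (e.g., CXL-attached) as "far" memory.
#include <cstdio>
#include <numa.h>   // libnuma; link with -lnuma

int main() {
    if (numa_available() < 0) { std::puts("NUMA not available"); return 1; }

    int local = 0;                     // assumed: node of the running CPU
    int max_node = numa_max_node();

    for (int node = 0; node <= max_node; ++node) {
        int dist = numa_distance(local, node);              // SLIT-derived distance
        const char* tier = (dist <= 20) ? "near" : "far";   // assumed cutoff
        long long free_bytes = 0;
        long long size = numa_node_size64(node, &free_bytes);
        std::printf("node %d: distance %d (%s), %lld MiB total\n",
                    node, dist, tier, size >> 20);
    }
    return 0;
}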
| AMD | Data Center Group| 2022
[Public]
TIERED MEMORY
PAGE MIGRATION
[Diagrams: CCD/IOD processors with near memory (DDR) and far memory (CXL-attached) arranged as NUMA domains – a memory-expansion configuration (shorter latency to near mem, longer latency to far mem) and a "memory as a cache" configuration where a near-mem miss is redirected to far mem]

SW-assisted page migration:
• Active page migration between far and near memories
• HV/guest migrates hot pages into near mem and retires cold pages into far mem
• Focused DMA transfers the required datasets from far to near mem

DRAM as a cache optimization:
• HW-managed hot dataset
• Near-mem miss redirected to the far mem
• App/HV unawareness
| AMD | Data Center Group| 2022
[Public]
TIERED MEMORY
SW ASSISTED PAGE MIGRATION
Flow:
• Combined HW/SW tracking of memory page activity ("hotness")
• Detecting page(s) that are candidates for migration
• Requesting HV/guest permission to migrate
• HV/guest API call to the Security Processor to migrate the page(s)
• Migration – stalling accesses to the specific pages and copying the data

Key points:
• Page "hotness" is determined by combined HW and SW tracking
• The HV/guest authorizes the migration
• The Security Processor acts as the root of trust for performing the migration
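A minimal sketch of the migration step itself, using the Linux move_pages(2) interface; the full flow on the slide (hotness tracking, HV/guest permission, security-processor involvement) is not modeled here, and the near-node id is an assumption.

// Illustrative sketch of "migrate hot pages to near memory" using move_pages(2).
// Node ids are assumptions; error handling is minimal for brevity.
#include <cstdio>
#include <cstring>
#include <numaif.h>   // move_pages, MPOL_MF_MOVE; link with -lnuma
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const int kNearNode = 0;                    // assumed near-memory node
    const long page = sysconf(_SC_PAGESIZE);

    // A "hot" buffer that currently lives wherever it was first touched.
    void* buf = mmap(nullptr, 4 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    std::memset(buf, 0xAB, 4 * page);           // fault the pages in

    void* pages[4];
    int   nodes[4];
    int   status[4];
    for (int i = 0; i < 4; ++i) {
        pages[i] = static_cast<char*>(buf) + i * page;
        nodes[i] = kNearNode;                   // desired destination node
    }

    // Request migration of the four pages to the near node.
    if (move_pages(0 /*self*/, 4, pages, nodes, status, MPOL_MF_MOVE) != 0)
        std::perror("move_pages");
    else
        for (int i = 0; i < 4; ++i)
            std::printf("page %d now on node %d\n", i, status[i]);

    munmap(buf, 4 * page);
    return 0;
}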
| AMD | Data Center Group| 2022
[Public]
RUNTIME MEMORY ALLOCATION/POOLING
FABRIC ATTACHED MEMORY
[Diagram: two hosts sharing Tier-2 memory behind a multi-headed CXL controller]
 Multiple structures serve for fabric-level memory pooling
 Combination of private (dedicated to a specific host) and shareable memory ranges
 Protection of the memory regions from unauthorized guests and hypervisors
 Allocation/pooling of memory ranges between hosts is regulated by a fabric-aware SW layer (i.e., the Fabric Manager)
| AMD | Data Center Group| 2022
[Public]
RUNTIME MEMORY ALLOCATION/POOLING
FABRIC ATTACHED MEMORY
 Memory Allocation Layer – communicates
<new memory allocation per Host> based
on the system/apps needs
 Fabric Manager – adjusts the fabric
settings and communicates new memory
allocations to the Host SW
 Host SW - Invokes Hot Add/Hot Removal
method to increase/ reduce (or offline) an
amount of memory allocated to the Host
 In some instances, Host SW can directly
invoke SP to adjust the memory size allocated
to the Host
 On–die Security Processor (Root of Trust)
is involved in securing an exclusive access
to the memory range
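A toy model of the allocation flow described above (not a real Fabric Manager API): an orchestration layer requests capacity for a host, the fabric manager carves a range out of the fabric-attached pool, and host software hot-adds it. All names here are hypothetical.

// Toy model only: fabric-level memory allocation flow. Hot-add, security
// processor involvement, and real fabric configuration are not modeled.
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

struct Range { std::uint64_t base, size; };

class FabricManager {
public:
    explicit FabricManager(std::uint64_t pool_bytes) : free_base_(0), free_bytes_(pool_bytes) {}

    // Carve a range out of the pool and record which host owns it.
    bool allocate(const std::string& host, std::uint64_t bytes, Range& out) {
        if (bytes > free_bytes_) return false;
        out = {free_base_, bytes};
        free_base_ += bytes; free_bytes_ -= bytes;
        owner_[out.base] = host;
        return true;
    }
private:
    std::uint64_t free_base_, free_bytes_;
    std::map<std::uint64_t, std::string> owner_;
};

// Host-side stub standing in for the OS hot-add path.
void host_hot_add(const std::string& host, const Range& r) {
    std::printf("%s: hot-adding %llu GiB at fabric offset 0x%llx\n",
                host.c_str(),
                static_cast<unsigned long long>(r.size >> 30),
                static_cast<unsigned long long>(r.base));
}

int main() {
    FabricManager fm(1ULL << 40);                 // 1 TiB fabric-attached pool
    Range r{};
    if (fm.allocate("host-A", 256ULL << 30, r))   // host A requests 256 GiB
        host_hot_add("host-A", r);
    return 0;
}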
| AMD | Data Center Group| 2022
[Public]
SUMMARY
Composable Disaggregated Memory is the key approach to address
the cost and underutilization of the System Memory
Further investment in the Runtime Management of the Composable &
Multi-Type memory structures is required to maximize the system level
performance across multiple use-cases
Application Transparency is another goal of efficient Runtime
Management by abstracting away an underlying fabric/memory
infrastructure
CXL: The Dawn of Big Memory
Charles Fan
Co-founder & CEO
MemVerge
The Rise of Modern Data-Centric Applications
135
EDA Simulation
AI/ML Video Rendering
Geophysical
Genomics Risk Analysis
CFD
Financial Analytics
Opening the Door to the
Era of Big Memory
136
Abundant
Composable
Available
What happened 30+ years ago
137
Just a Bunch of Disks → Storage Area Network (SAN) → Advanced Storage Services
Enabled by Fibre Channel and storage data services
Where We Are Going…
138
Storage (30+ years ago): Just a Bunch of Disks → Storage Area Network (SAN) → Advanced Storage Services, enabled by Fibre Channel and storage data services
Memory (now): New Memory → Pooled Memory → Memory-as-a-Service, enabled by CXL and memory data services
Disaggregated & Pooled Memory
Memory Pool
Computing Servers
Pool Manager
CXL Switch
Dynamic Memory Expansion
Reduces Stranded Memory
Before CXL
Use Case #1
Used Memory Memory not used
* H. Li et. Al. First-generation Memory Disaggregation for Cloud Platforms.
arXiv:2203.00241v2 [cs.OS], March 5, 2022
Azure Paper*:
• Up to 50% of server costs is from DRAM alone
• Up to 25% of memory is stranded
• 50% of all VMs never touch 50% of their rented memory
Dynamic Memory Expansion
Reduces Stranded Memory
After CXL
Used Memory Memory not used
Use Case #1
Memory disaggregation can save billions of dollars per year.
Memory Auto-healing
With Transparent Migration
2. Provision a new memory
region from the pool
1. A memory module is becoming bad:
error rate going up.
3. Transparent
migration of
memory data
4. Memory Auto-healing
complete
Use Case #2
Distributed Data Shuffling
Use Case #3 (before CXL): shuffle data moves between nodes through local SSDs and the network, paying for storage I/O plus serialization and deserialization at each hop
Using Shared Memory Read
Use Case #3
After CXL
S. Chen, et. Al. Optimizing Performance and Computing Resource Management of
in-memory Big Data Analytics with Disaggregated Persistent Memory. CCGRID'19
Project Splash is open source: https://github.com/MemVerge/splash
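A minimal sketch of the shared-memory read path, with POSIX shared memory standing in for a CXL-shared region (with CXL 3.0 sharing, the producer and consumer mappings could sit on different hosts). The region name is hypothetical.

// Illustrative sketch: exchanging a record through a shared mapping with no
// serialization or storage I/O. POSIX shared memory stands in for a
// CXL-attached shared region; both views live in one process here.
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct ShuffleRecord {        // plain-old-data: readable in place, no decode step
    long   key;
    double value;
};

int main() {
    const char* name = "/cxl_shuffle_demo";      // hypothetical region name
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(ShuffleRecord));

    // Producer view: write the record directly into shared memory.
    auto* out = static_cast<ShuffleRecord*>(
        mmap(nullptr, sizeof(ShuffleRecord), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    *out = {42, 3.14};

    // Consumer view: map the same region and read it in place.
    auto* in = static_cast<ShuffleRecord*>(
        mmap(nullptr, sizeof(ShuffleRecord), PROT_READ, MAP_SHARED, fd, 0));
    std::printf("consumer read key=%ld value=%.2f\n", in->key, in->value);

    munmap(out, sizeof(ShuffleRecord));
    munmap(in, sizeof(ShuffleRecord));
    shm_unlink(name);
    return 0;
}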
Key Software Components
145
[Diagram: applications on operating systems across computing hosts and a pool server, connected by CXL and a CXL switch to a memory pool]
Transparent Memory Service (on each computing host): memory snapshot, memory tiering, memory sharing, resource management
Memory Machine Pool Manager (on the pool server): memory provisioning & sharing, capacity optimization, global insights, security, data protection
Tools: Memory Viewer, app profiler, hardware API integration
147
Memory Capacity Expansion
• Software-defined memory pool with intelligent auto-tiering
• No application change required
Accelerate Time-to-Discovery
• Transparent checkpointing
• Roll back, restore, and clone anywhere, any time
Reduce Cloud Cost by up to 70%
• Enable long-running applications to use low-cost Spot instances
• Integration with cloud automation and schedulers to auto-recover from CSP preemptions
[Diagram: Memory Machine™ – memory snapshot service, memory tiering service, and system & cloud orchestration service over a transparent memory service on Linux, spanning compute (CPU/GPU/xPU), memory (HBM/DDR/CXL), and storage (SSD/HDD) for genomics, EDA, geophysics, risk analysis, video rendering, and other workloads]
Announcing Memory Machine Cloud Edition
148
Early Results Running Memory Machine on CXL
Test setup: next-gen server with 64GB of DDR5 DRAM and a 64GB CXL DRAM expander card (Montage Technologies), running Memory Machine™ tiering and transparent memory services on Linux
Workloads: MLC (Memory Latency Checker) and STREAM microbenchmarks, plus the Redis application
149
[Charts: MLC (Memory Latency Checker) results – throughput in GB/s for all-reads, 3:1, 2:1, and 1:1 read/write mixes, and a stream-triad-like workload – and STREAM results for copy, scale, add, and triad, each comparing DDR5 only, CXL only, DDR+CXL, and Memory Machine auto-tiering]
Live Demos at MemVerge Booth
150
Key Software Components
151
Memory
Snapshot
Memory
Tiering
Resource
management
Transparent Memory Service
Linux
App App App App
CXL Switch
CXL
Computing Hosts Memory Pool
Memory Provisioning &
Sharing
Capacity Optimization
Global Insights
Security
Data
Protection
Memory Machine Pool Manager
Linux
Pool Server
Memory
Viewer
App profiler
Hardware API Integration
152
Announcing MemoryViewer
153
Application Memory Heatmap
Memory Viewer Free Download:
http://www.memverge.com/MemoryViewer
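As a rough illustration of where per-region data for an application memory heatmap can come from on Linux, the following C sketch walks /proc/self/smaps and prints the resident set size of each mapping. This is not how Memory Viewer is implemented; it is only a minimal, hedged example of the underlying idea.

```c
/*
 * Report resident size (RSS) per mapping of the current process by
 * parsing /proc/self/smaps -- a crude, coarse-grained "heatmap".
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f) { perror("smaps"); return 1; }

    char line[512], region[512] = "";
    while (fgets(line, sizeof(line), f)) {
        long rss_kb;
        char key[64];
        if (sscanf(line, "Rss: %ld kB", &rss_kb) == 1) {
            if (rss_kb > 0)
                printf("%8ld kB  %s", rss_kb, region);   /* RSS per mapping */
        } else if (sscanf(line, "%63s", key) == 1 && !strchr(key, ':')) {
            /* A new mapping header line: "start-end perms ... [path]". */
            strncpy(region, line, sizeof(region) - 1);
            region[sizeof(region) - 1] = '\0';
        }
    }
    fclose(f);
    return 0;
}
```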
Software Partner to the CXL Ecosystem
154
Founded in 2017 to develop Big Memory software
[Ecosystem diagram: the Memory Machine software stack (Transparent Memory Service with Memory Snapshot, Memory Tiering, and Resource Management; Memory Machine Pool Manager with Memory Provisioning & Sharing, Memory Sharing, Capacity Optimization, Global Insights, Security, and Data Protection; Memory Viewer; App Profiler; Hardware API Integration) shown alongside partner logos for processors, servers, switches, memory systems, clouds, Big Memory apps, and standards bodies.]
Memorize the future.
Please visit our booth for the live demos
Enabling Software-Defined Memory: Real World Use Cases with CXL
156
Manoj Wadekar
Hardware System Technologist
Agenda
• SDM Workstream within OCP
• Hyperscale Infra - Needs
• Memory Hierarchy to address the needs
• SDM Use cases
• SDM Activities and Status
157
SDM Team Charter
- SDM (Software Defined Memory) is a workstream under OCP's Future Technology Initiatives
Charter:
- Identify key applications driving adoption of hierarchical/hybrid memory solutions
- Establish architecture and nomenclature for such systems
- Offer benchmarks that enable validation of novel ideas for HW/SW solutions for such systems
Hyperscale Infrastructure
159
• Application performance and growth depend on
⎻ DC, system, and component performance and growth
⎻ Compute, memory, storage, network...
• Focusing here on the memory discussion
[Diagram: hyperscale services such as Ads, FE Web, Database/Cache, Inference, Data Warehouse, Storage, and Training.]
Memory Challenges
160
Bandwidth and Capacity
• The gap between bandwidth and capacity is widening
• Applications are ready to trade between bandwidth and capacity
Power
• DIMMs consume a significant share of rack power
⎻ DDR5 exacerbates this
• Applications co-design to achieve higher capacity at optimized power
TCO
• Cost impact of minimum capacity increases and die/ECC overheads
• Applications can trade performance/capacity to achieve optimal TCO
“Memory” Pyramid today
161
[Pyramid: Cache and HBM at the top (bandwidth driven), DRAM in the middle, NAND SSD at the bottom (capacity driven), serving workloads such as GP compute/training, databases/caching, and inference/caching.]
Use Case Examples
162
• Caching (e.g., Memcache/Memtier (Cachelib), Redis, etc.)
⎻ Need to achieve higher QPS while satisfying "retention time"
⎻ Higher memory capacity needed
⎻ Current solutions include "tiered memory" with DRAM+NAND, but applications need load/store access
• Databases (e.g., RocksDB/MongoDB, etc.)
⎻ Need to achieve efficient storage capacity per node and deliver QPS SLAs
⎻ More memory enables more storage per node
• Inference (e.g., DLRM)
⎻ Petaflops and parameter counts are increasing rapidly
⎻ AI models are scaling faster than the underlying memory technology
⎻ Current solutions include "tiered memory" with DRAM+NAND, but applications need load/store access
(A minimal load/store vs. storage-I/O sketch follows below.)
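The "need load/store" point can be illustrated with a short C sketch contrasting the two access paths for a capacity tier: block-style storage I/O via pread() versus byte-addressable load/store via mmap(). A temporary file stands in for the backing device here; with a DAX- or CXL-backed region the mmap path follows the same pattern.

```c
/*
 * Contrast storage I/O (read/write syscalls) with load/store access
 * (mmap + pointer dereference) to the same backing capacity.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define NITEMS 1024

int main(void)
{
    char path[] = "/tmp/tier_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return 1;
    if (ftruncate(fd, NITEMS * sizeof(long)) != 0) return 1;

    /* Path 1: storage I/O -- every lookup is a syscall plus a copy. */
    long value = 42, got = 0;
    if (pwrite(fd, &value, sizeof(value), 7 * sizeof(long)) != (ssize_t)sizeof(value)) return 1;
    if (pread(fd, &got, sizeof(got), 7 * sizeof(long)) != (ssize_t)sizeof(got)) return 1;
    printf("storage I/O lookup: %ld\n", got);

    /* Path 2: load/store -- the tier is mapped and dereferenced directly. */
    long *tier = mmap(NULL, NITEMS * sizeof(long), PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (tier == MAP_FAILED) return 1;
    tier[8] = 43;                                   /* store */
    printf("load/store lookup:  %ld\n", tier[8]);   /* load  */

    munmap(tier, NITEMS * sizeof(long));
    close(fd);
    unlink(path);
    return 0;
}
```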
AI at Meta
● Across many applications/services and at scale → driving a portion of our overall infrastructure (both HW and SW)
● From data centers to the edge
[Examples: keypoint segmentation, augmented reality with smart camera]
Problem Statement: AI workloads scale rapidly
● Compute, memory bandwidth, and memory capacity all scale for frontier models
○ Workload scaling is typically faster than technology scaling
● The rapid scaling requires more vertical integration, from SW requirements to HW design
DLRM Memory Requirements
● Bandwidth
1. A considerable portion of capacity needs high-BW accelerator memory.
2. Inference keeps a bigger portion of its capacity at low bandwidth, more so than training.
● Latency
3. Inference has a tight latency requirement, even at the low-BW end.
System Implications of DLRM Requirements
● A tier of memory beyond HBM and DRAM can be leveraged, particularly for inference
○ Higher latency than main memory, but still a tight latency profile (e.g., TLC NAND flash does not work)
○ Trade off performance for density
○ This does not negate the capacity and BW demand for HBM and DRAM
(A toy tiered-lookup sketch follows below.)
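A toy C sketch of the tiering implication for DLRM inference: hot embedding rows live in a fast tier and cold rows in a larger, slower tier, behind a single lookup function. Row counts, dimensions, and the hot/cold split are invented for illustration and do not reflect any production DLRM configuration.

```c
/*
 * Toy two-tier embedding table: a small fast tier (DRAM/HBM stand-in)
 * and a large dense tier (CXL/SCM stand-in) behind one lookup call.
 */
#include <stdio.h>
#include <stdlib.h>

#define DIM       16
#define HOT_ROWS  1024           /* fast, bandwidth-optimized tier */
#define COLD_ROWS 65536          /* dense, capacity-optimized tier */

static float *hot_tier, *cold_tier;

/* Return a pointer to one embedding row, whichever tier it lives in. */
static const float *lookup(long row)
{
    if (row < HOT_ROWS)
        return &hot_tier[row * DIM];                /* DRAM-like latency */
    return &cold_tier[(row - HOT_ROWS) * DIM];      /* CXL/SCM-like tier */
}

int main(void)
{
    hot_tier  = calloc((size_t)HOT_ROWS * DIM,  sizeof(float));
    cold_tier = calloc((size_t)COLD_ROWS * DIM, sizeof(float));
    if (!hot_tier || !cold_tier) return 1;

    hot_tier[5 * DIM]                   = 1.0f;  /* pretend-trained values */
    cold_tier[(40000 - HOT_ROWS) * DIM] = 2.0f;

    /* Inference-side lookups do not care which tier serves the row. */
    printf("row 5     -> %.1f\n", lookup(5)[0]);
    printf("row 40000 -> %.1f\n", lookup(40000)[0]);

    free(hot_tier);
    free(cold_tier);
    return 0;
}
```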
“Tiered Memory” Pyramid with CXL
167
[Pyramid: Cache, HBM, and DRAM as bandwidth memory; CXL-attached memory as capacity memory; NAND SSD below, spanning bandwidth-driven (GP compute, training) to capacity-driven (databases, caching, inference) workloads.]
CXL-attached memory:
• Load/store interface
• Cache-line reads/writes
• Scalable
• Heterogeneous
• Standard interfaces
(A minimal NUMA-placement sketch follows below.)
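In practice the CXL-attached capacity tier usually appears to Linux as a CPU-less NUMA node, so placement can be done with standard NUMA APIs. The libnuma sketch below assumes node 1 is the CXL expander (check numactl --hardware on a real system) and simply binds one allocation to it.

```c
/*
 * Place an allocation on an assumed CXL-backed NUMA node with libnuma.
 * Build with: gcc demo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 1                 /* assumed node id of the CXL expander */
#define SIZE (64UL * 1024 * 1024)  /* 64 MiB */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    /* Ordinary load/store memory, just bound to the chosen node. */
    char *buf = numa_alloc_onnode(SIZE, CXL_NODE);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", CXL_NODE);
        return 1;
    }

    memset(buf, 0, SIZE);          /* pages are placed on touch/bind */
    printf("64 MiB placed on NUMA node %d\n", CXL_NODE);

    numa_free(buf, SIZE);
    return 0;
}
```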
Memory Technologies
168
[Table: memory technologies (DDR4, DDR5, HBM, CXL+DDR, and SCM over PCIe/CXL — exploration phase) mapped against the Compute, Storage, Training, and Inference use cases.]
169
OCP SDM activity and progress
• SDM's focus: apply emerging memory technologies in the development of use cases
• The OCP SDM group has three real-world memory focus areas:
⎻ Databases/Caching
⎻ AI/ML & HPC
⎻ Virtualized Servers
• SDM team members: AMD, ARM, Intel, Meta, Micron, Microsoft, Omdia, Samsung, VMWare
• Vendors are demonstrating CXL-capable CPUs and devices
• Meta and others are investigating solutions to real-world memory problems
SDM – enabling memory solutions from emerging memory technologies
170
Composable Memory Panel
Siamak Tavallaei (Panel Moderator) – President, CXL Consortium
Ben Bolles – Executive Director, Product Management, Liqid
Gerry Fan – Founder & CEO, Xconn Technologies
George Apostol – CEO, Elastics.cloud
Christopher Cox – VP Technology, Montage
171
Big Memory App Panel
Chris Mellor – Editor, Blocks and Files
Manoj Wadekar – SW-Defined Memory Workstream Lead, OCP; Storage Architect, Meta
Richard Solomon – Tech Mktg Mgr., PCIe/CXL, Synopsys
Bernie Wu – VP Strategic Alliances, MemVerge
James Cuff – Distinguished Engineer, Harvard University (retired); Industry Expert, HPC & AI
CXL™: Ready for Take-Off
  • 1. CXL™: Getting Ready for Take-Off Full-Day Forum at Flash Memory Summit Hosted by The CXL Consortium and MemVerge Slides and video now available at https://memverge.com/cxl-forum/
  • 2. Morning Agenda 2 Start End Name Title Organization 8:35 8:50 Siamak Tavallaei President, CXL Consortium, Chief System Architect, Google Infrastructure 8:50 9:10 Willie Nelson Technology Enabling Architect 9:10 9:30 Steve Glaser Principal Engineer, PCI-SIG Board Member 9:30 9:50 Shalesh Thusoo VP, CXL Product Development 9:50 10:10 Jonathan Prout Sr. Manager, Memory Product Planning 10:10 10:30 Uksong Kang Vice President, DRAM Product Planning 10:30 10:50 Ryan Baxter Sr. Director of Marketing Session SPOS-101-1 on the FMS program
  • 3. Afternoon Agenda 3 Start End Name Title Organization 3:25 3:45 Arvind Jagannath Cloud Platform Product Management 3:45 4:05 Mahesh Wagh Senior Fellow 4:05 4:25 Charles Fan CEO & Co-founder 4:25 4:45 Manoj Wadekar SW-Defined Memory Workstream Lead, OCP, Storage Architect, Meta 4:45 5:10 Siamak Tavallaei Panel Moderator President, CXL Consortium, Chief System Architect, Google Infrastructure 5:10 5:35 Chris Mellor Panel Moderator Editor Session SPOS-102-1 on the FMS program
  • 4. Update from the CXL Consortium 4 Siamak Tavallaei CXL President Chief Systems Architect at Google Systems Infrastructure
  • 5. 5 | ©2022 Flash Memory Summit. All Rights Reserved. CXL™ Consortium Update Siamak Tavallaei, CXL President
  • 6. 6 | ©2022 Flash Memory Summit. All Rights Reserved. Introducing the CXL Consortium CXL Board of Directors 200+ MemberCompanies IndustryOpen Standardfor HighSpeedCommunications
  • 7. 7 | ©2022 Flash Memory Summit. All Rights Reserved. CXL Specification Release Timeline March 2019 CXL 1.0 Specificatio n Released September 2019 CXL Consortium Officially Incorporates CXL 1.1 Specification Released November 2020 CXL 2.0 Specification Released August 2022 CXL 3.0 Specification Released Press Release August 2, 2022, Flash Memory Summit CXL Consortium releases Compute Express Link 3.0 specification to expand fabric capabilities and management Members: 130+ Members: 15+ Members: 200+
  • 8. 8 | ©2022 Flash Memory Summit. All Rights Reserved. Compute Express Link ™ (CXL™) Overview
  • 9. 9 | ©2022 Flash Memory Summit. All Rights Reserved. Industry Landscape Proliferation of Cloud Computing Growth of AI & Analytics Cloudification of the Network & Edge
  • 10. 10 | ©2022 Flash Memory Summit. All Rights Reserved. Data Center: Expanding Scope of CXL CXL 2.0 across Multiple Nodes inside a Rack/ Chassis supporting pooling of resources Future - CXL 3.0 Fabric growth for disaggregation/pooling/accelerator
  • 11. 11 | ©2022 Flash Memory Summit. All Rights Reserved. Growing Industry Momentum • CXL Consortium showcased first public demonstrations of CXL technology at SC’21 • View virtual and live demos from CXL Consortium members here: https://www.computeexpresslink.org/videos • Demos showcase CXL usages, including memory development, memory expansion and memory disaggregation
  • 12. 12 | ©2022 Flash Memory Summit. All Rights Reserved. Industry Focal Point CXL is emerging as the industry focal point for coherent IO • CXL Consortium and OpenCAPI sign letter of intent to transfer OpenCAPI specification and assets to the CXL Consortium • In February 2022, CXL Consortium and Gen- Z Consortium signed agreement to transfer Gen-Z specification and assets to CXL Consortium August 1, 2022, Flash Memory Summit CXL Consortium and OpenCAPI Consortium Sign Letter of Intent to Transfer OpenCAPI Assets to CXL
  • 13. 13 | ©2022 Flash Memory Summit. All Rights Reserved. Unveiling the CXL 3.0 specification Press Release August 2, 2022, Flash Memory Summit CXL Consortium releases Compute Express Link 3.0 specification to expand fabric capabilities and management
  • 14. 14 | ©2022 Flash Memory Summit. All Rights Reserved. Industry trends • Use cases driving need for higher bandwidth include: high performance accelerators, system memory, SmartNIC and leading edge networking • CPU efficiency is declining due to reduced memory capacity and bandwidth per core • Efficient peer-to-peer resource sharing across multiple domains • Memory bottlenecks due to CPU pin and thermal constraints CXL 3.0 introduces • Fabric capabilities • Multi-headed and fabric attached devices • Enhance fabric management • Composable disaggregated infrastructure • Improved capability for better scalability and resource utilization • Enhanced memory pooling • Multi-level switching • New enhanced coherency capabilities • Improved software capabilities • Double the bandwidth • Zero added latency over CXL 2.0 • Full backward compatibility with CXL 2.0, CXL 1.1, and CXL 1.0 CXL 3.0 Specification
  • 15. 15 | ©2022 Flash Memory Summit. All Rights Reserved. CXL 3.0 Specification Feature Summary
  • 16. 16 | ©2022 Flash Memory Summit. All Rights Reserved. CXL 3.0: Expanding CXL Use Cases • Enabling new usage models • Memory sharing between hosts and peer devices • Support for multi-headed devices • Expanded support for Type-1 and Type-2 devices • GFAM provides expansion capabilities for current and future memory Download the CXL 3.0 specification on www.ComputeExpressLink.org
  • 17. 17 | ©2022 Flash Memory Summit. All Rights Reserved. Call to Action • Join the CXL Consortium, visit www.computeexpresslink.org/join • Attend CXL Consortium presentations at the Systems Architecture Track on Wednesday, August 3 for a deep-dive into the CXL 3.0 specification • Engage with us on social media @ComputeExLink www.linkedin.com/company/cxl-consortium/ CXL Consortium Channel
  • 18. 18 | ©2022 Flash Memory Summit. All Rights Reserved. Thank you!
  • 19. 19 | ©2022 Flash Memory Summit. All Rights Reserved. Backup
  • 20. 20 | ©2022 Flash Memory Summit. All Rights Reserved. Multiple Devices of all Types per Root Port Each host’s root port can connect to more than one device type 1
  • 21. 21 | ©2022 Flash Memory Summit. All Rights Reserved. Fabrics Overview CXL 3.0 enables non-tree architectures • Each node can be a CXL Host, CXL device or PCIe device 1
  • 22. 22 | ©2022 Flash Memory Summit. All Rights Reserved. Switch Cascade/Fanout Supporting vast array of switch topologies Multiple switch levels (aka cascade) • Supports fanout of all device types 1
  • 23. 23 | ©2022 Flash Memory Summit. All Rights Reserved. Device to Device Comms CXL 3.0 enables peer-to- peer communication (P2P) within a virtual hierarchy of devices • Virtual hierarchies are associations of devices that maintains a coherency domain 1
  • 24. 24 | ©2022 Flash Memory Summit. All Rights Reserved. Coherent Memory Sharing Device memory can be shared by all hosts to increase data flow efficiency and improve memory utilization Host can have a coherent copy of the shared region or portions of shared region in host cache CXL 3.0 defined mechanisms to enforce hardware cache coherency between copies 1 2 3
  • 25. 25 | ©2022 Flash Memory Summit. All Rights Reserved. Memory Pooling and Sharing Expanded use case showing memory sharing and pooling CXL Fabric Manager is available to setup, deploy, and modify the environment 1 2
  • 26. 26 Willie Nelson Architect Intel Steve Glaser Principal Architect, PIC SIG Board Member NVIDIA Shalesh Thusoo CXL Business Unit Marvell
  • 27. 27 | ©2022 Flash Memory Summit. All Rights Reserved. CXL – Industry Enablement Willie Nelson Technology Enabling Architect - Intel August 2022
  • 28. 28 | ©2022 Flash Memory Summit. All Rights Reserved. Introducing the CXL Consortium CXL Board of Directors 200+ MemberCompanies IndustryOpen Standardfor HighSpeedCommunications
  • 29. 29 | ©2022 Flash Memory Summit. All Rights Reserved. Growing Industry Momentum • CXL Consortium showcased first public demonstrations of CXL technology at SC’21 • View virtual and live demos from CXL Consortium members here: https://www.computeexpresslink.org/videos • Demos showcase CXL usages, including memory development, memory expansion and memory disaggregation
  • 30. 30 | ©2022 Flash Memory Summit. All Rights Reserved. Industry Focal Point CXL is emerging as the industry focal point for coherent IO • CXL Consortium and OpenCAPI sign letter of intent to transfer OpenCAPI specification and assets to the CXL Consortium • In February 2022, CXL Consortium and Gen- Z Consortium signed agreement to transfer Gen-Z specification and assets to CXL Consortium August 1, 2022, Flash Memory Summit CXL Consortium and OpenCAPI Consortium Sign Letter of Intent to Transfer OpenCAPI Assets to CXL
  • 31. 31 | ©2022 Flash Memory Summit. All Rights Reserved. CXL Specification Release Timeline March 2019 CXL 1.0 Specificatio n Released September 2019 CXL Consortium Officially Incorporates CXL 1.1 Specification Released November 2020 CXL 2.0 Specification Released August 2022 CXL 3.0 Specification Released Press Release August 2, 2022, Flash Memory Summit CXL Consortium releases Compute Express Link 3.0 specification to expand fabric capabilities and management Members: 130+ Members: 15+ Members: 200+
  • 32. 32 | ©2022 Flash Memory Summit. All Rights Reserved. New Technology Enabling – Key Contributors Revolutionary New Technology HWDevelopmentTools(Analyzers,etc.) SuccessfulNewTechnologyEnablingRequiresALL Contributors to beViableforIndustry Adoption HWSilicon/ControllerVendors SiIPProviders(incl.pre-sisimulation) HardwareProductionProductVendors SWDevelopmentTools(testing,debug,perf.,etc.) Device/UseCaseOSDrivers OperatingSystemSupport UseCaseApplications(tangiblebenefitsw/newtech) Standards/Consortiums/etc… IndustryAdoption
  • 33. 33 | ©2022 Flash Memory Summit. All Rights Reserved. Intel CXL Memory Enablement & Validation DDR PCIe CXL Memory POR Platform Configurations • Large matrix of POR configurations • “Open socket” (extensive variety of technology and use cases • Plans to validate specific POR configurations of CXL memory per platform, with several vendors and modules – not exhaustive Engagement Model • Direct engagement and collaboration with Tier1 suppliers • SIG-based engagement with PCIe IHVs • Targeted engagement with numerous CXL memory device & module IHVs, as well as key customers, plus multiple Consortium based compliance workshops and various interactions Validation Model • Early and exhaustive Host-based validation spanning electrical, protocol, functional • SIG-led compliance workshops & plugfests • Host PCIe validation focus on PCIe channel, protocol features/function • Limited platform validation with PCIe products • Host validation focus on CXL channel, features & function of CXL memory as part of platform’s memory subsystem • CXL memory device & module IHV validation focus on device+media channel, function/features • Long term plan: Consortium-led compliance testing Comparing CXL memory validation to DDR/PCIe efforts Approach for CXL memory expected to evolve over generations to be PCIe-like
  • 34. 34 | ©2022 Flash Memory Summit. All Rights Reserved. Industry CXL Memory HW Enabling & Validation CPU Vendor focus: • Work with device & module vendors to enable key features • Provided CXL vendors an open, bridge architecture reference document as an initial guide, covering Bridge/module operation/features recommendations • Device/module platform integration (focused configs) • For initial AIC CEM Modules – focused validation of the media interface • Validation: • Host-side CXL functions • Memory features – RAS, etc. • CXL channel • Specificconfigs and vendors (# of ports, capacity, etc.) • SW Enabling: • Intel providing reference system FW/BIOS • Part of the industry effort to develop an open-source driver • SW guide for type 3 devices CXL IP Media IP DRAM DRAM DRAM DRAM DRAM DRAM DRAM DRAM DRAM DRAM CXL Memory Bridge (aka controller, buffer) CXL Memory Module* Bridge-media channel Host CXL Channel CXL IP Controller/Module Vendor focus: (bridge or module) • Memory media interface, channel electricals, media training/MRC • CXL compliance and interoperability testing *Standardization of CXL memory module form factors – EDSFF E3.s & E1.s, PCI CEM and mezzanine in process OEM/System provider focus: • Device/module platform integration • Configuration testing • In-rack level testing • Usage models testing/debug • System Validation: • SW integration including system FW/BIOS, OS, generic driver • Generate integrator list Config-1………….. Config-N A Massive Coordinated Industry Effort CPU/Host
  • 35. 35 | ©2022 Flash Memory Summit. All Rights Reserved. Q & A Willie Nelson Technology Enabling Architect - Intel August 2022
  • 36. 36 | ©2022 Flash Memory Summit. All Rights Reserved. CXL Delivers the Right Features & Architecture CXL Anopen industry-supported cache-coherent interconnectforprocessors, memoryexpansion and accelerators CoherentInterface LeveragesPCIewith3mix-and-match protocols LowLatency .Cacheand.MemorytargetedatnearCPU cachecoherentlatency AsymmetricComplexity Easesburdensofcachecoherent interfacedesigns Challenges Industrytrendsdriving demandforfasterdataprocessingandnext-gen datacenterperformance Increasingdemandforheterogeneouscomputing andserver disaggregation Needforincreasedmemorycapacityandbandwidth Lackofopenindustrystandardtoaddressnext-geninterconnect challenges https://www.computeexpresslink.org/resource-library
  • 37. 37 | ©2022 Flash Memory Summit. All Rights Reserved. Representative CXL Usages Memory CXL • CXL.io • CXL.memory PROTOCOLS Memory Memory Memory Memory MemoryBuffer Processor DDR DDR • MemoryBW expansion • Memorycapacity expansion • Storage classmemory USAGES Accelerators with Memory CXL • CXL.io • CXL.cache • CXL.memory PROTOCOLS • GP GPU • Densecomputation USAGES HBM Accelerator Cache Processor DDR DDR CachingDevices /Accelerators CXL • CXL.io • CXL.cache PROTOCOLS • PGAS NIC • NIC atomics USAGES Accelerator NIC Cache Processor DDR DDR TYPE1 TYPE2 TYPE3 HBM
  • 38. 38 | ©2022 Flash Memory Summit. All Rights Reserved. Usage Local Bandwidth or Capacity Expansion Memory Pooling Main memory expansion Two-Tier Memory Value Prop Scale performance or enable use of higher core counts via added bandwidth and/or capacity Flexible memory assignment, enabling: - Lower total memory cost - Platform SKU reduction & OpEx efficiency CXL Memory Attributes Bandwidth and features similar to direct attach DDR Lower bandwidth, higher latency vs. direct attach DDR Bandwidth and features similar to direct attach DDR, latency similar to remote socket access Software Considerations OS version must support CXL memory. CXL memory visible either in same region as direct attach DDR5 or as a separate region OS version must support CXL memory. SW-visible as Persistent next- tier memory OS version must support CXL memory. Additional software layer for orchestration of pooled memory and multi-port controller CXL Memory Overview Pool CPU Direct Attach DDR5 EDSFF E3 or E1 PCI CEM/Custom Board Pooled Memory Controller
  • 39. 39 CXL: BEYOND JUST ANOTHER INTERCONNECT PROTOCOL STEVE GLASER
  • 40. 40 AGENDA  Cache Coherence for Accelerators  Expansion Memory for CPUs  Flexible Tiered Memory Configurations  Security
  • 41. 41 CPU-GPU CACHE COHERENCE Unified programming model across CPU architectures  CPU-GPU coherence provides programmability benefits  Ease of porting applications to GPU  Rapid development for new applications  Grace + Hopper Superchip introduces cache-coherent programming to GPUs  CXL enables the same programming benefits for our GPUs in systems based on 3rd-party CPUs (Diagram: Grace CPU + Hopper GPU over coherent NVLink C2C; x86/Arm CPU + NVIDIA GPU over a coherent CXL link.)
  • 42. 42 PROGRAMMABILITY BENEFITS CXL CPU-GPU cache coherence reduces barrier to entry  Without Shared Virtual Memory (SVM) + coherence, nothing works until everything works  Enables single allocator for all types of memory: Host, Host-accelerator coherent, accelerator-only  Eases porting complicated pipelines in stages  Many SW layers exist between frameworks and drivers  Example: start with malloc, keep using malloc until you choose otherwise  Vendor-provided allocators remain fully supported and functional  Workloads are pushing scaling boundaries  Fine-grained synchronization is on the rise  Synchronization latency matters  Avoid setup latency, do it in-memory when possible  Host/device synchronization in device's memory  Concurrent algorithms and data structures become available  Example: full C++ atomics support across host and device  Locks  Any suballocation can be used for synchronization, regardless of placement (Chart: application performance vs. programming effort, comparing "v1 with SVM + Coherence" against "v1 without SVM or Coherence" from the same starting point.)
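The "full C++ atomics support across host and device" point can be sketched with standard C++ alone. This is not NVIDIA code and no accelerator is involved: under the assumption that both sides see one coherent allocation (as SVM plus CXL or NVLink coherence would provide), a producer publishes a buffer and a consumer spin-waits on an atomic flag placed in that same allocation; on a real Type 2 system the consumer loop would run on the GPU.

```cpp
// Minimal sketch: host-only stand-in for coherent host/device synchronization.
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<std::uint64_t> payload(1 << 20, 0);  // pretend this lives in shared, coherent memory
    std::atomic<std::uint64_t> ready{0};             // any suballocation can serve as the flag/lock

    std::thread producer([&] {                       // "host" side: fill the buffer, then publish it
        for (auto& x : payload) x = 42;
        ready.store(1, std::memory_order_release);
    });

    std::thread consumer([&] {                       // "device" side: wait in memory, no driver round-trip
        while (ready.load(std::memory_order_acquire) == 0) { /* spin */ }
        std::uint64_t sum = 0;
        for (auto x : payload) sum += x;
        std::cout << "consumer saw sum = " << sum << "\n";
    });

    producer.join();
    consumer.join();
    return 0;
}
```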
  • 43. 43 CXL FOR CPU MEMORY EXPANSION  SOC DDR channel count is becoming constrained  CXL-enabled PCIe ports can be used for additional memory capacity  Flexibility in underlying media choice, trading off capacity/latency/persistence  DRAM  DRAM + cache  Storage-class memory  DDR/SCM + NVMe (Diagram: two host SOCs, each with multiple DDR channels plus CXL Mem expansion devices.)
  • 44. 44 CXL FOR MEMORY DISAGGREGATION  Currently, data center servers are often over-provisioned with memory  All Hosts must have enough DRAM to handle the demands of worst-case workloads  Under less memory-intensive workloads, DRAM is unused and wasted  DRAM is very expensive at data center scale  Large banks of CXL memory can be distributed among several Hosts  Memory pools may be attached to Hosts via CXL switches, or directly attached using multi-port memory devices  Pooling  Each Host is allocated a portion of the disaggregated memory  Memory pools can be reallocated as needed  Reduces memory over-provisioning on each Host while allowing flexibility to handle a range of workloads with differing memory demands  Sharing  Address ranges which may be accessed by multiple Hosts simultaneously  Coherence may be provided in hardware by the CXL Device or may be software-managed (Diagram: Hosts connected through a CXL switch fabric to CXL memory pools.)
  • 45. 45 CXL FOR GPU EXPANSION MEMORY Tackling AI with very large memory capacity demands  Accelerator workloads with large memory footprints are currently challenged  Constrained by bandwidth available to Host over PCIe  Contention with Host SW for memory bandwidth  CXL memory expanders may be directly attached to accelerators for private use  Tiered memory for GPUs: HBM and CXL tradeoffs  Bandwidth  Capacity  Cost  Flexibility (Diagram: GPU with HBM connected to the Host over a coherent CPU-GPU CXL link, with private GPU-memory CXL link(s) to CXL memory expanders.)
  • 46. 46 CXL FOR GPU MEMORY POOLING Streamlined Accelerator Data Sharing  Memory pools may provide flexibility to apportion memory to individual GPUs as needed  Provides solution to workloads where capacity is important and bandwidth is secondary  Large data sets can be stored in CXL memory and shared as needed among accelerators, without burdening interface to Host (Diagram: multiple GPUs with HBM and multiple Hosts connected through a CXL switch fabric to shared CXL memory pools.)
  • 47. 47 SHARED EXPANSION MEMORY CPU-GPU Shared Memory Pools  CXL enables sharing of expansion memory between Host and GPU  Future capabilities may allow expansion memory to simultaneously be shared  Among Hosts  Between Hosts and Accelerators  Flexibility in provisioning under varying demands  Ease of programming model  CXL Switch could be local physical switch or virtual switch over other physical transport enabling remote disaggregated memory (Diagrams: a Host and GPU sharing CXL memory through a CXL switch; and multiple Hosts and GPUs sharing CXL memory pools through a switch fabric.)
  • 48. 48 CXL FOR CONFIDENTIAL COMPUTING Vision for secure accelerated computing  Confidential computing components will be  Partitionable and assignable to Trusted Execution Environment Virtual Machines (TVM)  TVMs can create their own secure virtual environments including  Host resources  Accelerator partition  Shared memory partitions  Data transfers encrypted and integrity protected  Components are securely authenticated  Partitions are secure from accesses by untrusted entities  Other VMs/TVMs  Firmware  VMM  All CXL capabilities are enabled in secure domains (Diagram: confidential-compute Host with TVMs and a confidential-compute GPU with per-TVM GPU partitions, connected over CXL to memory pools carved into per-TVM memory partitions.)
  • 50. Transforming Cloud Data Centers with CXL Shalesh Thusoo VP, CXL Product Development July 2022
  • 51. © 2022 Marvell. All rights reserved. 51 Cloud data center memory challenges: CXL is poised to address these issues. Bandwidth per core declining (chart: normalized growth rate of CPU core count vs. memory channel BW per core, 2012–2020, showing an increasing gap; source: Meta, OCP Summit Presentation Nov 2021) – limits performance. No near-memory compute (DRAM DIMM) – degrades efficiency. Memory tied down to xPUs (CPU, GPU, DPU each with their own DRAM) – cannot share.
  • 52. © 2022 Marvell. All rights reserved. 52 Cloud data center memory challenges CXL accelerator Bandwidth per core declining No near-memory compute Memory tied-down to xPUs CXL is poised to address these issues CXL expander CXL pooling CXL expander CXL pooling CXL switch
  • 53. © 2022 Marvell. All rights reserved. 53 Addressing memory expansion. DIMM challenges:  Limited scalability  Not serviceable  No telemetry. CXL solution (CXL expander controller + CXL expander module in standard form factors):  Scalable  Pluggable  Telemetry  Improved thermals  Mix-and-match DRAM  Config flexibility
  • 54. © 2022 Marvell. All rights reserved. 54 CXL memory expanders improve performance: same capacity with greater bandwidth and utilization. (Diagram: growing an xPU from 128GB to 256GB – today by moving to 2 DIMMs per channel (2DPC), even though 1DPC delivers the same bandwidth as 2DPC; with 1DPC plus CXL expanders, the same 256GB is reached while PCI Express lanes open up additional bandwidth.)
  • 55. © 2022 Marvell. All rights reserved. 55 Sharing memory with CXL  Pool memory across multiple xPUs  Rescue under-utilized DRAM  Scale memory independent of xPUs Direct CXL memory pool 56 Core xPU 0 xPU N CXL pooling … xPUs CXL pooling  Flexible to connect resources into fabric  Scalable, serviceable  Enables fully composable infrastructure 56 Core xPU 0 xPU N CXL switch … Memory expanders Memory accelerators xPUs … … CXL Expander CXL Accelerator CXL switch CXL pooling
  • 56. © 2022 Marvell. All rights reserved. 56 Accelerating with CXL xPU CXL Accelerator CXL accelerator Compute engines  Coherent, efficient  Accelerate analytics, ML, search, etc.  Improves efficiency and TCO CXL I/O acceleration  DPU/NIC, SSD, …  Accelerate protocol processing  Composable I/O devices xPU
  • 57. © 2022 Marvell. All rights reserved. 57 Bandwidth per core declining No near-memory compute Memory tied-down to xPUs CXL solves data center memory challenges CXL is disrupting cloud data center architectures More bandwidth per core Optimize efficiency xPU Memory Compute Storage Fully composable Disaggregated memory xPU Ultimate performance CXL Accelerator Near-memory computation
  • 58. © 2022 Marvell. All rights reserved. 58 Comprehensive end-to-end CXL solutions  Expanders  Pooling  Switch  Accelerators  Custom compute  DPUs / SmartNICs  Electro-optics  Re-timers  SSD controllers Multi-billion $ opportunity CXL opportunities xPU 0 xPU N … Re-timer DPU CXL Accelerator CXL Expander CXL Switch … ↑ in the box ↓ out of box Optics CXL Pooling DPU CXL Expander SSD Cntrl
  • 59. © 2022 Marvell. All rights reserved. 59 Summary 1 CXL is disrupting cloud data center architectures 2 Uniquely positioned to enable end-to-end CXL in data center 3 CXL is driving the next multi-billion-dollar opportunity 4 CXL memory pooling demo at FMS Marvell Booth #607
  • 60. © 2022 Marvell. All rights reserved. 60 Memory pooling demo chassis. Server: two Intel Archer City Sapphire Rapids hosts. Memory Appliance:  Up to 6 memory devices (3 installed)  Up to 2 E3.S memory cards
  • 63. 63 Jonathan Prout Senior Manager, Memory Product Planning Samsung Electronics Uksong Kang Vice President, DRAM Product Planning SK Hynix Ryan Baxter Sr. Director Marketing Micron
  • 64. Expanding Beyond Limits With CXL™-based Memory August 2nd 2022 Jonathan Prout Memory New Business Planning Team
  • 65. Industry Trends and Challenges CXL™ (Compute Express Link) Introduction CXL™ Memory Use Cases Samsung’s CXL™ -based Memory Expander and SMDK (Scalable Memory Development Kit) Agenda
  • 66. Industry Trends and Challenges Artificial Intelligence Big Data Edge Cloud 5G Massive demand for data-centric technologies and applications Memory bandwidth and density not keeping up with increasing CPU core count Need a next gen interconnect for heterogeneous computing and server disaggregation
  • 67. Industry Trends and Challenges (Chart: normalized growth rate, 2012–2021, of CPU core count vs. memory channel BW per core.) A new memory scaling solution is needed
  • 68. CXL™ Introduction CXL is a high-performance, low-latency protocol that leverages the PCIe physical layer. CXL is an open industry standard with broad industry support. (Diagram: a processor's PCIe connector accepts either a PCIe card or a CXL card over the same PCIe channel. Type 1 – caching devices / accelerators: protocols CXL.io + CXL.cache; usages: PGAS NIC, NIC atomics. Type 2 – accelerators with memory (HBM): protocols CXL.io + CXL.cache + CXL.memory; usages: GP GPU, dense computation. Type 3 – memory buffers: protocols CXL.io + CXL.memory; usages: memory BW expansion, memory capacity expansion, storage-class memory.)
  • 69. CXL™ Type 3 Device Home Agent DDR DDR Host/CPU CXL Memory Expander CXL.io CXL.mem Device Memory Memory Controller Memory Controller CXL is a cache coherent standard, meaning the host and the CXL device see the same data seamlessly
  • 70. CXL™ Type 3 Device - Memory Expansion CXL enables systems to significantly scale memory capacity and bandwidth. (Diagram: a CPU with 8 channels at 2DPC of 512GB DDRx DIMMs reaches a max of 8TB; adding four 1TB CXL memory expanders to the same 8x 2DPC configuration raises the max to 12TB for 1 CPU.)
  • 71. Current Use Cases: Capacity / Bandwidth Expansion. (Diagrams: Capacity expansion – TCO reduction: IMDB servers with xTB DRAM per CPU consolidated onto servers with yTB DRAM per CPU plus zTB of CXL memory; Bandwidth expansion – performance improvement: IMC servers with xGB DRAM per CPU augmented to yGB DRAM per CPU plus zGB of CXL memory.)
  • 72. CXL™ Memory Switching & Pooling CXL supports pooling for increased system efficiency Host CXL Switch CXL Memory Expander CXL Memory Expander DDR DDR Host Host Host DDR DDR CXL Switch CXL Memory Expander DDR DDR DDR DDR CXL Memory Expander CXL Memory Expander CXL Memory Expander CXL supports switching to enable memory expansion
  • 73. Future Use Cases: Tiering and Pooling. (Diagrams: Memory tiering* – efficient expansion: IMC servers with xTB DRAM per CPU moving to yTB DRAM per CPU plus zTB of CXL memory; Memory pooling – increased utilization: multiple IMC servers sharing zTB of CXL memory in a memory box.) *Hot data on DRAM; warm data on cost-optimized, CXL-attached media
  • 74. Samsung CXL™ Proof of Concept Supporting ecosystem growth with CXL-based memory functional sample Form Factor – EDSFF (E3.S) / AIC Media – DDR4 Module Capacity – 128 GB CXL Link Width – x16 Specification: CXL 2.0 Product Features Ecosystem enablement success Shipped 100+ samples since availability in 3Q ‘21 Successfully tested with a broad range of server, system, and software providers across the industry
  • 75. Samsung CXL™ Solution Leading the industry toward mainstream adoption of CXL-based memory Form Factor – EDSFF (E3.S) Media – DDR5 Module Capacity – 512 GB CXL Link Width – x8 Maximum CXL Bandwidth – 32GB/s Specification – CXL 2.0 Other Features – RAS, Interleaving, Diagnostics, and more Availability – Q3’22 for evaluation/testing Product Features
  • 76. SMDK – Scalable Memory Development Kit. (Stack diagram: datacenter-to-edge applications (IMDB, DLRM, ML/AI, etc.) running on the SMDK layers – Compatible API, Optimization API, Intelligent Tiering Engine, CXL Allocator, Memory Pool Mgmt – over a CXL kernel exposing a Normal ZONE and a CXL.Mem ZONE, on a server main board with CPU, DRAM, and a CXL memory expander.)
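SMDK's own allocator API is not reproduced here. As a rough, hedged illustration of the same idea on stock Linux, where a CXL Type 3 expander typically appears as a CPU-less NUMA node, the sketch below places a "capacity tier" buffer on an assumed CXL node with libnuma while keeping a "bandwidth tier" buffer in local DRAM. The node number is an assumption and differs per platform; build with -lnuma.

```cpp
// Hedged sketch of zone/tier-aware allocation using plain libnuma (not the SMDK API).
#include <numa.h>      // numa_alloc_onnode(), numa_alloc_local(), numa_free()
#include <cstdio>
#include <cstring>

int main() {
    if (numa_available() < 0) { std::puts("no NUMA support on this system"); return 1; }

    const int cxl_node = 1;                       // assumption: CXL.mem zone is exposed as node 1
    const size_t sz = 1ull << 30;                 // 1 GiB per tier

    void* far_tier  = numa_alloc_onnode(sz, cxl_node);   // capacity tier on the CXL node
    void* near_tier = numa_alloc_local(sz);              // bandwidth tier in local DRAM
    if (!far_tier || !near_tier) { std::puts("allocation failed"); return 1; }

    std::memset(far_tier, 0, sz);                 // touch pages so placement actually happens
    std::memset(near_tier, 0, sz);

    numa_free(far_tier, sz);
    numa_free(near_tier, sz);
    return 0;
}
```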
  • 77. Application Benchmark. Test scenario: Redis single node (System #1: DDR5 32GB + CXL Mem 64GB over the CXL link; client Set 60GB / Get 60GB) vs. Redis cluster (three nodes, DDR5 32GB each, over Ethernet; client Set 60GB / Get 60GB). Test result (chart: scale-up vs. scale-out Redis throughput in MB/s for SET and GET at 128B, 4KB, and 1MB chunk sizes, single node DRAM+CXL vs. 2-node DRAM cluster): scale-up performance 2.7x better than scale-out (4KB chunk size).
  • 78.  Memory capacity and bandwidth per core is lagging industry demand  Conventional scaling technologies unable to meet the challenge  CXL is the most promising technology to address the gap  Capacity/bandwidth expansion, tiering, pooling use cases  Samsung is leading the advancement of CXL-based memory solutions  PoC, ASIC-based module, and SMDK  Tested PoC with a broad range of partners for more than 1 year  Samsung enthusiastically welcomes further collaboration with the industry  Visit Samsung booth to learn more about Samsung’s Memory Expander and SMDK Key Takeaways
  • 80. Uksong Kang VP, DRAM Product Planning August 2, 2022 Adding New Value to Memory Subsystems through CXL
  • 81. © SK hynix Inc. CXL Creating New Gateway for Increased Efficiency • CXL is opening up a new gateway toward efficient use of computing, acceleration and memory resources resulting in overall TCO reduction in data centers ASIC (NPU/DPU) GPU FPGA CPU CXL Memory Computing Acceleration Memory Pools Servers in Remote Rack 81/16
  • 82. © SK hynix Inc. New Values in Memory through CXL • CXL creates many new additional opportunities in memory subsystems beyond what is possible today in existing server platforms. #1: Memory bandwidth and capacity expansion. #2: Memory media differentiation. #3: Controller differentiation. #4: Memory-as-a-Service (MaaS). 82/16
  • 83. © SK hynix Inc. Memory Bandwidth and Capacity Gap • Increase in SoC core counts requires continued increase in memory bandwidth and capacity, but the gap between such requirements and platform provisioning capability is growing. (Charts, 2020–2030: Memory capacity requirement – capacity per socket [A.U.] for general-purpose and memory-intensive workloads vs. platform capability at 1–2 DIMMs per channel and 8/12/16(?) channels per socket, from 64GB up to 256GB DIMMs; Memory bandwidth requirement – bandwidth per socket [A.U.] for compute-intensive, general-purpose, and memory-intensive workloads vs. expected Intel SKU DIMM bandwidth provisioning (6.4–11.2 Gbps/IO at 8–12 channels). The gap keeps growing.) 83/16
  • 84. © SK hynix Inc. #1: Memory Bandwidth and Capacity Expansion • CXL memories allow continued scale-out in memory bandwidth and capacity beyond physical limitations of traditional server platforms. Motivation for capacity expansion with CXL: # of channels per CPU socket does not scale out due to form-factor limits; # of DIMMs per channel decreased from 2 to 1 due to I/O SI issues; reaching max DIMM power and thermal limits – CXL solutions over PCIe Gen5~6 add capacity alongside local DDRx L/RDIMMs. Motivation for bandwidth expansion with CXL (chart: memory B/W [A.U.] for DDR5 x8 alone vs. DDR5 x8 plus 1–4 CXL-BME devices, 1R1W and 2R1W traffic, at DDR5-4800/5600/6400): each added CXL device expands bandwidth. 84/16
  • 85. © SK hynix Inc. #2: Memory Media Differentiation • CXL is a memory-agnostic, non-deterministic protocol allowing differentiation in memories fulfilling the demands of various server workloads for the future • Different memory media can provide different performance, capacity, and power design trade-offs. (Diagram: CPU with 64-bit DDR interfaces [1st tier] – memory-aware and deterministic – and CXL interfaces [2nd tier] – memory-agnostic and non-deterministic, allowing decoupling of standard or custom memory media from the CPU.) 85/16
  • 86. © SK hynix Inc. #3: Controller Differentiation • Enhanced RAS (ECC, PPR, ECS), security, lower power, computation, processing, acceleration features can be included inside the CXL controller for added values CPU DIMM DIMM DIMM DIMM Memory Media CXL (PCIe) CXL CMS Engine CXL CTRL JEDEC Computational memory solution Enhanced RAS ECC Low Power Processing PPR Controller Computation Acceleration Security Value-add features in CXL controller 86/16
  • 87. © SK hynix Inc. #4: Memory-as-a-Service (MaaS) • CXL allows building composable, scalable rack-scale memory pool appliances which can be populated with different types of memory media • Variable memory capacity can be effectively allocated within the memory pool to different xPUs through memory virtualization. (Diagram: HPC server with CPUs [DDR, HBM], GPU/NPU/xPU [HBM], and DPU, tied together by a CPU interconnect and an accelerator interconnect (e.g., NVLink), connected over an Ethernet fabric network to another HPC server, and over a CXL switch and CXL fabric to a disaggregated memory pool [DRAM/SCM] and disaggregated storage pools [NAND/SCM SSD].) Memory pool appliance: • Populate with different memory media based on user's choice • Allocate variable memory capacity through memory virtualization 87/16
  • 88. © SK hynix Inc. SK hynix's Future Paths in CXL for Increased Values • Start with CXL memory market enabling, followed by market expansion and value addition • First product is planned with standard DDR DRAMs, followed by products with new value-added features to be defined further through close collaboration with eco-system partners and customers.
  Stage 1: Ecosystem Enabling – Role: evaluate pilot devices and launch first CXL memory for TTM; Memory F/F: AIC riser card, EDSFF E3.S; Memory media: standard DRAM (DDR5, DDR4); Host I/F: CXL 2.0 on PCIe Gen5 x8/x16.
  Stage 2: Market Expansion – Role: introduce value-added CXL memory for expansion & basic pooling; Memory F/F: AIC riser card, EDSFF E3.S, E1.S; Memory media: optimized memory for better TCO; Host I/F: CXL 2.0 on PCIe Gen5 x8 or CXL 3.0 on PCIe Gen6 x4.
  Stage 3: Value Addition – Role: expand value-add with stronger memory RAS, security, near-memory processing, fabric-attach, etc.; Memory F/F: AIC riser card, EDSFF + many more; Memory media: optimized memory for better TCO; Host I/F: CXL 3.0 on PCIe Gen6 x4/x8. 88/16
  • 89. © SK hynix Inc. SK hynix’s Vision toward Future CXL Memory Solutions • Envisioning four different types of CXL memory solutions for different use cases • First CXL memory will be bandwidth memory expansion based on DDR5 DRAM media followed by capacity expansion, memory pooling, and computational memory solution CXL-CMS (Computational Memory Solution) CXL-CME (Capacity Memory Expansion) CXL-BME (Bandwidth Memory Expansion) DDR5 class BW & energy alleviates loaded latency Memory expansion w/o tiering Capacity low Higher capacity, lower W/GB, advanced RAS than DDR5 Memory expansion w/ tiering Capacity mid/high Optimized memory media and module FF for pooling Memory pooling Capacity high Near memory processing for AI and data analytics New value for heterogeneous computing era TBD CXL-MPS (Memory Pooling Solution) 89/16
  • 90. © SK hynix Inc. SK hynix's First CXL Memory now Ready for Take-off • CXL-BME is a 96GB bandwidth memory expansion module integrated with cost-effective single-die packaged DRAMs • DDR5-class bandwidth, DDR5-class latency within 1 NUMA hop, outperforming in BW/$ and BW/power. EDSFF E3.S 2T product: - CXL 2.0 on PCIe Gen5 x8 - 96GB (2Rx4-like), 1CH 80-bit DDR5 - SDP x4 PKG with 24Gb DDR5 die - 30GB/s+ random BW. (Board photos, front and back sides: CXL controller, PMICs, and rows of DDR5 SDP packages.) 90/16
  • 91. © SK hynix Inc. SK hynix’s First CXL Memory in E3.S Form Factor 91/16 • 96GB E3.S module with cost- effective single-die packages • Based on DDR5 24Gb DRAM with most advanced 1anm process technology SK hynix Newsroom, 08/01/22
  • 92. © SK hynix Inc. HMSDK for Increased Memory Performance • Performance improved with CXL memory expansion + HMSDK (Heterogeneous Memory Software Development Kit, a SW solution) on high-bandwidth workloads • HMSDK supports memory use ratio configuration, optimizing page interleaving to be more BW-aware. (Chart: performance of CXL-BME vs. system memory BW and allocated memory size.) 92/16
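HMSDK itself is not shown here. As a hedged stand-in for the ratio-based placement it describes, the sketch below uses plain libnuma interleaving to spread a buffer's pages across an assumed DRAM node 0 and CXL node 1; stock interleaving is a fixed 1:1 split, whereas the user-defined 2:1 style ratios belong to HMSDK's own library.

```cpp
// Hedged sketch: 1:1 page interleaving across an assumed DRAM node (0) and CXL node (1).
#include <numa.h>      // numa_parse_nodestring(), numa_alloc_interleaved_subset()
#include <cstdio>
#include <cstring>

int main() {
    if (numa_available() < 0) { std::puts("no NUMA support on this system"); return 1; }

    struct bitmask* nodes = numa_parse_nodestring("0-1");   // assumption: DRAM = node 0, CXL = node 1
    if (!nodes) { std::puts("bad node string"); return 1; }
    const size_t sz = 1ull << 30;

    void* buf = numa_alloc_interleaved_subset(sz, nodes);   // pages alternate between the two tiers
    if (!buf) { std::puts("allocation failed"); return 1; }

    std::memset(buf, 0, sz);    // fault the pages in so the interleave policy takes effect

    numa_free(buf, sz);
    numa_free_nodemask(nodes);
    return 0;
}
```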
  • 93. © SK hynix Inc. CXL Memory System Demo with HMSDK • HMSDK’s library stores data to both DRAM and CXL memory based on user-defined ratio configuration • Check out the SK hynix booth (#509) for the video demo presentation • Also, at the booth check out the live demo on research studies regarding dynamic, elastic CXL memory allocation Advanced Research Project: Elastic CXL Memory Solution HMSDK Example: 2:1 Ratio in DRAM:CXL Memory 93/16
  • 94. © SK hynix Inc. Building Strong CXL Eco-system with Industry Partners • Close collaboration with all CXL eco-system partners across the entire system hierarchy is essential for successful launch of future CXL products • SK hynix is committed to be a key player in building such eco-system by delivering differentiated value-added memory products to the industry Software HW Platform Memory Controller xPU Storage IP 94/16
  • 95. © SK hynix Inc. Summary • CXL is creating new values through memory bandwidth and capacity expansion, memory differentiation, controller differentiation, and Memory-as-a-Service • SK hynix is excited and committed to contribute to the entire CXL eco-system by providing many efficient scalable CXL memory solutions with differentiated value- added memory products from memory expansion to memory pooling, such as BME, CME, MPS, and CMS • SK hynix is pleased to be able to announce its first cost-effective 24Gb 1anm DDR5 based 96GB CXL memory in E3.S form factor, which is just the beginning toward providing more valuable scalable memory solutions to the entire industry in the future “Check out the SK hynix FMS demo on CXL memory with HMSDK, showcasing performance improvements of CXL memory with optimized BW-aware SW solutions” 95/16
  • 97. © 2022 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to change without notice. Micron, the Micron logo, and all other Micron trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their respective owners. 97 Ryan Baxter – Senior Director of Marketing, Data Center Flash Memory Summit | August 2022 CXL: Enabling New Pliability in the Modern Data Center
  • 98. Micron Confidential 98 Data centers = memory centers. Memory and storage growth will never be as slow as before, and possibly never as fast as now. (Charts: hyperscale adoption of AI roughly 2x from CY20 to CY25; AI-optimized vs. compute-optimized server content with 6x memory and 7x storage (DDR memory, HBM memory, NAND storage); Micron's global data center market, CY20–CY30, 16% CAGR, chart axis $20B–$180B, with AI servers driving memory & storage growth.) Sources: 1. Hyperscale AI adoption: internal Bain research. 2. Server content referencing two published AWS EC2 hardware configs (AWS instance types, 3/1/22): standard server config = 256GB DRAM, 0GB HBM, 1.2TB SSD storage; AI server config = 1152GB DRAM + 320GB HBM, 8TB SSD storage. 3. Global data center market = Micron MI market model.
  • 99. Micron Confidential 99 Memory-centric innovations in the data center: applying the power of software-defined infrastructure. (Left: memory and storage hierarchy from hot to cold data – in-package direct attach, near memory, far memory / CXL-attached memory, fast storage, capacity storage, archival storage – with latency and capacity increasing down the stack and cost and bandwidth increasing up the stack. Right: compute, memory, and storage resources recomposed into pools that are modular, composable, performance-oriented, and efficient.)
  • 100. Micron Confidential Micron Confidential CXL Use Cases 100 Alternative to Stacking Stacking drives non-linear cost/bit Provide Ultra-High Capacity Expansion beyond 4H 3DS TSV Add Memory Bandwidth CXL enables memory attached points Balance Memory Capacity/BW  DRAM capacity/BW on demand  Balances GB/core and BW Reduce System Complexity  Fewer memory channels  Thermally optimized solutions Enablement After 2DPC Future 50% slot reduction
  • 101. Micron Confidential 101 The Industry's Fully Composable, Scalable Vision: Memory and Storage Hierarchy. (Diagram, hot data to cold data, with latency and capacity increasing down the stack and cost and bandwidth increasing up the stack: cache memory; bandwidth memory on an ultra-wide memory bus; near memory on the traditional memory bus; memory expansion via CXL-attached memory; fast storage – SSD (TLC); capacity storage – SSD (QLC) over SATA/Ethernet; archival storage – HDD over Ethernet.)
  • 102. Micron Confidential Micron Confidential Micron’s “data centric” portfolio 102 Compute Storage Networking Hyperscale Enterprise & gov Communication Edge Acceleration Deep customer relationships Ecosystem enablement Silicon technology Emerging memory Advanced packaging Tech node leadership HBM GDDR LPDDR DDR TLC NAND QLC NAND Standards body leadership A complete portfolio built on silicon technology, world class manufacturing, and a diversified supply chain.
  • 103. Micron Confidential Micron Confidential Micron is committed to partnering with the industry; ultimately serving and delighting our customers 103 Strategic ecosystem partnerships  Define, develop and prove technologies  DDR, LP, & GDDR  GPU Direct Storage  Enable differentiated solutions  Extend the ecosystem Industry organizations  Provide leadership in industry organizations to enable scalable advancement
  • 105. Arvind Jagannath Product Management VMware Charles Fan CEO & Co-founder MemVerge Manoj Wadekar SW-Defined Memory Workstream Lead, OCP, Storage Architect, Meta
  • 106. Afternoon Agenda 106 Start End Name Title Organization 3:25 3:45 Arvind Jagannath Cloud Platform Product Management 3:45 4:05 Mahesh Wagh Senior Fellow 4:05 4:25 Charles Fan CEO & Co-founder 4:25 4:45 Manoj Wadekar SW-Defined Memory Workstream Lead, OCP, Storage Architect, Meta 4:45 5:10 Siamak Tavallaei Panel Moderator President, CXL Consortium, Chief System Architect, Google Infrastructure 5:10 5:35 Chris Mellor Panel Moderator Editor Session SPOS-102-1 on the FMS program
  • 107. Virtual CXL presentations now available on the MemVerge YouTube channel 107
  • 108. Confidential │ © VMware, Inc. 108 Towards a CXL future with VMware
  • 109. Confidential │ © VMware, Inc. 109 • This presentation may contain product features or functionality that are currently under development. • This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. • Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Technical feasibility and market demand will affect final delivery. • Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined. Disclaimer
  • 110. Confidential │ © VMware, Inc. 110 VMware Competencies SmartNICs and Accelerators Virtualization ideal for transparent tiering Cluster-wide DRS helps load balance and mitigate risks Strong Ecosystem of partners Passthrough devices GPUs, sharing and Assignable hardware
  • 111. Confidential │ © VMware, Inc. 111 Digital Transformation of Businesses Explosive growth in data 1 NetworkWorld. “IDC: Expect 175 zettabytes of data worldwide by 2025.” December 2018. networkworld.com/article/3325397/idc-expect-175-zettabytes-of-data-worldwide-by-2025.html. 2 IBM. “3D NAND Technology – Implications to Enterprise Storage Applications.” 2015 Flash Memory Summit. 2014. flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150811_FM12_Yoon.pdf. By 2025, IDC predicts 30% of global data will be real time! Low Latency for Mission-critical Transactions Need to Deliver Business Value in Real-time Highly Parallel Processing on Very Large Set Of Data Deliver Risk and Opportunity for Future 175 ZB by 2025, With 26% CAGR1,2 Artificial Intelligence Business Intelligence Real-time Analytics Machine Learning Big Data Analytics Transactional Processing Predictive Analytics Edge Processing Time Series Virtualization Hybrid Cloud
  • 112. Confidential │ © VMware, Inc. 112 Trends Explosive Growth of the Data Desire to get more out of the data More data need to be processed in real-time Software has led the innovation in the cloud, Hardware is catching up Need to scale infrastructure to address data growth More in-memory computing to process faster Need Enterprise class monitoring and remediation DRAM is expensive and lacks high densities Customer Needs Trends Vs Customer Needs Digital Transformation
  • 113. Confidential │ © VMware, Inc. 113 VMware’s Big Memory Vision starts with Software Tiering
  • 114. Confidential │ © VMware, Inc. 114 vCenter Tiered Memory ESXi Memory Hardware DDR CXL attached/ Remote Memory/ Slower Memory CXL or RDMA over Ethernet NVMe Pooled NVMe Container CRX • Higher Density, more capacity • Lower TCO • Negligible Performance Degradation • Transparent – Single volatile memory address • No Guest or Application changes • Run any Operating System • ESX internally handles page placement • DRS and vMotion to mitigate risks • Tiering heuristics fed to DRS • Ensures fairness across workloads • Consistent performance • Zero Configuration changes • No special tiering settings • Processor specific monitoring • vMMR monitors at both VM- and Host-levels Software Tiering
  • 115. Confidential │ © VMware, Inc. 115 Phase-1: Host Local memory tiering Software Tiering: How Does it Work? ESXi Kernel Memory Management Distributed Resource Scheduler (DRS) Scheduler Hypervisor ESXi Kernel Memory Management DRS Scheduler Hypervisor Tiering aware Tiering aware Tiering aware Container CRX DDR Lower Cost Memory Container CRX DDR Memory Hardware Memory Hardware Windows Linux Windows Linux 1st tier 1st tier 2nd tier Lower Cost Memory
  • 116. Confidential │ © VMware, Inc. 116 Future Software Tiering: How Does it Work? ESXi Kernel Host Memory Management Cluster-level DRS Scheduler Hypervisor Tiering aware Tiering aware Tiering aware Container CRX DDR CXL-attached DRAM CXL-attached PMem Pooled NVMe NVMe Lower Cost Memory
  • 117. Confidential │ © VMware, Inc. 117 Future Various Tiering Approaches DRAM Lower cost/slower memory 1 DRAM NVMe 2 DRAM Lower cost/slower memory 3 DRAM Remote Memory/Host sharing 4 NVMe DRAM CXL-attached device/Pool 5 DRAM 6 NVMe-OF ESX Kernel
  • 118. Confidential │ © VMware, Inc. 118 Future Software Tiering with CXL 2.0 ESX Host Single Uniform Memory Address Space DRAM CXL/NVMeOF/Shared DRAM/ * Remote memory/ Slower Memory CXL CXL Switch X Shared and Pooled Memory *Prototyping with CXL1.1
  • 119. Confidential │ © VMware, Inc. 119 How it all fits together? Managed part of end-to-end vSphere workflow! Host VM 1 VM 2 Tier 2 Tier 1 VM pages Tier 2 Tier 1 Host 1 Host 2 Host 3 Cluster VMs DRS Tier sizing Page placement choose host for VM choose size for tier choose tier for page VM monitor Host monitor DRS monitor Administrator monitor tier bandwidth of VM monitor tier bandwidth of host
  • 122. | AMD | Data Center Group| 2022 [Public] AGENDA ◢ Paradigm Shift and Memory Composability Progression ◢ Runtime Memory Management ◢ Tiered Memory ◢ NUMA domains and Page Migration ◢ Runtime Memory Pooling
  • 123. | AMD | Data Center Group| 2022 [Public] PARADIGM SHIFT ◢ Scalable, high-speed CXL™ Interconnect and PIM (Processing in Memory) contribute to the paradigm shift in memory intensive computations ◢ Efficiency Boost of the next generation data center ◢ Management of the Host/Accelerator subsystems combined with the terabytes of the Fabric Attached Memory ◢ Reduced complexity of the SW stack combined with direct access to multiple memory technologies
  • 124. | AMD | Data Center Group| 2022 [Public] MEMORY COMPOSABILITY PROGRESSION (Diagram, endpoint view from the host root port: direct-attach memory → memory scale-out through buffers → memory pooling & disaggregation.) • Addresses the cost and underutilization of the memory • Multi-domain pooled memory – memory in the pool is allocated/released when required • Workloads/applications benefiting from memory capacity • Design optimization for {BW/$, memory capacity/$, BW/core}
  • 125. | AMD | Data Center Group| 2022 [Public] RUNTIME MEMORY MANAGEMENT
  • 126. | AMD | Data Center Group| 2022 [Public] TIERED MEMORY NUMA Domains Page Migration
  • 127. | AMD | Data Center Group| 2022 [Public] TIERED MEMORY NUMA DOMAINS • Exposed to the HV, Guest OS, Apps • OS-assisted optimization of the memory subsystem • Based on ACPI objects – SRAT/SLIT/HMAT
  • 128. | AMD | Data Center Group| 2022 [Public] TIERED MEMORY PAGE MIGRATION (Diagrams: processor CCDs/IOD with near memory and CXL-attached far memory exposed as NUMA domains for memory expansion; and near memory acting as a cache for far memory, where a near-memory miss is redirected to the far memory at longer latency.) SW-assisted page migration: ‒ Active page migration between far and near memories ‒ HV/Guest migrates hot pages into near memory and retires cold pages into far memory ‒ Focused DMA to transfer required datasets from far to near memory. DRAM-as-a-cache optimization: ‒ HW-managed hot dataset ‒ Near-memory miss redirected to the far memory ‒ Transparent to the app/HV
  • 129. | AMD | Data Center Group| 2022 [Public] TIERED MEMORY SW ASSISTED PAGE MIGRATION. Flow: combined HW/SW tracking of memory page activity/"hotness" → detecting page(s) that are candidates for migration → requesting HV/Guest permission to migrate → HV/Guest API call to the Security Processor to migrate the page(s) → migration (stalling accesses to specific pages and copying the data). Key points: page "hotness" is a combined action of HW and SW tracking; the HV/Guest authorizes the migration; the Security Processor acts as a root of trust for performing the migration.
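A hedged, OS-level analogue of this flow (the HV/Guest permission step and the Security Processor handshake are platform-specific and omitted) is Linux's move_pages() syscall, shown below promoting a few resident pages from an assumed far/CXL NUMA node to an assumed near/DRAM node, roughly what a tiering daemon does for hot pages. Build with -lnuma.

```cpp
// Hedged sketch of hot-page promotion with move_pages(); node ids are assumptions.
#include <numa.h>       // numa_alloc_onnode(), numa_free()
#include <numaif.h>     // move_pages(), MPOL_MF_MOVE
#include <unistd.h>     // sysconf()
#include <cstdio>
#include <vector>

int main() {
    if (numa_available() < 0) { std::puts("no NUMA support on this system"); return 1; }

    const long page_sz = sysconf(_SC_PAGESIZE);
    const int far_node = 1, near_node = 0;        // assumption: CXL = node 1, DRAM = node 0
    const size_t n_pages = 16;

    // Allocate and fault in a few pages on the far tier, as if they had been placed there earlier.
    char* buf = static_cast<char*>(numa_alloc_onnode(n_pages * page_sz, far_node));
    if (!buf) { std::puts("allocation failed"); return 1; }
    for (size_t i = 0; i < n_pages; ++i) buf[i * page_sz] = 1;

    // "Promote" them: ask the kernel to migrate each page to the near node.
    std::vector<void*> pages(n_pages);
    std::vector<int> target(n_pages, near_node), status(n_pages, -1);
    for (size_t i = 0; i < n_pages; ++i) pages[i] = buf + i * page_sz;

    if (move_pages(0 /* this process */, n_pages, pages.data(), target.data(),
                   status.data(), MPOL_MF_MOVE) != 0)
        std::perror("move_pages");
    else
        std::printf("first page now on node %d\n", status[0]);

    numa_free(buf, n_pages * page_sz);
    return 0;
}
```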
  • 130. | AMD | Data Center Group| 2022 [Public] RUNTIME MEMORY ALLOCATION/POOLING FABRIC ATTACHED MEMORY Host Host Tier2 Mem Multi-Headed CXL controller  Multiple structures serve for fabric level memory pooling  Combination of the private (dedicated to specific host) and shareable memory ranges  Protection of the memory regions from unauthorized guests and hypervisor  Allocation/Pooling of the memory ranges between Hosts is regulated by the fabric aware SW layer (i.e., Fabric Manager)
  • 131. | AMD | Data Center Group| 2022 [Public] RUNTIME MEMORY ALLOCATION/POOLING FABRIC ATTACHED MEMORY  Memory Allocation Layer – communicates <new memory allocation per Host> based on the system/apps needs  Fabric Manager – adjusts the fabric settings and communicates new memory allocations to the Host SW  Host SW – invokes the Hot Add/Hot Removal method to increase/reduce (or offline) the amount of memory allocated to the Host  In some instances, Host SW can directly invoke the SP to adjust the memory size allocated to the Host  On-die Security Processor (Root of Trust) is involved in securing exclusive access to the memory range
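For the "Host SW invokes Hot Add/Hot Removal" step, a hedged sketch of the Linux-side mechanics is below: once the fabric manager grants extra capacity and the firmware/driver surfaces a new memory block, user space (with root privileges) can online it through sysfs. The block name is a placeholder, and real CXL hot-add additionally involves the CXL/DAX kmem plumbing not shown here.

```cpp
// Hedged sketch: online a hot-added memory block via sysfs (block name is hypothetical).
#include <fstream>
#include <iostream>
#include <string>

int main() {
    const std::string block = "memory128";   // placeholder for the block that appeared after hot-add
    std::ofstream state("/sys/devices/system/memory/" + block + "/state");
    if (!state) {
        std::cerr << "cannot open sysfs state file (missing block, or not root?)\n";
        return 1;
    }
    state << "online_movable";               // ZONE_MOVABLE keeps the block removable later
    std::cout << "requested online of " << block << "\n";
    return 0;
}
```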
  • 132. | AMD | Data Center Group| 2022 [Public] SUMMARY Composable Disaggregated Memory is the key approach to address the cost and underutilization of the System Memory Further investment in the Runtime Management of the Composable & Multi-Type memory structures is required to maximize the system level performance across multiple use-cases Application Transparency is another goal of efficient Runtime Management by abstracting away an underlying fabric/memory infrastructure
  • 134. CXL: The Dawn of Big Memory Charles Fan Co-founder & CEO MemVerge
  • 135. The Rise of Modern Data-Centric Applications 135 EDA Simulation AI/ML Video Rendering Geophysical Genomics Risk Analysis CFD Financial Analytics
  • 136. Opening the Door to the Era of Big Memory 136 Abundant Composable Available
  • 137. What happened 30+ years ago 137 Just Bunch of Disks Storage Area Network (SAN) Advanced Storage Services Fiber Channel Storage Data Services
  • 138. Where We Are Going… 138 Just Bunch of Disks New Memory Storage Area Network (SAN) Pooled Memory Advanced Storage Services Memory-as- a-Service Fiber Channel Storage Data Services CXL Memory Data Services
  • 139. Disaggregated & Pooled Memory Memory Pool Computing Servers Pool Manager CXL Switch
  • 140. Dynamic Memory Expansion Reduces Stranded Memory Before CXL Use Case #1 Used Memory Memory not used * H. Li et al. First-generation Memory Disaggregation for Cloud Platforms. arXiv:2203.00241v2 [cs.OS], March 5, 2022 Azure Paper*: • Up to 50% of server cost is from DRAM alone • Up to 25% of memory is stranded • 50% of all VMs never touch 50% of their rented memory
  • 141. Dynamic Memory Expansion Reduces Stranded Memory After CXL Used Memory Memory not used Use Case #1 Memory disaggregation can save billions of dollars per year.
  • 142. Memory Auto-healing With Transparent Migration 2. Provision a new memory region from the pool 1. A memory module is becoming bad: error rate going up. 3. Transparent migration of memory data 4. Memory Auto-healing complete Use Case #2
  • 143. Distributed Data Shuffling Local SSD Use Case #3 Before CXL Local SSD Local SSD Network Network Storage I/O w/ Serialization Deserialization
  • 144. Using Shared Memory Read Use Case #3 After CXL S. Chen et al. Optimizing Performance and Computing Resource Management of in-memory Big Data Analytics with Disaggregated Persistent Memory. CCGRID'19 Project Splash is open source: https://github.com/MemVerge/splash
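A minimal, hedged sketch of the shared-memory shuffle idea follows. It is single-node only; with a CXL shared memory pool (as in the disaggregated-memory setting Project Splash targets) the same mapping could be visible to multiple hosts. The core point it shows: the producer writes records straight into a mapped region and the consumer reads them in place, with no storage I/O and no (de)serialization. The partition name is hypothetical.

```cpp
// Hedged sketch: in-place shuffle exchange over POSIX shared memory (single-node stand-in).
#include <fcntl.h>      // shm_open(), O_* flags
#include <sys/mman.h>   // mmap(), munmap(), shm_unlink()
#include <unistd.h>     // ftruncate(), close()
#include <cstdio>
#include <cstring>

int main() {
    const char* name = "/shuffle_partition_0";   // hypothetical partition name
    const size_t sz = 1 << 20;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, sz) != 0) { std::perror("shm setup"); return 1; }

    char* base = static_cast<char*>(
        mmap(nullptr, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Producer side: write shuffle records directly into the mapped region.
    std::strcpy(base, "partition-0 records");

    // Consumer side (normally another process or host): read in place, no deserialization.
    std::printf("consumer read: %s\n", base);

    munmap(base, sz);
    close(fd);
    shm_unlink(name);
    return 0;
}
```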
  • 145. Key Software Components 145 Memory Snapshot Memory Tiering Resource management Transparent Memory Service Operating Systems App App App App CXL Switch CXL Computing Hosts Memory Pool Memory Provisioning & Sharing Capacity Optimization Global Insights Security Data Protection Memory Machine Pool Manager Operating System Pool Server Memory Viewer App profiler Hardware API Integration Memory Sharing
  • 147. 147 Memory Capacity Expansion • Software-defined Memory Pool with intelligent Auto-tiering • No application change required Accelerate Time-to-discovery • Transparent checkpointing • Roll-back, restore and clone anywhere, any time Reduce Cloud Cost by up to 70% • Enable long-running applications to use low-cost Spot instances • Integration with cloud automation and scheduler to auto-recover from CSP preemptions Memory Machine™ Memory Snapshot Service Memory Tiering Service System & Cloud Orchestration Service Transparent Memory Service Linux Compute Memory Storage HBM DDR CXL Genomics EDA Geophysics Risk Analysis Video Rendering Others CPU GPU xPU SSD HDD Announcing Memory Machine Cloud Edition
  • 148. 148 Memory Machine™ Memory Snapshot Service Memory Tiering Service System & Cloud Orchestration Service Transparent Memory Service Linux 64GB of DDR5 DRAM 64GB of CXL DRAM Expander Card (Montage Technologies) MLC (Memory Latency Checker) Early Results Running Memory Machine on CXL Next-Gen Server Stream Microbenchmark Application Redis
  • 149. Early Results Running Memory Machine on CXL 149 (Charts: MLC (Memory Latency Checker) results – throughput in GB/s for All Reads, 3:1 Reads-Writes, 2:1 Reads-Writes, 1:1 Reads-Writes, and Stream-triad-like workloads; and Stream results – throughput in GB/s for Copy, Scale, Add, and Triad; each comparing DDR5 only, CXL only, and DDR+CXL with Memory Machine auto-tiering.)
  • 150. Live Demos at MemVerge Booth 150
  • 151. Key Software Components 151 Memory Snapshot Memory Tiering Resource management Transparent Memory Service Linux App App App App CXL Switch CXL Computing Hosts Memory Pool Memory Provisioning & Sharing Capacity Optimization Global Insights Security Data Protection Memory Machine Pool Manager Linux Pool Server Memory Viewer App profiler Hardware API Integration
  • 153. 153 Application Memory Heatmap Memory Viewer Free Download: http://www.memverge.com/MemoryViewer
  • 154. Software Partner to the CXL Ecosystem 154 Founded in 2017 to develop Big Memory software Memory Snapshot Memory Tiering Resource management Transparent Memory Service Memory Provisioning & Sharing Capacity Optimization Global Insights Security Data Protection Memory Machine Pool Manager Memory Viewer App profiler Hardware API Integration Memory Sharing Big Memory Software Processors: Servers: Switches: Memory Systems: Clouds Big Memory Apps Standards Bodies
  • 155. Memorize the future. Please visit our booth for the live demos
  • 156. Enabling Software Defined Memory Real World Use Cases with CXL 156 Manoj Wadekar Hardware System Technologist
  • 157. Agenda • SDM Workstream within OCP • Hyperscale Infra - Needs • Memory Hierarchy to address the needs • SDM Use cases • SDM Activities and Status 157
  • 158. SDM Team Charter - SDM (Software Defined Memory) is a workstream within Future Technology Initiatives within OCP Charter: - Identify key applications driving adoption of Hierarchical/ Hybrid memory solutions - Establish architecture and nomenclature for such Systems - Offer benchmarks that enable validation of novel ideas for HW/SW solutions for such systems
  • 159. Hyperscale Infrastructure 159 • Application performance and growth depend on ⎻ DC, system, component performance and growth ⎻ Compute, memory, storage, network.. • Focusing on memory discussion Ads FE Web Database/ Cache Inference Ads Data Warehouse Data Warehouse Database/ Cache Storage Training
  • 160. Memory Challenges 160 Bandwidth and Capacity • The Gap between bandwidth and capacity is widening • Applications ready to trade between bandwidth and capacity Power • DIMMs consume significant share of rack power ⎻ DDR5 exacerbates this • Applications co-design to achieve higher capacity at optimized power TCO • Cost impact of min capacity increase and Die/ECC overheads • Applications can trade performance/capacity to achieve optimal TCO
  • 161. “Memory” Pyramid today 161 Capacity driven Bandwidth driven DRAM NAND SSD Cache HBM Databases, Caching.. GP Compute, Training.. Inference, Caching..
  • 162. Use Case Examples 162 • Caching (e.g. Memcache/Memtier (Cachelib), Redis etc.) ⎻ Need to achieve higher QPS while satisfying “retention time” ⎻ Higher memory capacity needed ⎻ Current solutions include ”tiered memory” with DRAM+NAND, but need load/store • Databases (E.g. RocksDB/MongoDB etc.) ⎻ Need to achieve efficient storage capacity per node and deliver QPS SLA ⎻ Higher amount of memory enables more storage per node • Inference (E.g. DLRM) ⎻ Petaflops and Number of parameters are increasing rapidly ⎻ AI Models are scaling faster than the underlying memory technology ⎻ Current solutions include ”tiered memory” with DRAM+NAND, but need load/store
  • 163. AI at Meta ● Across many applications/services and at scale → driving a portion of our overall infrastructure (both HW and SW) ● From data centers to the edge Keypoint Segmentation Augmented Reality with Smart Camera
  • 164. ● Compute, Memory BW, Memory Capacity, all scale for frontier models ○ Scaling typically is faster than scaling of technology ● The rapid scaling requires more vertical integration from SW requirements to HW design Problem Statement: AI workloads scale rapidly
  • 165. DLRM Memory Requirements ● Bandwidth 1. Considerable portion of capacity needs high BW Accelerator memory. 2. Inference has a bigger portion of the capacity at low Bandwidth. More so than training. ● Latency 3. Inference has a tight latency requirement, even on the low BW end
  • 166. System Implications of DLRM Requirements ● A tier of memory beyond HBM and DRAM can be leveraged, particularly for inference ○ Higher latency than main memory. But still tight latency profile (e.g TLC Nand Flash does not work) ○ Trade off perf for density ○ This does not negate the Capacity and BW demand for HBM and DRAM
  • 167. “Tiered Memory” Pyramid with CXL 167 Capacity driven Bandwidth driven Databases, Caching.. GP Compute, Training.. Inference, Caching.. BW Memory NAND SSD Capacity Memory DRAM Cache CXL Attached HBM • Load/store interface • Cache line read/writes • Scalable • Heterogeneous • Standard interfaces
  • 168. Memory Technologies 168 Compute Storage Training Inference DDR4 DDR5 HBM CXL+DDR SCM (PCIe/CXL) [Exploration Phase]
  • 169. 169 OCP SDM activity and progress • SDM's focus: apply emerging memory technologies in the development of use cases • The OCP SDM group has three real-world memory focus areas: ⎻ Databases/Caching ⎻ AI/ML & HPC ⎻ Virtualized Servers • SDM team members: AMD, ARM, Intel, Meta, Micron, Microsoft, Omdia, Samsung, and VMware • Vendors are demonstrating CXL-capable CPUs and devices • Meta and others are investigating solutions to real-world memory problems SDM – Enabling memory solutions from emerging memory technologies
  • 170. 170 Ben Bolles Executive Director, Product Management Liqid Gerry Fan Founder, CEO Xconn Technologies Siamak Tavallaei Panel Moderator President CXL Consortium George Apostol CEO Elastics.cloud Christopher Cox VP Technology Montage Composable Memory Panel
  • 171. 171 Chris Mellor Editor Blocks and Files Manoj Wadekar SW-Defined Memory Workstream Lead, OCP, Storage Architect, Meta Richard Solomon Tech Mktg Mgr., PCIe/CXL Synopsys Bernie Wu VP Strategic Alliances MemVerge James Cuff Distinguished Engineer Harvard University (retired) Industry Expert, HPC & AI Industry Expert Big Memory App Panel
  • 172. CXL™: Ready for Take-Off