In the CXL Forum Theater at SC23 hosted by MemVerge, Samsung described their the architecture and use cases of their hybrid drive that includes DRAM and Flash memory
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Samsung: CMM-H Tiered Memory Solution with Built-in DRAM
1. CMM-H
Tiered Memory Solution with Built-in DRAM
Dr. Shuyi Pei
Ph.D., Sr. Engineer @Memory Solutions Lab.
Samsung Semiconductor Inc.
2. Larger capacity memory device at lower TCO
best suited for tiered memory solutions
Speed comparable to DRAM with NAND storage backed
and external battery power supply
Persistent memory option
CMM-H (CXL Memory Module, H: Hybrid)
Better system TCO
64-byte cache-granular fine grained access
to meet modern AI/ML workload needs
Small granularity access
Expanding capacity and utilization of memory for AI
3. • DRAM cache to move/store small-
sized data chunks suitable for AI/ML
Applications
• Improve data store efficiency by
writing data at the DRAM speed
• Low latency enabled by CXL.
mem protocol
Optimized for AI workloads
CMM-H Architecture
Computer System
Normal
I/O
Small
I/O
CXL.memory
DRAM Cache
CXL.io
NAND Flash
4KB
64 Bytes
128 Bytes
4. 0
1
Title in Samsung Sharp Sans Bold (34)
Body text in Samsung Sharp Sans Medium (16)
Insert more text here.
Use this page when Samsung fonts are available.
Subtitle in Samsung Sharp Sans Bold (24)
**Compared to PCIe Gen4 NVMe SSD
• Small granularity data access
enable performance scales
with cache hits
• Direct memory access
advantage
• Large memory capacity at
lower TCO
Memory Reads per Second (Million)
Tiered Memory
8.0
2.2
1.6
1.4
1.3
1.2
1.1
1.0
1.0
0.9
16.3
3.6
2.4
2.0
1.7
1.5
1.4
1.2
1.1
1.0
1.1 1.2 1.3
1.5
1.8
2.2
2.7
3.6
5.9
32.7
43
9.9
4.9
3.2
2.4
1.9
1.66
1.4
1.2
1.1
512B
256B
128B
64B
100.0
10.0
1.0
0.1
10 20 30 40 50 60 70 80 90 100
Cache Hit Rate (%)
5. 0
1
Title in Samsung Sharp Sans Bold (34)
Body text in Samsung Sharp Sans Medium (16)
Insert more text here.
Use this page when Samsung fonts are available.
Subtitle in Samsung Sharp Sans Bold (24)
5
• Battery-backed DRAM with
speed comparable to DDR5
• Persistence achieved with data
dumps to NAND flash
• Supports flush-on-fail with CXL
2.0 GPF feature
Persistent Memory
Operations per Second (Million)
0
35
70
105
140
100% Write 50% Write: 50% Read 10% Write: 90% Reads
DDR5 DRAM
CMM-H
Persistent Memory
Persistent Memory Competitor
6. 0
1
Title in Samsung Sharp Sans Bold (34)
Body text in Samsung Sharp Sans Medium (16)
Insert more text here.
Use this page when Samsung fonts are available.
Subtitle in Samsung Sharp Sans Bold (24)
6
**Compared to PCIe Gen4 NVMe SSD
• Direct memory access
advantage; no software cache
overhead
• Up to ~10x better end-to-end
performance with FPGA-based
PoC**
0
12500
25000
37500
50000
0 28 55 83 110
Inferences
per
Second
Cache Hit Ratio (%)
End-to-End Performance
Block IO
CMM-H
Block IO + Host Memory Cache
DRAM Memory
7. 0
1
Title in Samsung Sharp Sans Bold (34)
Body text in Samsung Sharp Sans Medium (16)
Insert more text here.
Use this page when Samsung fonts are available.
Subtitle in Samsung Sharp Sans Bold (24)
7
Movie Recommendation System Demo
Editor's Notes
1 TB MS-SSD memory with 8GB internal cache; Prototype performance scales for smaller granular memory accesses also as cache hit rate increases
16GB MS-SSD persistent memory; FPGA-based Prototype performance better than competitors (Optane) and close to DDR5 performance
End-to-end recommendation inference performance also scales as cache hit rate increases and comes closer to higher performance and cost DDR5
Movie Recommendation system is one good example to show MS-SSD’s performance. MS-SSD is HW device with Cache based ; most cost and power efficient AI recommendation system
** 40X better IO performance : PCIe Gen4 NVMe SSD 4KB Random read(0.9M IOPS) vs MS-SSD 64Byte (42.9M IOPS)
(Embedding table column size for DLRM is 64Byte)
https://ai.facebook.com/blog/dlrm-an-advanced-open-source-deep-learning-recommendation-model/