Memory Management with Page Folios
Adrian Huang | May, 2023
* Based on kernel 6.3 (x86_64) – QEMU
* 2-socket CPUs (4 cores/socket)
* 16GB memory
* Kernel parameter: nokaslr norandmaps
* KASAN: disabled
* Userspace: ASLR is disabled
* Legacy BIOS
Agenda
• Problem description
✓[Background] Normal high-order page & compound page
✓Legacy page cache
• Memory folio’s goal
• Solution
✓Page cache with memory folio
✓page struct vs folio struct
✓[Example] total_mapcount() implementation difference between legacy approach and folio
Normal high-order page & compound page
[Figure: a compound page next to a normal high-order page. Compound page: the head page has PG_head set in flags; the first tail page holds compound_head, compound_dtor, compound_order, compound_mapcount and compound_nr; the second tail page holds _compound_pad_1 (compound_head), hpage_pinned_refcount and deferred_list; any further tail page holds only _compound_pad_1 (compound_head). Normal high-order page: four plain pages.]
Normal high-order page
pages = alloc_pages(GFP_KERNEL, 2);
Four physically contiguous pages: not a compound page
Compound page
pages = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);
Four physically contiguous pages: compound page metadata is initialized during page allocation
Compound page – Use Cases
• Mainly used for huge pages
✓ hugetlbfs (also called HugeTLB Pages or persistent huge pages)
➢ Reserved inside the kernel and cannot be used for other purposes.
➢ Cannot be swapped out.
➢ Two allocation methods:
• Pre-allocated to the kernel huge page pool at boot by appending a kernel parameter.
• [Dynamically allocated huge pages of the default size] Example: `echo 10 > /proc/sys/vm/nr_hugepages`
➢ A user application calls mmap() or the shared memory system calls (shmget and shmat) to request huge pages.
➢ Used by databases for many years.
➢ Manual configuration of hugetlb pages is required.
➢ Application changes are required (via open/mmap).
Compound page – Use Cases
• Mainly used for huge pages
✓ Transparent Huge Page (THP)
➢ Supports automatic promotion and demotion of page sizes.
➢ Transparent to the application: no application changes are needed.
➢ Controlled via /sys/kernel/mm/transparent_hugepage/enabled
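The enabled file lists every available mode on one line and marks the active one with square brackets, for example `[always] madvise never`. A minimal sketch of extracting the active mode from such a line follows. The helper name thp_active_mode is hypothetical, and the sketch parses a string instead of reading sysfs so that it runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) mode out of a line such
 * as "always [madvise] never" into out. Returns 0 on success, or -1 if no
 * non-empty bracketed token is found or it does not fit in out.
 */
static int thp_active_mode(const char *line, char *out, size_t out_len)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len == 0 || len >= out_len)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

On a real system the input line would come from reading the sysfs file; writing one of the mode names back to the same file changes the setting.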
Compound page – Use Cases
• Mainly used for huge pages
• kmalloc: allocation size > 8192 bytes
o Check kmalloc_order()
• Memory folio
When to configure compound page?
• Condition: page order >= 1 && __GFP_COMP allocation flag is set
• alloc_pages -> … -> prep_new_page -> prep_compound_page
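A simplified userspace model of the condition and of the metadata written for a compound allocation. All names here (model_page, MODEL_GFP_COMP, model_prep_compound_page, model_alloc_pages) are hypothetical stand-ins, and the layout is reduced to the fields named on the slides; this is a sketch of the idea, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_GFP_COMP 0x1u /* stand-in for __GFP_COMP; the value is arbitrary */

struct model_page {
	bool pg_head;                /* models flags |= PG_head on the head page */
	uintptr_t compound_head;     /* tail pages: head address with bit 0 set */
	unsigned int compound_order; /* meaningful on the first tail page only */
	unsigned int compound_nr;    /* meaningful on the first tail page only */
};

/* The condition from the slide: order >= 1 && __GFP_COMP is set. */
static bool model_is_compound_alloc(unsigned int gfp, unsigned int order)
{
	return order >= 1 && (gfp & MODEL_GFP_COMP) != 0;
}

/* Fill in compound metadata for 1 << order contiguous model pages. */
static void model_prep_compound_page(struct model_page *pages,
				     unsigned int order)
{
	unsigned int nr = 1u << order;

	pages[0].pg_head = true;
	pages[1].compound_order = order;
	pages[1].compound_nr = nr;
	for (unsigned int i = 1; i < nr; i++)
		pages[i].compound_head = (uintptr_t)&pages[0] | 1u;
}

/* Model of the allocation path: pages must hold 1 << order zeroed entries. */
static void model_alloc_pages(struct model_page *pages, unsigned int gfp,
			      unsigned int order)
{
	if (model_is_compound_alloc(gfp, order))
		model_prep_compound_page(pages, order);
}
```

Without the flag the four pages stay plain; with it, the head page is marked and every tail page points back to the head.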
Compound page: Problem Description #1
[Problem Description] No unified interface: ambiguity
• Some functions work in PAGE_SIZE units (4KB): they are unaware of compound pages and huge pages
• Some functions accept the head page *only*
• Some functions accept either the head page or a tail page
✓ They call compound_head() to get the head page: the extra instructions cost performance
✓ compound_head() users:
➢ get_page(): called very frequently
➢ put_page(): called very frequently
➢ …
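The per-call cost can be seen in a small userspace model of the head lookup: a tail page stores the head's address with bit 0 set, so every caller that may receive a tail page pays a load, a test and a subtraction. The names tagged_page and model_compound_head are hypothetical; the logic is modeled on the kernel's compound_head() but is only a sketch.

```c
#include <assert.h>
#include <stdint.h>

struct tagged_page {
	uintptr_t compound_head; /* tail page: address of head page, bit 0 set */
};

/* Return the head page for either a head page or a tail page. */
static struct tagged_page *model_compound_head(struct tagged_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1u) /* bit 0 set: this is a tail page */
		return (struct tagged_page *)(head - 1);
	return page;   /* head page or singleton page */
}
```

A function that takes a folio can skip this check because a folio never refers to a tail page.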
Legacy page cache
[Figure: legacy page cache. User space reaches file data through read()/write()/sendfile() or mmap(). In kernel space the page cache holds 4KB pages indexed by the continuous file position (file->f_pos); the buffer cache (buffer heads) maps each page to 512B disk sectors.]
Legacy page cache: Problem Description #2
1. The page cache occupies most memory pages.
2. Each page-cache page (there is no “compound page” concept) is added individually to the active/inactive LRU list
• Long LRU list: lock contention & cache misses
Memory folio’s goal
• Unified interface
✓All accesses via folio struct (head page/tail page in a compound page)
• [Page Cache] Shorter LRU list
✓Original: one struct page per 4KB to be added to LRU list
✓Folio: one struct page (page head) per 8KB, 16KB, 32KB, 64KB, 128KB, and so on (including THP) to be added to LRU list
• [Anonymous page] THP support
✓create_huge_pmd -> do_huge_pmd_anonymous_page -> __do_huge_pmd_anonymous_page
* THP: Transparent Huge Page
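The effect on LRU length is plain arithmetic. The sketch below counts the LRU entries needed to cache a given amount of file data, assuming (for illustration only) that every folio has the same order; lru_entries and MODEL_PAGE_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* 4KB base page, as on the slides */

/* LRU entries needed to cache `bytes` of file data with order-`order` folios. */
static size_t lru_entries(size_t bytes, unsigned int order)
{
	size_t folio_bytes = (size_t)MODEL_PAGE_SIZE << order;

	return (bytes + folio_bytes - 1) / folio_bytes; /* round up */
}
```

For 1MB of cached data this gives 256 entries with 4KB pages (order 0), 64 entries with 16KB folios (order 2) and 16 entries with 64KB folios (order 4).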
Page cache with folio
[Figure: page cache with folio. User space reaches file data through read()/write()/sendfile() or mmap(). In kernel space a folio groups several 4KB pages covering a continuous range of file positions (file->f_pos), backed by 512B disk sectors.]
• Folio is the container of struct page(s)
✓ All accesses via folio struct
✓ No tail page → fewer run-time checks
Page cache with folio
Folio’s page order: readahead mechanism
• CONFIG_TRANSPARENT_HUGEPAGE is enabled
✓ Minimum: order 2 (4 pages)
✓ Maximum: order 9 (512 pages)
• CONFIG_TRANSPARENT_HUGEPAGE is disabled
✓ Minimum: order 2 (4 pages)
✓ Maximum: order 8 (256 pages)
• Commit 793917d997df (“mm/readahead: Add large folio readahead”): merged in the 5.18 kernel
• Default readahead size: 128KB (32 pages)
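The order figures above follow from pages = 1 << order. The sketch below checks that arithmetic and finds the largest order whose folio still fits in a readahead window. It illustrates the numbers only and is not the kernel's readahead code; order_to_pages and largest_order_in_window are hypothetical helpers.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u /* 4KB base page */

/* Number of base pages in a folio of the given order. */
static unsigned int order_to_pages(unsigned int order)
{
	return 1u << order;
}

/*
 * Largest order in [min_order, max_order] whose folio fits in ra_bytes.
 * Returns min_order when even the smallest folio does not fit.
 */
static unsigned int largest_order_in_window(unsigned int ra_bytes,
					    unsigned int min_order,
					    unsigned int max_order)
{
	unsigned int order = max_order;

	while (order > min_order &&
	       order_to_pages(order) * MODEL_PAGE_SIZE > ra_bytes)
		order--;
	return order;
}
```

With the default 128KB window (32 pages) and the order range 2 to 9, the largest folio that fits is order 5.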
Page cache with folio
1. Shorter LRU list: only the head page of a folio is added to the LRU list → performance improvement
2. 45% improvement for lru-file-mmap-read (vm-scalability), according to Matthew Wilcox’s PDF file
Page cache with folio: backtrace * kernel: 6.3
page struct vs folio struct
folio
• Page #0 (union with struct page page): flags; lru (union with __filler and mlock_count); mapping; index; private; _mapcount; _refcount; memcg_data
• Page #1 (union with struct page __page_1): _flags_1; _head_1; _folio_dtor; _folio_order; _entire_mapcount; _nr_pages_mapped; _pincount; _folio_nr_pages
• Page #2 (union with struct page __page_2): either _flags_2; _head_2; _hugetlb_subpool; _hugetlb_cgroup; _hugetlb_cgroup_rsvd; _hugetlb_hwpoison, or _flags_2a; _head_2a; _deferred_list
Compound pages
• Page #0 (head): flags, …
• Page #1 (tail): flags, compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr, …
• Page #2 (tail): flags, _compound_pad_1, hpage_pinned_refcount, deferred_list
• …
folio’s benefit
• [Example] 512KB compound page
✓ page struct: need to maintain 128 page structs (1 head page and 127 tail pages)
✓ folio struct: maintain 3 page structs regardless of the size of the compound page.
[Figure: struct folio layout, with struct page __page_1 and struct page __page_2 marked “won’t be used”]
folio struct’s members
• _entire_mapcount
✓ Number of times the whole compound page is mapped via a single PMD (huge page).
• _nr_pages_mapped
✓ Number of individual subpages (4KB pages) that are mapped via PTEs.
✓ Scenario: two processes map the same memory range
➢ One process maps the entire 2MB compound page (Transparent Huge Page, THP): mapped via a single PMD
➢ The other process maps some 4KB pages within this 2MB memory area: mapped via PTEs
✓ Benefit for THP: no need to split the huge page if other processes map 4KB pages within the same memory area.
• _folio_nr_pages
✓ Number of pages in this folio.
✓ _folio_nr_pages = 1 << order, where order > 0.
folio struct vs legacy compound page
Page #1 of the folio and page #1 (the first tail page) of a legacy compound page map onto the same fields.
[kernel v5.11] total_mapcount()
[Figure: struct page (kernel v5.11). The first union has one struct per user: page cache and anonymous pages; page_pool used by netstack; slab, slob and slub; tail pages of compound page; second tail page of compound page; page table pages (PMD huge PTE, x86 pgd page <-> mm_struct); ZONE_DEVICE pages; rcu_head (free a page by RCU). The second union holds atomic_t _mapcount (the number of times this page is referenced by page tables), unsigned int page_type, unsigned int active (used by slab) and int units (used by slob). atomic_t _refcount follows.]
Case 1: Singleton page(s)
Get _mapcount directly
total_mapcount() users:
• Huge page
• rmap (reverse mapping)
Case 2: Compound page && hugetlb (hugetlbfs) page
[Figure: tail pages of a compound page. The first tail page holds compound_head, compound_dtor, compound_order, compound_mapcount and compound_nr (= 1 << compound_order). The second tail page holds _compound_pad_1 (compound_head), hpage_pinned_refcount and deferred_list. Each page also carries the _mapcount / page_type / active / units union and _refcount.]
Get compound_mapcount directly
compound_mapcount:
• Map count of the whole compound page
(does not include mapped sub-pages)
Steps:
1. Get the head page based on any page (page head or page tail)
2. Read ‘compound_mapcount’ of the first tail page
A. page[1].compound_mapcount
Case 3: [Anonymous page] Compound page && transparent huge page
Steps:
1. Get the head page based on any page (page head or page tail)
2. Read ‘compound_mapcount’ of the first tail page
A. page[1].compound_mapcount: Map count of the whole compound page
3. `Accumulate each subpage._mapcount`:
A. One process maps 2MB range as a single huge page (a single PMD)
B. Another process maps 512 individual PTEs
4. `Accumulate each subpage._mapcount` + page[1].compound_mapcount
Case 3: [Page cache] Compound page && transparent huge page
Steps:
1. Get the head page based on any page (page head or page tail)
2. Read ‘compound_mapcount’ of the first tail page
A. page[1].compound_mapcount: Map count of the whole compound page
3. `Accumulate each subpage._mapcount`:
A. One process maps 2MB range as a single huge page (a single PMD)
B. Another process maps 512 individual PTEs
4. `Accumulate each subpage._mapcount` + page[1].compound_mapcount - page[1].compound_mapcount * page[1].compound_nr
A. File pages have compound_mapcount included in each subpage’s _mapcount, so it is subtracted once per subpage
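The v5.11 cases above can be put together in one userspace model. This is a sketch under stated assumptions: the type and function names are hypothetical, counts are stored as plain "number of mappings" values (the kernel stores them starting at -1 and compensates), and the extra PageDoubleMap adjustment that the kernel applies to anonymous THP is left out because the slides do not cover it.

```c
#include <assert.h>

enum model_kind { MODEL_SINGLE, MODEL_HUGETLB, MODEL_THP_ANON, MODEL_THP_FILE };

struct model_compound {
	enum model_kind kind;
	int compound_mapcount;   /* models page[1].compound_mapcount */
	int nr;                  /* models page[1].compound_nr */
	const int *sub_mapcount; /* per-subpage _mapcount, nr entries */
};

static int model_total_mapcount_v511(const struct model_compound *c)
{
	int ret, i;

	if (c->kind == MODEL_SINGLE)   /* Case 1: read _mapcount directly */
		return c->sub_mapcount[0];
	if (c->kind == MODEL_HUGETLB)  /* Case 2: compound_mapcount only */
		return c->compound_mapcount;

	ret = c->compound_mapcount;    /* Case 3, step 2 */
	for (i = 0; i < c->nr; i++)    /* Case 3, step 3 */
		ret += c->sub_mapcount[i];
	if (c->kind == MODEL_THP_FILE) /* file pages: undo the double count */
		ret -= c->compound_mapcount * c->nr;
	return ret;
}
```

The loop over every subpage is the part the folio version tries to avoid in the common case.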
[kernel v6.3] total_mapcount() and folio_mapcount()
Case 1: Singleton page – Not a compound page
Case 2: Compound page is mapped via PMD (huge page)
Get _entire_mapcount
Get _nr_pages_mapped
Case 3: Compound page is mapped via PMD (huge page) and some subpages are mapped by PTE
mapcount = folio’s _entire_mapcount + sum(each subpage’s _mapcount)
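The three v6.3 cases can be modeled the same way. Again a sketch with hypothetical names (model_folio, model_folio_mapcount_v63) and plain "number of mappings" counts; the kernel stores the counters starting at -1 and corrects for that, which this model skips.

```c
#include <assert.h>

struct model_folio {
	int order;               /* 0 for a singleton page */
	int entire_mapcount;     /* models _entire_mapcount (PMD mappings) */
	int nr_pages_mapped;     /* models _nr_pages_mapped (PTE-mapped subpages) */
	const int *sub_mapcount; /* per-page _mapcount, 1 << order entries */
};

static int model_folio_mapcount_v63(const struct model_folio *f)
{
	int nr = 1 << f->order;
	int mapcount, i;

	if (f->order == 0)             /* Case 1: not a compound page */
		return f->sub_mapcount[0];

	mapcount = f->entire_mapcount; /* Case 2: mapped via PMD */
	if (f->nr_pages_mapped == 0)   /* no PTE mappings: skip the loop */
		return mapcount;

	for (i = 0; i < nr; i++)       /* Case 3: add the PTE mappings */
		mapcount += f->sub_mapcount[i];
	return mapcount;
}
```

Compared with the v5.11 model, the early return on nr_pages_mapped == 0 means a folio that is only PMD-mapped never walks its subpages.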
Reference
• Memory Folios
• LWN - A memory-folio update
• LWN - An introduction to compound pages
• LWN - Huge pages part 1 (Introduction)
• Documentation/mm/transhuge.rst
backup
Learn new C standard (C11) from folio: Generic Selection
* Reference from: ISO/IEC 9899:201x
• C99 defines type-generic macros in the standard library: the type of the argument is detected automatically, and the corresponding function is invoked based on that type.
✓ Example: sqrt(X)
➢ X is double → invoke sqrt()
➢ X is float → invoke sqrtf()
➢ X is long double → invoke sqrtl()
• However, programmers cannot define their own type-generic macros in C99.
• In C11, programmers can define their own type-generic macros:
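A minimal sketch of a user-defined type-generic macro with C11 _Generic. The macro and function names (type_name, twice, twice_int, twice_double) are made up for illustration and do not come from the kernel.

```c
#include <assert.h>
#include <string.h>

/* Select a string based on the static type of the argument. */
#define type_name(x) _Generic((x), \
	int: "int",                    \
	float: "float",                \
	double: "double",              \
	default: "other")

/* Type-specific implementations, in the spirit of sqrt()/sqrtf()/sqrtl(). */
static int twice_int(int v) { return 2 * v; }
static double twice_double(double v) { return 2.0 * v; }

/* Pick the implementation by argument type, then call it. */
#define twice(x) _Generic((x), \
	int: twice_int,            \
	double: twice_double)(x)
```

The selection happens at compile time, so a call such as twice(21) compiles to a direct call to twice_int(). The folio code applies the same construct to choose between const and non-const pointer variants of a helper.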
Some Functions

More Related Content

What's hot

Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
Ni Zo-Ma
 
How to use KASAN to debug memory corruption in OpenStack environment- (2)
How to use KASAN to debug memory corruption in OpenStack environment- (2)How to use KASAN to debug memory corruption in OpenStack environment- (2)
How to use KASAN to debug memory corruption in OpenStack environment- (2)
Gavin Guo
 

What's hot (20)

Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
 
Memory Compaction in Linux Kernel.pdf
Memory Compaction in Linux Kernel.pdfMemory Compaction in Linux Kernel.pdf
Memory Compaction in Linux Kernel.pdf
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
Linux memory-management-kamal
Linux memory-management-kamalLinux memory-management-kamal
Linux memory-management-kamal
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
How to use KASAN to debug memory corruption in OpenStack environment- (2)
How to use KASAN to debug memory corruption in OpenStack environment- (2)How to use KASAN to debug memory corruption in OpenStack environment- (2)
How to use KASAN to debug memory corruption in OpenStack environment- (2)
 

Similar to Memory Management with Page Folios

My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosql
thinkinlamp
 
MongoDB Memory Management Demystified
MongoDB Memory Management DemystifiedMongoDB Memory Management Demystified
MongoDB Memory Management Demystified
MongoDB
 
Ch10 OS
Ch10 OSCh10 OS
Ch10 OS
C.U
 

Similar to Memory Management with Page Folios (20)

Vam: A Locality-Improving Dynamic Memory Allocator
Vam: A Locality-Improving Dynamic Memory AllocatorVam: A Locality-Improving Dynamic Memory Allocator
Vam: A Locality-Improving Dynamic Memory Allocator
 
MySQL Cluster page management (2014)
MySQL Cluster page management (2014)MySQL Cluster page management (2014)
MySQL Cluster page management (2014)
 
Lecture storage-buffer
Lecture storage-bufferLecture storage-buffer
Lecture storage-buffer
 
Transparent Hugepages in RHEL 6
Transparent Hugepages in RHEL 6 Transparent Hugepages in RHEL 6
Transparent Hugepages in RHEL 6
 
Unit 5
Unit 5Unit 5
Unit 5
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosql
 
MongoDB Memory Management Demystified
MongoDB Memory Management DemystifiedMongoDB Memory Management Demystified
MongoDB Memory Management Demystified
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memory
 
Virtual memory translation.pptx
Virtual memory translation.pptxVirtual memory translation.pptx
Virtual memory translation.pptx
 
Cs416 08 09a
Cs416 08 09aCs416 08 09a
Cs416 08 09a
 
Mca ii os u-4 memory management
Mca  ii  os u-4 memory managementMca  ii  os u-4 memory management
Mca ii os u-4 memory management
 
Vmfs
VmfsVmfs
Vmfs
 
OSCh10
OSCh10OSCh10
OSCh10
 
Ch10 OS
Ch10 OSCh10 OS
Ch10 OS
 
OS_Ch10
OS_Ch10OS_Ch10
OS_Ch10
 
Linux Memory
Linux MemoryLinux Memory
Linux Memory
 
Segmentation and paging
Segmentation and paging Segmentation and paging
Segmentation and paging
 
Windows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementWindows Internal - Ch9 memory management
Windows Internal - Ch9 memory management
 
CH09.pdf
CH09.pdfCH09.pdf
CH09.pdf
 

Recently uploaded

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Recently uploaded (20)

Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 

Memory Management with Page Folios

  • 1. Memory Management with Page Folios
       Adrian Huang | May, 2023
       * Based on kernel 6.3 (x86_64) – QEMU
       * 2-socket CPUs (4 cores/socket)
       * 16GB memory
       * Kernel parameter: nokaslr norandmaps
       * KASAN: disabled
       * Userspace: ASLR is disabled
       * Legacy BIOS
  • 2. Agenda • Problem description ✓[Background] Normal high-order page & compound page ✓Legacy page cache • Memory folio’s goal • Solution ✓Page cache with memory folio ✓page struct vs folio struct ✓[Example] total_mapcount() implementation difference between legacy approach and folio
  • 3. Normal high-order page & compound page
       ✓ Normal high-order page: pages = alloc_pages(GFP_KERNEL, 2); returns four physically contiguous pages that are not a compound page.
       ✓ Compound page: pages = alloc_pages(GFP_KERNEL | __GFP_COMP, 2); returns four physically contiguous pages, and the compound page metadata is initialized during page allocation.
       [Diagram: compound page layout. The head page has PG_head set in flags. Every tail page stores compound_head (shown as _compound_pad_1 in later tail pages). First tail page only: compound_dtor, compound_order, compound_mapcount, compound_nr. Second tail page only: hpage_pinned_refcount, deferred_list.]
  • 4. Compound page – Use Cases
       [Diagram: compound page layout, as on slide 3]
       • Mainly used for huge pages
         ✓ hugetlbfs (also called HugeTLB Pages or persistent huge pages)
           ➢ Reserved inside the kernel and cannot be used for other purposes.
           ➢ Cannot be swapped out.
           ➢ Two allocation methods:
             • Pre-allocated to the kernel huge page pool by appending a kernel parameter.
             • [Dynamically allocated huge pages of the default size] Example: `echo 10 > /proc/sys/vm/nr_hugepages`
           ➢ A user application calls the mmap system call or the shared memory system calls (shmget and shmat) to request the huge page allocation.
           ➢ Used by databases for many years.
           ➢ Manual configuration of hugetlb pages is required.
           ➢ Application changes are required (via open/mmap).
  • 5. Compound page
       [Diagram: compound page layout, as on slide 3]
  • 6. Compound page – Use Cases
       [Diagram: compound page layout, as on slide 3]
       • Mainly used for huge pages
         ✓ Transparent Huge Page (THP)
           ➢ Supports the automatic promotion and demotion of page sizes.
           ➢ Transparent to the application: no need to modify the application.
           ➢ Controlled via /sys/kernel/mm/transparent_hugepage/enabled
  • 7. Compound page – Use Cases
       [Diagram: compound page layout, as on slide 3]
       • Mainly used for huge pages
       • kmalloc: allocation size > 8192 bytes
         o Check kmalloc_order()
       • Memory folio
       When to configure compound page?
       • Condition: page order >= 1 && the __GFP_COMP allocation flag is set
       • alloc_pages -> … -> prep_new_page -> prep_compound_page
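The condition above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct cp_page`, `CP_PG_HEAD`, `CP_GFP_COMP`, `cp_wants_compound()` and `cp_prep_pages()` are hypothetical names, and the only kernel behaviour assumed is that a tail page records its head page's address with bit 0 set.

```c
#include <assert.h>
#include <stdint.h>

#define CP_PG_HEAD  (1UL << 0)  /* model flag bit, not the kernel's bit number */
#define CP_GFP_COMP (1U << 0)   /* stand-in for __GFP_COMP */

/* Hypothetical, heavily reduced page descriptor. */
struct cp_page {
    unsigned long flags;
    uintptr_t compound_head;      /* tail pages: head address with bit 0 set */
    unsigned char compound_order; /* meaningful in the first tail page only */
    unsigned int compound_nr;     /* meaningful in the first tail page only */
};

/* The slide's condition: page order >= 1 and __GFP_COMP is set. */
static int cp_wants_compound(unsigned int order, unsigned int gfp)
{
    return order >= 1 && (gfp & CP_GFP_COMP) != 0;
}

/* Initialise 1 << order contiguous descriptors the way the allocator might. */
static void cp_prep_pages(struct cp_page *pages, unsigned int order,
                          unsigned int gfp)
{
    unsigned int nr = 1U << order;
    unsigned int i;

    assert(pages);
    for (i = 0; i < nr; i++) {
        pages[i].flags = 0;
        pages[i].compound_head = 0;
        pages[i].compound_order = 0;
        pages[i].compound_nr = 0;
    }
    if (!cp_wants_compound(order, gfp))
        return;                       /* normal high-order page: no metadata */

    pages[0].flags |= CP_PG_HEAD;     /* head page */
    pages[1].compound_order = (unsigned char)order;
    pages[1].compound_nr = nr;
    for (i = 1; i < nr; i++)          /* every tail page points at the head */
        pages[i].compound_head = (uintptr_t)&pages[0] | 1;
}
```

Calling `cp_prep_pages(pages, 2, 0)` leaves all four descriptors untouched, while `cp_prep_pages(pages, 2, CP_GFP_COMP)` marks the head and links the three tails to it.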
  • 8. Compound page: Problem Description #1
       [Diagram: compound page layout, as on slide 3]
       [Problem Description] No unified interface: ambiguity
       • Some functions deal with PAGE_SIZE units (4KB): they are unaware of compound pages and huge pages
       • Some functions accept the head page *only*
       • Some functions accept either the head page or a tail page
         ✓ They call compound_head() to get the head page: the extra instructions cost performance
         ✓ compound_head() users:
           ➢ get_page(): called quite frequently
           ➢ put_page(): called quite frequently
           ➢ …
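To make the cost concrete, here is a minimal userspace sketch of the head-page lookup. The names (`struct ch_page`, `ch_compound_head()`, `ch_get_page()`, `struct ch_folio`, `ch_folio_get()`) are hypothetical; the model only assumes that a tail page stores its head's address with bit 0 set, so every page-based helper must test that bit before touching the reference count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced page descriptor. */
struct ch_page {
    uintptr_t compound_head;  /* tail page: head address | 1, otherwise 0 */
    int refcount;
};

/* Page-based API: may be handed a head or a tail page, so it must check. */
static struct ch_page *ch_compound_head(struct ch_page *page)
{
    uintptr_t head = page->compound_head;

    assert(page);
    if (head & 1)                              /* tail page: follow the link */
        return (struct ch_page *)(head - 1);
    return page;                               /* head or singleton page */
}

static void ch_get_page(struct ch_page *page)
{
    ch_compound_head(page)->refcount++;        /* lookup on every call */
}

/* Folio-style API: the type guarantees "never a tail page", so no lookup. */
struct ch_folio {
    struct ch_page page;
};

static void ch_folio_get(struct ch_folio *folio)
{
    folio->page.refcount++;
}
```

In this model `ch_get_page()` on any tail page bumps the head's count after one extra load and branch, whereas `ch_folio_get()` goes straight to the counter.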
  • 9. Legacy page cache
       [Diagram: a file (file->f_pos, continuous file position) is cached in 4KB page cache pages; the buffer cache (buffer heads) maps them to 512B sectors on disk; user space reaches the page cache in kernel space via read()/write()/sendfile() and mmap()]
  • 10. Legacy page cache: Problem Description #2
       1. The page cache occupies most memory pages.
       2. Each page cache page (there is no “compound page” concept) is added to the active/inactive LRU list.
          • Long LRU list: lock contention & cache misses
  • 11. Agenda • Problem description ✓Normal high-order page & compound page ✓Legacy page cache • Memory folio’s goal • Solution ✓Page cache with memory folio ✓page struct vs folio struct ✓[Example] total_mapcount() implementation difference between legacy approach and folio
  • 12. Memory folio’s goal
       • Unified interface
         ✓ All accesses via the folio struct (head page/tail pages in a compound page)
       • [Page Cache] Shorter LRU list
         ✓ Original: one struct page per 4KB is added to the LRU list
         ✓ Folio: one struct page (the head page) per 8KB, 16KB, 32KB, 64KB, 128KB and so on (including THP) is added to the LRU list
       • [Anonymous page] THP support
         ✓ create_huge_pmd -> do_huge_pmd_anonymous_page -> __do_huge_pmd_anonymous_page
       * THP: Transparent Huge Page
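The "shorter LRU list" goal is easy to quantify with a back-of-the-envelope sketch. The helper below is hypothetical (`lru_entries()` and `LRU_MODEL_PAGE_SIZE` are not kernel names) and assumes, unrealistically, that the whole cache consists of folios of a single order.

```c
#include <assert.h>

#define LRU_MODEL_PAGE_SIZE 4096UL  /* 4KB base page, as on the slides */

/*
 * Number of LRU entries needed to track cache_bytes of page cache when
 * every folio has the given order (order 0 = one entry per 4KB page).
 */
static unsigned long lru_entries(unsigned long cache_bytes,
                                 unsigned int folio_order)
{
    unsigned long folio_bytes = LRU_MODEL_PAGE_SIZE << folio_order;

    assert(folio_bytes != 0);
    return (cache_bytes + folio_bytes - 1) / folio_bytes;  /* round up */
}
```

For 1GB of page cache this gives 262144 entries with 4KB pages, 65536 with 16KB (order 2) folios, and 512 with 2MB (order 9) folios.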
  • 13. Agenda • Problem description ✓Normal high-order page & compound page ✓Legacy page cache • Memory folio’s goal • Solution ✓Page cache with memory folio ✓page struct vs folio struct ✓[Example] total_mapcount() implementation difference between legacy approach and folio
  • 14. Page cache with folio
       [Diagram: a file (file->f_pos, continuous file position) is cached as folios; each folio contains several 4KB pages backed by 512B sectors on disk; user space reaches the page cache in kernel space via read()/write()/sendfile() and mmap()]
       • Folio is the container of struct page(s)
         ✓ All accesses via the folio struct
         ✓ No tail page → fewer run-time checks
  • 15. Page cache with folio
       [Diagram: page cache with folio, as on slide 14]
       Folio’s page order: readahead mechanism
       • CONFIG_TRANSPARENT_HUGEPAGE is enabled
         ✓ Minimum: order 2 (4 pages)
         ✓ Maximum: order 9 (512 pages)
       • CONFIG_TRANSPARENT_HUGEPAGE is disabled
         ✓ Minimum: order 2 (4 pages)
         ✓ Maximum: order 8 (256 pages)
       • Commit 793917d997df (“mm/readahead: Add large folio readahead”): merged in the 5.18 kernel
       • Default readahead size: 128KB (32 pages)
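The order, page-count and size figures above are related by simple powers of two. The sketch below only reproduces that arithmetic; `ra_order_to_pages()`, `ra_order_to_bytes()` and `ra_fit_order()` are hypothetical helpers and do not reproduce the kernel's actual readahead order selection.

```c
#include <assert.h>

#define RA_MODEL_PAGE_SHIFT 12  /* 4KB base pages */

static unsigned long ra_order_to_pages(unsigned int order)
{
    return 1UL << order;
}

static unsigned long ra_order_to_bytes(unsigned int order)
{
    return 1UL << (order + RA_MODEL_PAGE_SHIFT);
}

/*
 * Largest order in [min_order, max_order] whose folio still fits in
 * 'bytes'. Returns min_order when even the smallest folio is too large.
 */
static unsigned int ra_fit_order(unsigned long bytes, unsigned int min_order,
                                 unsigned int max_order)
{
    unsigned int order = min_order;

    assert(min_order <= max_order);
    while (order < max_order && ra_order_to_bytes(order + 1) <= bytes)
        order++;
    return order;
}
```

With these helpers, order 2 is 4 pages (16KB), order 9 is 512 pages (2MB), and the default 128KB readahead window corresponds to a single order-5 folio of 32 pages.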
  • 16. Page cache with folio
       [Diagram: page cache with folio, as on slide 14]
       1. Short LRU list: only the head page of a folio is added to the LRU list → performance improvement
       2. 45% improvement for lru-file-mmap-read (vm-scalability), according to Matthew Wilcox’s PDF file
       Folio’s page order: readahead mechanism
       • CONFIG_TRANSPARENT_HUGEPAGE is enabled
         ✓ Minimum: order 2 (4 pages)
         ✓ Maximum: order 9 (512 pages)
       • CONFIG_TRANSPARENT_HUGEPAGE is disabled
         ✓ Minimum: order 2 (4 pages)
         ✓ Maximum: order 8 (256 pages)
       • Commit 793917d997df (“mm/readahead: Add large folio readahead”): merged in the 5.18 kernel
       • Default readahead size: 128KB (32 pages)
  • 17. Page cache with folio: backtrace * kernel: 6.3
  • 18. Agenda • Problem description ✓Normal high-order page & compound page ✓Legacy page cache • Memory folio’s goal • Solution ✓Page cache with memory folio ✓page struct vs folio struct ✓[Example] total_mapcount() implementation difference between legacy approach and folio
  • 19. page struct vs folio struct
       folio struct (three unions, each the size of one struct page):
       ✓ Union 1: struct { flags; union { struct list_head lru; struct { void *__filler; mlock_count; }; }; struct address_space *mapping; pgoff_t index; void *private; atomic_t _mapcount; atomic_t _refcount; unsigned long memcg_data; } overlaid with struct page page
       ✓ Union 2: struct { _flags_1; _head_1; unsigned char _folio_dtor; unsigned char _folio_order; atomic_t _entire_mapcount; atomic_t _nr_pages_mapped; atomic_t _pincount; unsigned int _folio_nr_pages; } overlaid with struct page __page_1
       ✓ Union 3: struct { _flags_2; _head_2; void *_hugetlb_subpool; void *_hugetlb_cgroup; void *_hugetlb_cgroup_rsvd; void *_hugetlb_hwpoison; } and struct { _flags_2a; _head_2a; struct list_head _deferred_list; } overlaid with struct page __page_2
       Compound pages (legacy page structs):
       ✓ Page #0 (head): flags, …
       ✓ Page #1 (tail): flags, compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr, …
       ✓ Page #N (tail): flags, _compound_pad_1, hpage_pinned_refcount, deferred_list
  • 20. page struct vs folio struct
       [Diagram: folio struct layout and compound pages, as on slide 19]
       folio’s benefit
       • [Example] 512KB compound page
         ✓ page struct: need to maintain 128 page structs (1 head page and 127 tail pages)
         ✓ folio struct: maintain 3 page structs regardless of the size of the compound page
  • 21. page struct vs folio struct
       [Diagram: folio struct layout, as on slide 19; struct page __page_1 and struct page __page_2 won’t be used]
       folio struct’s members
       • _entire_mapcount
         ✓ The compound page is mapped via a single PMD (huge page).
       • _nr_pages_mapped
         ✓ Number of individual subpages (4KB pages mapped via PTEs) that are mapped.
         ✓ Scenario: two processes map the same memory range
           ➢ One process maps the entire 2MB compound page (Transparent Huge Page, THP): mapped via a single PMD
           ➢ The other process maps some 4KB pages within this 2MB memory area: mapped via PTEs
         ✓ Benefit for THP: no need to split the huge page if other processes map 4KB pages within the same memory area.
       • _folio_nr_pages
         ✓ Number of pages in this folio.
         ✓ _folio_nr_pages = 1 << order, where order > 0.
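The two-process scenario above can be played through with a simplified userspace model. All names here (`struct fm_folio`, `fm_init()`, `fm_nr_pages()`, `fm_map_pmd()`, `fm_map_pte()`) are hypothetical, and the counters follow the slide's description (plain counts) rather than the kernel's exact encoding of these fields.

```c
#include <assert.h>
#include <string.h>

#define FM_MAX_ORDER 9  /* 2MB THP with 4KB base pages */

/* Hypothetical model of the folio counters described on the slide. */
struct fm_folio {
    unsigned char order;
    int entire_mapcount;                  /* mappings of the whole folio via a PMD */
    int nr_pages_mapped;                  /* subpages with at least one PTE mapping */
    int sub_mapcount[1 << FM_MAX_ORDER];  /* per-subpage PTE mappings */
};

static void fm_init(struct fm_folio *f, unsigned char order)
{
    assert(order > 0 && order <= FM_MAX_ORDER);
    memset(f, 0, sizeof(*f));
    f->order = order;
}

/* _folio_nr_pages = 1 << order */
static unsigned int fm_nr_pages(const struct fm_folio *f)
{
    return 1U << f->order;
}

/* A process maps the whole folio with a single PMD. */
static void fm_map_pmd(struct fm_folio *f)
{
    f->entire_mapcount++;
}

/* A process maps one 4KB subpage with a PTE. */
static void fm_map_pte(struct fm_folio *f, unsigned int idx)
{
    assert(idx < fm_nr_pages(f));
    if (f->sub_mapcount[idx]++ == 0)      /* first PTE mapping of this subpage */
        f->nr_pages_mapped++;
}
```

After `fm_init(&f, 9)`, one `fm_map_pmd(&f)` and PTE mappings of subpages 0, 1 and 1 again, the model holds `entire_mapcount == 1` and `nr_pages_mapped == 2`, and the folio was never split.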
  • 22. folio struct vs legacy compound page
       [Diagram: folio struct layout and compound pages, as on slide 19]
       ✓ Page #1 of the folio and Page #1 (first tail page) of the legacy compound page share the same field layout
  • 23. Agenda • Problem description ✓Normal high-order page & compound page ✓Legacy page cache • Memory folio’s goal • Solution ✓Page cache with memory folio ✓page struct vs folio struct ✓[Example] total_mapcount() implementation difference between legacy approach and folio
  • 24. [kernel v5.11] total_mapcount()
       [Diagram: struct page. First union: page cache and anonymous pages; page_pool used by netstack; slab, slob and slub; tail pages of compound page; second tail page of compound page; page table pages (1. PMD huge PTE, 2. x86 pgd page <-> mm_struct); ZONE_DEVICE pages; rcu_head (free a page by RCU). Second union: atomic_t _mapcount (number of times this page is referenced by page tables); unsigned int page_type; unsigned int active (used by slab); int units (used by slob). Followed by atomic_t _refcount, …]
       Case 1: Singleton page(s)
       ✓ Get _mapcount directly
       total_mapcount() users:
       • Huge page
       • rmap (reverse mapping)
  • 25. [kernel v5.11] total_mapcount()
       [Diagram: head page struct as on slide 24. First tail page: compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr = 1 << compound_order. Second tail page: _compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list. Each page also carries _mapcount, page_type, active, units and _refcount.]
       Case 2: Compound page && hugetlb (hugetlbfs) page
       ✓ Get compound_mapcount directly
       compound_mapcount:
       • Map count of the whole compound page (does not include mapped sub-pages)
       Steps:
       1. Get the head page based on any page (head page or tail page)
       2. Read ‘compound_mapcount’ of the first tail page
          A. page[1].compound_mapcount
  • 26. [kernel v5.11] total_mapcount()
       [Diagram: head and tail page structs, as on slide 25]
       Case 3: [Anonymous page] Compound page && transparent huge page
       Steps:
       1. Get the head page based on any page (head page or tail page)
       2. Read ‘compound_mapcount’ of the first tail page
          A. page[1].compound_mapcount: map count of the whole compound page
       3. `Accumulate each subpage._mapcount`:
          A. One process maps the 2MB range as a single huge page (a single PMD)
          B. Another process maps 512 individual PTEs
       4. `Accumulate each subpage._mapcount` + page[1].compound_mapcount
  • 27. [kernel v5.11] total_mapcount()
       [Diagram: head and tail page structs, as on slide 25]
       Case 3: [Page cache] Compound page && transparent huge page
       Steps:
       1. Get the head page based on any page (head page or tail page)
       2. Read ‘compound_mapcount’ of the first tail page
          A. page[1].compound_mapcount: map count of the whole compound page
       3. `Accumulate each subpage._mapcount`:
          A. One process maps the 2MB range as a single huge page (a single PMD)
          B. Another process maps 512 individual PTEs
       4. `Accumulate each subpage._mapcount` + page[1].compound_mapcount - page[1].compound_mapcount * page[1].compound_nr
          A. File pages have compound_mapcount included in _mapcount
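The three legacy cases on slides 24 to 27 can be condensed into one userspace sketch. It is a model of the steps listed on the slides, not the kernel function: `enum legacy_kind` and `legacy_total_mapcount()` are hypothetical, all counters are passed as logical values (the kernel stores `_mapcount` offset by -1), and the kernel's extra adjustment for double-mapped anonymous pages is deliberately left out.

```c
#include <assert.h>

/* Hypothetical classification of the page being queried. */
enum legacy_kind {
    LEGACY_SINGLE,    /* Case 1: not a compound page */
    LEGACY_HUGETLB,   /* Case 2: compound page && hugetlb page */
    LEGACY_THP_ANON,  /* Case 3: anonymous transparent huge page */
    LEGACY_THP_FILE   /* Case 3: page cache transparent huge page */
};

/*
 * compound_mapcount: logical value of page[1].compound_mapcount.
 * sub_mapcount[i]:   logical value of subpage i's _mapcount.
 * nr:                page[1].compound_nr (1 for a singleton page).
 */
static int legacy_total_mapcount(enum legacy_kind kind, int compound_mapcount,
                                 const int *sub_mapcount, int nr)
{
    int i, ret;

    assert(sub_mapcount && nr >= 1);
    if (kind == LEGACY_SINGLE)
        return sub_mapcount[0];          /* read _mapcount directly */
    if (kind == LEGACY_HUGETLB)
        return compound_mapcount;        /* read compound_mapcount directly */

    ret = compound_mapcount;
    for (i = 0; i < nr; i++)             /* walk every subpage */
        ret += sub_mapcount[i];
    if (kind == LEGACY_THP_FILE)         /* file pages already counted the */
        ret -= compound_mapcount * nr;   /* PMD mapping in each _mapcount  */
    return ret;
}
```

The point of the sketch is the loop: for a 2MB THP the legacy path has to touch all 512 page structs to produce one number.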
  • 28. [kernel v6.3] total_mapcount() and folio_mapcount()
       [Diagram: folio struct layout, as on slide 19]
       Case 1: Singleton page (not a compound page)
  • 29. [kernel v6.3] total_mapcount() and folio_mapcount()
       [Diagram: folio struct layout, as on slide 19]
       Case 2: Compound page is mapped via PMD (huge page)
       ✓ Get _entire_mapcount
       ✓ Get _nr_pages_mapped
  • 30. [kernel v6.3] total_mapcount() and folio_mapcount()
       [Diagram: folio struct layout, as on slide 19]
       Case 3: Compound page is mapped via PMD (huge page) and some subpages are mapped by PTE
       ✓ mapcount = folio’s _entire_mapcount + sum(each subpage’s _mapcount)
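For comparison with the legacy path, the three folio cases can be sketched the same way. Again this is a hedged userspace model with hypothetical names (`struct mc_folio`, `mc_folio_mapcount()`); it assumes per-page `_mapcount` values are stored offset by -1 and that `_entire_mapcount` is available as a plain count.

```c
#include <assert.h>

/* Hypothetical reduced folio for illustrating the three cases. */
struct mc_folio {
    int is_large;          /* 0: singleton page, 1: compound page */
    int nr_pages;          /* 1 << order for a large folio */
    int mapcount_raw;      /* singleton: _mapcount, stored as count - 1 */
    int entire_mapcount;   /* logical count of PMD mappings */
    int nr_pages_mapped;   /* subpages currently mapped by PTE */
    const int *sub_raw;    /* per-subpage _mapcount, stored as count - 1 */
};

static int mc_folio_mapcount(const struct mc_folio *f)
{
    int i, mapcount;

    assert(f);
    if (!f->is_large)                     /* Case 1: singleton page */
        return f->mapcount_raw + 1;

    mapcount = f->entire_mapcount;
    if (f->nr_pages_mapped == 0)          /* Case 2: PMD mappings only, */
        return mapcount;                  /* no per-subpage walk needed */

    for (i = 0; i < f->nr_pages; i++)     /* Case 3: add PTE mappings */
        mapcount += f->sub_raw[i];
    return mapcount + f->nr_pages;        /* undo the -1 bias of each subpage */
}
```

The design point the slides make shows up in Case 2: when no subpage is mapped by PTE, the answer comes from the folio's own counters and the per-page loop is skipped entirely.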
  • 31. Reference
       • Memory Folios
       • LWN - A memory-folio update
       • LWN - An introduction to compound pages
       • LWN - Huge pages part 1 (Introduction)
       • Documentation/mm/transhuge.rst
  • 33. Learn new C standard (C11) from folio: Generic Selection
       * Reference from: ISO/IEC 9899:201x
       • C99 defines type-generic macros in the standard library: the type of the argument is detected automatically, and the corresponding function is invoked based on that type.
         ✓ Example: sqrt(X)
           ➢ X is double → invoke sqrt()
           ➢ X is float → invoke sqrtf()
           ➢ X is long double → invoke sqrtl()
       • However, programmers cannot define their own type-generic macros in C99.
       • In C11, programmers can define their own type-generic macros:
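A minimal sketch of a user-defined type-generic macro with C11 `_Generic` follows. It is modelled loosely on the way the kernel's page_folio() preserves const-ness, but `struct gs_page`, `struct gs_folio` and `gs_page_folio()` are hypothetical names invented for this illustration, and the cast is only valid because the page is assumed to be the first member of an enclosing folio.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct page and struct folio. */
struct gs_page {
    unsigned long flags;
};

struct gs_folio {
    struct gs_page page;  /* first member, so both share one address */
};

/*
 * Type-generic conversion: the result type follows the argument type.
 * A const page pointer yields a const folio pointer, a plain page
 * pointer yields a plain folio pointer. Any other argument type fails
 * to compile because no association matches.
 */
#define gs_page_folio(p) _Generic((p),                        \
    const struct gs_page *: (const struct gs_folio *)(p),     \
    struct gs_page *:       (struct gs_folio *)(p))
```

With `struct gs_folio f;`, both `gs_page_folio(&f.page)` and the same call through a `const struct gs_page *` return the address of `f`, but only the non-const variant may be used to modify the folio.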