A Robust and Flexible Operating System
Compatibility Architecture
Takahiro Shinagawa, Shinichi Honiden, Yuichi Nishiwaki, Takaya Saeki*
* Mr. Saeki is currently at Microsoft Development Co., Ltd.
The 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2020, March 17, 2020)
Running Applications on Another OS
• Useful in various cases
(a) Running commercial binary applications
• Only supplied in the binary format of a specific OS
(b) Using a build environment with several toolchains and libraries
• E.g., building Linux applications on macOS
A Robust and Flexible Operating System Compatibility Architecture (VEE 2020, March 17, 2020) 2
[Figure: (a) Running commercial binary applications: an application binary for OS1 running on OS2. (b) Building Linux applications on macOS: a build environment with toolchains and libraries on macOS.]
Existing Approaches
A) Porting applications
👍 No runtime overhead
👎 Much porting effort (by developers and users)
B) Running applications with the OS in a virtual machine (VM)
👍 No porting effort, strong isolation, …
👎 Challenges in resource sharing due to the existence of two OS kernels
[Figure: A) Porting applications: the application binary for OS1 is ported to OS2 with much effort. B) Running apps and OS in a VM: the guest application runs on a guest OS above the host OS, with challenges in seamless resource sharing.]
Using OS Compatibility Layers
• No porting effort
• Absorb the differences between the guest and host interfaces
• Seamless resource management
• The host OS manages both the guest and host resources in the same way
• Guest and host applications can communicate seamlessly via host system calls
[Figure: the OS compatibility layer sits between the guest application binary and the host OS and converts system calls from guest to host; the host OS manages the resources of both guest and host applications.]
Kernel-space vs. User-space Implementations
• Kernel space
👍 Flexibility to achieve binary compatibility
• System calls and memory management can be easily handled
👎 Vulnerability to bugs in OS compatibility layers
• A bug could lead to system crashes
• User space
👍 Robustness against bugs
• Bugs do not affect the OS stability
👎 Inflexibility to achieve full compatibility
• E.g., copy-on-write not implemented in Cygwin
[Figure: placement of the OS compatibility layer relative to the guest application binary and the host OS kernel in the kernel-space and user-space implementations.]
Proposed OS Compatibility Architecture
• Running each guest process in a VM (without its OS kernel)
👍 Robustness
• Most of the OS compatibility layer can be implemented in user space
• Bugs do not cause kernel crashes
👍 Flexibility
• Hardware virtualization technology provides low-level event handling functions
• E.g., trapping system calls and page faults, manipulating page tables, …
[Figure: the guest application process runs in a VM; the OS compatibility layer runs in a host process on the host OS kernel and controls the VM through a standardized virtualization interface backed by the CPU's hardware virtualization function.]
Overall Design
• Three main components
• Guest VM
• VMM module
• Monitor process
[Figure: overall design. The guest process runs in a guest VM with no kernel. The VMM module in the host OS kernel manages the VMs and traps system calls and exceptions, making upcalls to the monitor process in user space, which emulates the system calls.]
Guest VM
• An instance of the guest execution environment
• Run directly on the CPU
• Without overhead
• Provide a virtual address space
• With a host-managed page table
• Trap events
• E.g., system calls, page faults, …
Assume hardware-assisted virtualization
• E.g., Intel VT and AMD SVM
[Figure: overall design diagram (monitor process, VMM module, guest VMs); the VMM module traps system calls and page faults.]
VMM module
• A small driver for hardware-assisted virtualization
• Provide an API to manage VMs
• Create and destroy VMs and vCPUs
• Read and write VM states
• Manipulate page tables
• Manage control transfer
• Assumed to be robust in itself
• Small size
• No kernel dependency
• Widely available
• E.g., Apple Hypervisor.framework, Windows Hypervisor Platform, KVM, …
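As a rough illustration of the API surface listed above, the following Python sketch models a VMM module as an in-memory object. Every name in it (FakeVmm, create_vm, write_reg, map_page, run) is hypothetical and does not correspond to actual Hypervisor.framework, Windows Hypervisor Platform, or KVM calls; it only mirrors the four bullet points.

```python
class FakeVmm:
    """In-memory stand-in for a VMM module. All names here are hypothetical."""

    def __init__(self):
        self._vms = {}
        self._next_id = 1

    # Create and destroy VMs
    def create_vm(self):
        vm_id = self._next_id
        self._next_id += 1
        self._vms[vm_id] = {"regs": {}, "npt": {}}
        return vm_id

    def destroy_vm(self, vm_id):
        del self._vms[vm_id]

    # Read and write VM states
    def write_reg(self, vm_id, name, value):
        self._vms[vm_id]["regs"][name] = value

    def read_reg(self, vm_id, name):
        return self._vms[vm_id]["regs"].get(name, 0)

    # Manipulate page tables (guest physical address -> host buffer)
    def map_page(self, vm_id, gpa, host_buf):
        self._vms[vm_id]["npt"][gpa] = host_buf

    def unmap_page(self, vm_id, gpa):
        self._vms[vm_id]["npt"].pop(gpa, None)

    # Manage control transfer
    def run(self, vm_id):
        # A real module would enter the guest and return on the next trap.
        # This stand-in always reports a trapped system call.
        return "exit:syscall"


vmm = FakeVmm()
vm = vmm.create_vm()
vmm.write_reg(vm, "rip", 0x400000)
vmm.map_page(vm, 0x400000, bytearray(4096))
reason = vmm.run(vm)
```

The point of keeping the surface this small is that the monitor process can be written against it once and ported by swapping the backend.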
Monitor Process
• The main component to implement OS compatibility functions
• Prepare execution environments
• Create a VM and set up the page table
• Load the program
• Emulate guest system calls
• With host system calls
• Clean up on exit
• Free allocated resources
• Destroy the VM
• Terminate itself
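The "emulate guest system calls with host system calls" step can be sketched as a small dispatch function. This is a minimal illustration, not Noah's code: the function name emulate_syscall and the argument passing are assumptions, and only four calls are handled. The Linux x86-64 system call numbers and errno values are written out as constants because they belong to the guest ABI and may differ from the host's.

```python
import os

# Linux x86-64 system call numbers and errno values (guest ABI).
LINUX_SYS_WRITE = 1
LINUX_SYS_CLOSE = 3
LINUX_SYS_DUP = 32
LINUX_SYS_GETPID = 39
LINUX_EBADF = 9
LINUX_ENOSYS = 38


def emulate_syscall(nr, args):
    """Emulate one trapped guest system call using host calls.

    Returns the value for the guest's RAX: the result on success, or a
    negated Linux errno on failure, following the Linux kernel convention.
    """
    try:
        if nr == LINUX_SYS_DUP:
            return os.dup(args[0])
        if nr == LINUX_SYS_CLOSE:
            os.close(args[0])
            return 0
        if nr == LINUX_SYS_GETPID:
            return os.getpid()
        if nr == LINUX_SYS_WRITE:
            # A real monitor would first copy the buffer out of guest memory.
            fd, data = args
            return os.write(fd, data)
    except OSError:
        # Simplification: every host failure is reported as EBADF. A real
        # layer must translate each host errno to its Linux value.
        return -LINUX_EBADF
    # Unimplemented system call.
    return -LINUX_ENOSYS


# Usage: write through a pipe, then read back via a duplicated descriptor.
r, w = os.pipe()
written = emulate_syscall(LINUX_SYS_WRITE, (w, b"hi"))
dup_fd = emulate_syscall(LINUX_SYS_DUP, (r,))
```

Because the monitor is an ordinary host process, each case is a plain host call; unsupported numbers fall through to -ENOSYS.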
Advantages of Our Architecture
• Robustness
• User-space implementation: bugs in the monitor process will not crash the host kernel
• Flexibility
• Compatibility: achieve full binary compatibility
• Performance: copy-on-write can be supported
• Other advantages (common to OS compatibility layers)
• Development cost
• Rich host OS functionalities: system calls, libraries, high-level languages, …
• Seamlessness
• Single kernel: share the same file system, IPCs between guest and host, process scheduling, resource management, ...
Implementation
• Target: Linux 4.6 on x86-64 (Intel VT-x)
• Noah: Linux compatibility layer for macOS
• Run on macOS 10.12 Sierra or higher
• Use Apple Hypervisor.framework as the VMM module
• NoahW: Linux compatibility layer for Windows (preliminary)
• Run on Windows 7 or higher
• Use Intel Hardware Accelerated Execution Manager as the VMM module
Boot and Load
(1) Create a VM
• Through the VMM module
(2) Set up the VM state
• Start in x86-64 long mode directly
• Trap SYSCALL instructions
• Use trampoline code on Windows
(3) Load the Linux ELF loader (ld.so)
• Using our original ELF loader
• Using the internal mmap() to map the ELF file
(4) Pass control to the Linux ELF loader
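Step (3) starts with reading the ELF header of the file to be loaded. The sketch below shows only that first part: validating an ELF64 x86-64 header and extracting the entry point and program-header location. It is an illustration based on the standard ELF64 layout, not Noah's loader; the function name parse_elf64_header and the synthetic demo header are assumptions, and mapping the segments is left out.

```python
import struct

# ELF64 file header: 64 bytes, little-endian, no padding.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
EM_X86_64 = 62


def parse_elf64_header(data):
    """Validate an ELF64 x86-64 header and return the fields a loader needs."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("file too short for an ELF64 header")
    (ident, e_type, machine, _version, entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = \
        ELF64_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad ELF magic")
    # ident[4] is EI_CLASS (2 = 64-bit), ident[5] is EI_DATA (1 = little-endian).
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF")
    if machine != EM_X86_64:
        raise ValueError("not an x86-64 binary")
    return {"type": e_type, "entry": entry, "phoff": phoff,
            "phentsize": phentsize, "phnum": phnum}


# Synthetic header for demonstration: a shared object (ET_DYN = 3) with
# made-up entry point and program-header values.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
demo = ELF64_HEADER.pack(ident, 3, EM_X86_64, 1, 0x1090,
                         64, 0, 0, 64, 56, 7, 64, 0, 0)
info = parse_elf64_header(demo)
```

A loader would then walk the phnum program headers starting at phoff, map each loadable segment, and finally set the guest instruction pointer to the entry address.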
[Figure: boot and load sequence. (1) The monitor process creates a VM through the VMM module. (2) It sets up the VM state (Mode: long mode, IA32_EFER.SCE: 0). (3) Its ELF loader loads ld.so for the guest program.]
Memory Management
• Two page tables
(a) Guest page table in the VM
(b) Nested page table (EPT) in the VMM
• Fix (a) and modify (b)
• (a) is fixed to an identity mapping
• Virtual address = Physical address
• (b) can be manipulated with the API
• Provided by the VMM module
Limitation: the GVA space is limited to 512 GiB
• 39-bit physical addresses on the Intel CPU
• 48-bit virtual addresses
• The stack is moved to a lower address
• No kernel area
[Figure: address translation from GVA (Guest Virtual Address) to GPA (Guest Physical Address) to HPA (Host Physical Address), with axis marks at 0, 511 GiB, and 512 GiB. The guest page table (GVA to GPA) is fixed; the nested page table (GPA to HPA) is modified. A 1-GiB guest system data area holds page tables, segment descriptors, …]
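The two-level scheme and the 512 GiB limit can be checked with a small model. In this sketch the guest page table is the fixed identity mapping (GPA equals GVA), and a dictionary stands in for the nested page table that the monitor edits through the VMM module. The class and function names are hypothetical, and 4 KiB pages are assumed.

```python
PAGE_SIZE = 4096
GIB = 1 << 30

# 39-bit guest physical addresses bound the usable guest address space.
GPA_BITS = 39
GPA_LIMIT = 1 << GPA_BITS  # 2**39 bytes = 512 GiB


class NestedPageTable:
    """Toy GPA -> host-page map standing in for the EPT (hypothetical names)."""

    def __init__(self):
        self._pages = {}

    def map(self, gpa, host_page):
        if gpa % PAGE_SIZE or not 0 <= gpa < GPA_LIMIT:
            raise ValueError("GPA must be page-aligned and below 512 GiB")
        self._pages[gpa] = host_page

    def unmap(self, gpa):
        self._pages.pop(gpa, None)

    def lookup(self, gpa):
        page = self._pages.get(gpa - gpa % PAGE_SIZE)
        if page is None:
            raise LookupError("nested page fault at GPA %#x" % gpa)
        return page, gpa % PAGE_SIZE


def guest_translate(gva):
    """Fixed guest page table: identity mapping, so GPA == GVA."""
    return gva


npt = NestedPageTable()
npt.map(0x400000, bytearray(PAGE_SIZE))  # back one guest page with host memory
page, offset = npt.lookup(guest_translate(0x400123))
```

Since the guest table never changes, every guest mmap(), munmap(), or mprotect() reduces to edits of the nested table alone, at the cost of capping guest virtual addresses at the physical address width.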
Process Management (fork)
• Noah (on macOS)
• Implement a subset of clone()
• Apple Hypervisor.framework does not support fork() with a VM
• Save and destroy the VM before fork()
• Restore the VM after fork()
• NoahW (on Windows)
• Implement fork() with copy-on-write using shared memory and virtualization
• Create a memory region shared among monitor processes
• Save, restore, and modify the VM states on fork()
• Trap page faults in the VMs to implement copy-on-write
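The copy-on-write bookkeeping behind fork() can be illustrated with a toy model. This sketch is not NoahW's implementation: it replaces the shared memory region and the VM page-fault trap with plain Python objects, and all class and method names are hypothetical. It shows only the rule that pages are shared at fork time and copied on the first write.

```python
PAGE_SIZE = 4096


class Page:
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1  # number of address spaces sharing this page


class AddressSpace:
    """Toy per-process page map; a write to a shared page triggers a copy."""

    def __init__(self):
        self.pages = {}        # page number -> Page
        self.writable = set()  # page numbers currently mapped writable

    def map_new(self, vpn, data=b""):
        self.pages[vpn] = Page(data.ljust(PAGE_SIZE, b"\0"))
        self.writable.add(vpn)

    def fork(self):
        child = AddressSpace()
        for vpn, page in self.pages.items():
            page.refs += 1
            child.pages[vpn] = page  # share the page instead of copying it
        self.writable.clear()        # both sides now fault on their first write
        return child

    def read(self, vpn, off):
        return self.pages[vpn].data[off]

    def write(self, vpn, off, value):
        if vpn not in self.writable:
            self._handle_write_fault(vpn)
        self.pages[vpn].data[off] = value

    def _handle_write_fault(self, vpn):
        page = self.pages[vpn]
        if page.refs > 1:  # still shared: give this process a private copy
            page.refs -= 1
            self.pages[vpn] = Page(page.data)
        self.writable.add(vpn)


parent = AddressSpace()
parent.map_new(0, b"A")
child = parent.fork()
child.write(0, 0, ord("B"))  # faults, copies the page, then writes
```

After the child's write, the parent still sees its original byte, and no page was copied until that write happened.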
Evaluation
• Performance
• Primitive benchmark: CPU cycles of the dup() system call on macOS
• Micro benchmark: lmbench-3.0-a9 on macOS
• Macro benchmark: Phoronix Test Suite v7.6.0 on macOS
• Performance comparison of OS compatibility layers on Windows
• Setup
• MacBook Pro (13-inch, 2017) for Noah
• Intel Core i7-7567U, 16GB DDR3 memory, 500GB SSD
• Ubuntu 16.04 on macOS High Sierra 10.13.3
• Surface Pro 2017 for NoahW
• Intel Core i7-7660U, 16GB DDR3 memory, 512GB SSD
• Windows 10 Professional Version 1709 and HAXM 6.2.1
Primitive Benchmark: dup() system call
[Figure: stacked bar chart of CPU cycles (y-axis "Cycle Number", 0 to 40,000) for a dup() call on macOS and Windows, broken down into VM exit, upcall, pre-process, host syscall, post-process, downcall, and VM enter.]
Micro Benchmark: lmbench (processor, process)
[Figure: horizontal bar chart, x-axis from -100% to 400%. Benchmarks, in order: null call, null I/O, stat, open clos, slct TCP, sig inst, sig hndl, fork proc, exec proc, sh proc. Values, in the same order as extracted: 410%, 310%, 175%, 172%, 13%, 329%, 239%, 256%, -24%, 46%.]
Micro Benchmark: lmbench (File & VM latency)
[Figure: horizontal bar chart, x-axis from -100% to 50%. Benchmarks, in order: 0K Create, 0K Delete, 10K Create, 10K Delete, Mmap Latency, Prot Fault, Page Fault, 100fd selct. Values, in the same order as extracted: 42%, 5%, 17%, 4%, 28%, -45%, -92%, 8%.]
Macro Benchmark (Phoronix Test Suite + α)
[Figure: horizontal bar chart, x-axis from -200% to 200%. Benchmarks, in order: Linux kernel build, unpack-linux, postmark, sqlite, openssl, compress-7zip. Values, in the same order as extracted: 16%, -23%, -4%, 50%, -58%, 9%.]
Comparison of OS Compatibility Layers
Benchmark                      NoahW     Cygwin     WSL1
dup2() [calls per second]      36,723    556,453    693,309
write() [calls per second]     0.30      0.56       0.57
fork() (0 MiB array) [ms]      106.4     219.4      2.06
fork() (512 MiB array) [ms]    338.9     789.9      32.51
fork() (1 GiB array) [ms]      458.4     1531.8     62.66
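To make the comparison easier to read, the short script below turns the table's figures into ratios. It uses only the numbers from the table above; the variable and function names are illustrative.

```python
# Figures copied from the comparison table above.
dup2_calls_per_s = {"NoahW": 36723, "Cygwin": 556453, "WSL1": 693309}
fork_ms = {
    "0 MiB":   {"NoahW": 106.4, "Cygwin": 219.4, "WSL1": 2.06},
    "512 MiB": {"NoahW": 338.9, "Cygwin": 789.9, "WSL1": 32.51},
    "1 GiB":   {"NoahW": 458.4, "Cygwin": 1531.8, "WSL1": 62.66},
}


def slowdown(times, slower, faster):
    """How many times longer `slower` takes than `faster` for each array size."""
    return {size: row[slower] / row[faster] for size, row in times.items()}


# fork(): Cygwin relative to NoahW, and NoahW relative to WSL1.
cygwin_vs_noahw = slowdown(fork_ms, "Cygwin", "NoahW")
noahw_vs_wsl1 = slowdown(fork_ms, "NoahW", "WSL1")

# dup2(): throughput of Cygwin relative to NoahW (higher is better).
dup2_cygwin_vs_noahw = dup2_calls_per_s["Cygwin"] / dup2_calls_per_s["NoahW"]

for size in fork_ms:
    print(f"fork() {size}: Cygwin/NoahW = {cygwin_vs_noahw[size]:.2f}x, "
          f"NoahW/WSL1 = {noahw_vs_wsl1[size]:.2f}x")
print(f"dup2(): Cygwin/NoahW throughput = {dup2_cygwin_vs_noahw:.1f}x")
```

The ratios show the trade-off in the table: NoahW's fork() is roughly two to three times faster than Cygwin's and the gap widens with array size, while its simple system calls such as dup2() are about fifteen times slower.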
Conclusion
• Proposed a novel OS compatibility architecture
• Exploited the virtualization support that host OSes provide as standard
• Achieved both robustness and flexibility
• Consists of three components
• VMs to run guest processes
• The VMM module to provide API for hardware virtualization technology
• Monitor processes to implement OS compatibility functions
• Runs Linux binaries on macOS and Windows (preliminary)
• Noah implemented 172 out of 329 Linux system calls
• The overhead of Linux kernel build time on Noah was 16%
Acknowledgement and Availability
• This work was partially supported by the Exploratory IT Human Resources Project (MITOU Program) of the Information-technology Promotion Agency, Japan (IPA) in fiscal year 2016, and by JSPS KAKENHI
• Noah is publicly available from https://github.com/linux-noah/noah under the MIT / GPL dual licenses
Implementing Zero Trust strategy with Azure
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptx
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 

A Robust and Flexible Operating System Compatibility Architecture

  • 1. A Robust and Flexible Operating System Compatibility Architecture • Takahiro Shinagawa, Shinichi Honiden, Yuichi Nishiwaki, Takaya Saeki* • * Mr. Saeki is currently at Microsoft Development Co., Ltd. • The 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2020, March 17, 2020)
  • 2. Running Applications on Another OS • Useful in various cases (a) Running commercial binary applications • Only supplied in the binary format of a specific OS (b) Using a build environment with several toolchains and libraries • E.g., building Linux applications on macOS
  • 3. Existing Approaches A) Porting applications 👍 No runtime overhead 👎 Much porting effort (by developers and users) B) Running applications with the OS in a virtual machine (VM) 👍 No porting effort, strong isolation, … 👎 Challenges in resource sharing due to the existence of two OS kernels
  • 4. Using OS Compatibility Layers • No porting effort • Absorb the differences between the guest and host interfaces • Seamless resource management • The host OS manages both the guest and host resources in the same way • Guest and host applications can communicate seamlessly via host system calls • [Figure: the OS compatibility layer converts system calls from guest to host; the host OS manages the resources of guest and host applications]
  • 5. Kernel-space vs. User-space Implementations • Kernel space 👍 Flexibility to achieve binary compatibility • System calls and memory management can be easily handled 👎 Vulnerability to bugs in OS compatibility layers • A bug could lead to system crashes • User space 👍 Robustness against bugs • Bugs do not affect the OS stability 👎 Inflexibility to achieve full compatibility • E.g., copy-on-write not implemented in Cygwin
  • 6. Proposed OS Compatibility Architecture • Running each guest process in a VM (without its OS kernel) 👍 Robustness • Most parts of OS compatibility layers can be implemented in user space • Bugs do not cause kernel crashes 👍 Flexibility • Hardware virtualization technology provides low-layer event handling functionalities • E.g., trapping system calls and page faults, manipulating page tables, … • [Figure: the guest application process runs in a VM; the OS compatibility layer is a host process that controls the VM through the standardized virtualization interface of the host OS kernel, backed by the CPU's hardware virtualization function]
  • 7. Overall Design • Three main components • Guest VM • VMM module • Monitor process • [Figure: guest processes run in guest VMs with no kernel; the VMM module in host kernel space manages the VMs, traps system calls and exceptions, and makes upcalls to the monitor process in user space, which emulates the system calls]
  • 8. Guest VM • An instance of the guest execution environment • Run on CPU directly • Without overhead • Provide a virtual address space • With a host-managed page table • Trap events • E.g., system calls, page faults, … • Assume hardware-assisted virtualization • E.g., Intel VT and AMD SVM
  • 9. VMM module • A small driver of hardware-assisted virtualization • Provide API to manage VMs • Create and destroy VMs and vCPUs • Read and write VM states • Manipulate page tables • Manage control transfer • Assume its own robustness • Small size • No kernel dependency • Widely available • E.g., Apple Hypervisor.framework, Windows Hypervisor Platform, KVM, …
  • 10. Monitor Process • The main component to implement OS compatibility functions • Prepare execution environments • Create a VM and set up the page table • Load the program • Emulate guest system calls • With host system calls • Clean up on exit • Free allocated resources • Destroy the VM • Terminate itself
  • 11. Advantages of Our Architecture • Robustness • User-space implementation: bugs of the monitor process will not cause crashes • Flexibility • Compatibility: achieve full binary compatibility • Performance: copy-on-write can be supported • Others (of OS Compatibility layers) • Development cost • Rich host OS functionalities: system calls, libraries, high-level languages, … • Seamlessness • Single kernel: share the same file system, IPCs between guest and host, process scheduling, resource management, ...
  • 12. Implementation • Target Linux 4.6 of x86-64 (Intel VT-x) • Noah: Linux compatibility layer for macOS • Run on macOS 10.12 Sierra or higher • Use Apple Hypervisor.framework as the VMM module • NoahW: Linux compatibility layer for Windows (preliminary) • Run on Windows 7 or higher • Use Intel Hardware Accelerated Execution Manager as the VMM module
  • 13. Boot and Load (1) Create a VM • Through the VMM module (2) Set up the VM state • Start in x86-64 long mode directly • Trap SYSCALL instructions • Use trampoline code in Windows (3) Load the Linux ELF loader (ld.so) • Using our original ELF loader • Using the internal mmap() to map the ELF file (4) Pass the control to the Linux ELF loader • [Figure: the monitor process creates the VM through the VMM module, sets up the VM state (mode: long mode, IA32_EFER.SCE: 0), and uses its ELF loader to load ld.so for the guest program]
  • 14. Memory Management • Two page tables (a) Guest page table in the VM (b) Nested page table (EPT) in the VMM • Fix (a) and modify (b) • (a) is fixed to the straight mapping • Virtual address = Physical address • (b) can be manipulated with the API • Provided by the VMM module • Limitation: GVA is up to 512 GiB • 39-bit physical address in Intel CPU • 48-bit virtual address • Stack is moved to the lower address • No kernel area • [Figure: GVA maps to GPA through the fixed guest page table and GPA maps to HPA through the modified nested page table, over an address range of 0 to 512 GiB that includes a 1-GiB guest system data area (page tables, segment descriptors, …)] • GVA: Guest Virtual Address, GPA: Guest Physical Address, HPA: Host Physical Address
  • 15. Process Management (fork) • Noah (on macOS) • Implement a subset of clone() • Apple Hypervisor.framework does not support fork() with a VM • Save and destroy the VM before fork() • Restore the VM after fork() • NoahW (on Windows) • Implement fork() with copy-on-write using shared memory and virtualization • Create a memory region shared among monitor processes • Save, restore, and modify the VM states on fork() • Trap page faults in the VMs to implement copy-on-write
  • 16. Evaluation • Performance • Primitive benchmark: CPU cycles of the dup() system call on macOS • Micro benchmark: lmbench-3.0-a9 on macOS • Macro benchmark: Phoronix Test Suite v7.6.0 on macOS • Performance comparison of OS compatibility layers on Windows • Setup • MacBook Pro (13-inch, 2017) for Noah • Intel Core i7-7567U, 16GB DDR3 memory, 500GB SSD • Ubuntu 16.04 on macOS High Sierra 10.13.3 • Surface Pro 2017 for NoahW • Intel Core i7-7660U, 16GB DDR3 memory, 512GB SSD • Windows 10 Professional Version 1709 and HAXM 6.2.1
  • 17. Primitive Benchmark: dup() system call • [Figure: stacked bar chart of CPU cycles per call (y-axis: cycle number) on macOS and Windows, broken down into VM exit, upcall, pre-process, host syscall, post-process, downcall, and VM enter]
  • 18. Micro Benchmark: lmbench (processor, process) • [Figure: bar chart of overhead per test] null call 410%, null I/O 310%, stat 175%, open clos 172%, slct TCP 13%, sig inst 329%, sig hndl 239%, fork proc 256%, exec proc -24%, sh proc 46%
  • 19. Micro Benchmark: lmbench (File & VM latency) • [Figure: bar chart of overhead per test] 0K Create 42%, 0K Delete 5%, 10K Create 17%, 10K Delete 4%, Mmap Latency 28%, Prot Fault -45%, Page Fault -92%, 100fd selct 8%
  • 20. Macro Benchmark (Phoronix Test Suite + α) • [Figure: bar chart of overhead per test] Linux kernel build 16%, unpack-linux -23%, postmark -4%, sqlite 50%, openssl -58%, compress-7zip 9%
  • 21. Comparison of OS Compatibility Layers
      Benchmark                      NoahW     Cygwin     WSL1
      dup2() [call per second]       36,723    556,453    693,309
      write() [call per second]      0.30      0.56       0.57
      fork() (0 MiB array) [ms]      106.4     219.4      2.06
      fork() (512 MiB array) [ms]    338.9     789.9      32.51
      fork() (1 GiB array) [ms]      458.4     1531.8     62.66
  • 22. Conclusion • Proposed a novel OS compatibility architecture • Exploited the OS-standard virtualization technology support • Achieved both robustness and flexibility • Consists of three components • VMs to run guest processes • The VMM module to provide API for hardware virtualization technology • Monitor processes to implement OS compatibility functions • Run Linux binaries on macOS and Windows (preliminary) • Noah implemented 172 out of 329 Linux system calls • The overhead of Linux kernel build time on Noah was 16%
  • 23. Acknowledgement and Availability • This work was partially supported by the Exploratory IT Human Resources Project (MITOU Program) of the Information-technology Promotion Agency, Japan (IPA) in fiscal 2016 and by JSPS KAKENHI • Noah is publicly available from https://github.com/linux-noah/noah under the MIT / GPL dual licenses
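
The memory layout on slide 14 can be illustrated with a small model. The sketch below is a toy, not Noah's code: `NestedPageTable`, `map_page`, `translate`, and the string used as a host page handle are hypothetical names chosen for illustration, and a 4 KiB page size is assumed. It shows why a 39-bit guest physical address yields the 512 GiB limit, and how a fixed identity guest page table leaves the nested table as the only mapping that changes.

```python
PAGE_SIZE = 4096          # assumed page size for this toy model
GIB = 1 << 30

# 39-bit guest physical addresses give a 512 GiB space.
GPA_BITS = 39
GPA_LIMIT = 1 << GPA_BITS


class NestedPageTable:
    """Toy stand-in for the nested page table (GPA page -> host page)."""

    def __init__(self):
        self.entries = {}

    def map_page(self, gpa, host_page):
        if gpa % PAGE_SIZE != 0:
            raise ValueError("gpa must be page aligned")
        if not 0 <= gpa < GPA_LIMIT:
            raise ValueError("gpa outside the 39-bit guest physical space")
        self.entries[gpa // PAGE_SIZE] = host_page

    def translate(self, gva):
        # Stage 1: the guest page table is fixed to the identity mapping.
        gpa = gva
        # Stage 2: only the nested table decides where the page lives.
        host_page = self.entries.get(gpa // PAGE_SIZE)
        if host_page is None:
            raise LookupError("nested page fault at %#x" % gva)
        return host_page, gpa % PAGE_SIZE


print(GPA_LIMIT // GIB)   # size of the guest address space in GiB
```

Because stage 1 never changes, every mapping operation in this model touches a single table, which is the simplification the slide describes.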
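
The fork() rows of the comparison on slide 21 are easier to read as ratios. The short calculation below only restates the slide's numbers; the helper name `ratio` is introduced here for illustration.

```python
# Milliseconds per fork(), copied from the comparison on slide 21.
FORK_MS = {
    "0 MiB": {"NoahW": 106.4, "Cygwin": 219.4, "WSL1": 2.06},
    "512 MiB": {"NoahW": 338.9, "Cygwin": 789.9, "WSL1": 32.51},
    "1 GiB": {"NoahW": 458.4, "Cygwin": 1531.8, "WSL1": 62.66},
}


def ratio(size, slower, faster):
    """How many times longer `slower` takes than `faster` for one size."""
    row = FORK_MS[size]
    return row[slower] / row[faster]


for size in FORK_MS:
    print(size, "Cygwin/NoahW = %.2f" % ratio(size, "Cygwin", "NoahW"))
```

The Cygwin-to-NoahW ratio grows from roughly 2.1 with no array to roughly 3.3 with a 1 GiB array, which matches the expectation that copy-on-write matters more as the forked process holds more data.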

Editor's Notes

  1. Hello everyone. I’m Takahiro Shinagawa from the University of Tokyo. Today, I’d like to talk about a robust and flexible operating system compatibility architecture. This is joint work with Mr. Saeki, Mr. Nishiwaki, and Professor Honiden. This work was done mainly by Mr. Saeki in cooperation with Mr. Nishiwaki while they were master’s course students. Unfortunately, Mr. Saeki has already graduated and Mr. Nishiwaki is now in a laboratory in a different field, so I’m going to give this presentation.
  2. So, let’s begin with the background. Running applications for one operating system on another operating system is a useful feature in various cases. For example, we may want to run a commercial binary application that is only supplied in the binary format of a specific operating system. We may also want to build Linux applications on macOS, which requires a build environment with several toolchains and libraries that are not available on macOS. So, we need a mechanism to bridge the gap between the application binary and the operating system on which we want to run the application.
  3. One approach to achieve this is porting applications. Porting applications has the advantage that the applications run natively on the target OS without any runtime overhead. However, it usually requires much effort by developers, and sometimes by users to set up the runtime environment. Also, ported applications may be incomplete or out of date. Another approach is to use a virtual machine and run the application with the original operating system. This approach has the advantage that it requires no porting effort, and it also provides additional functionalities such as strong isolation. However, virtual machines have challenges in seamless resource sharing between the guest and host applications due to the existence of two OS kernels. Much effort has been devoted to improving the seamlessness between virtual machines and it is now practical. However, there is still room for improving the seamlessness in terms of resource sharing and resource consumption.
  4. Using OS compatibility layers is a promising approach. It requires no porting effort because the OS compatibility layer absorbs the differences between the guest and host interfaces. It can also achieve seamless resource management because the host OS manages the resources for both the guest and host applications in the same way. For example, the guest and host applications can communicate with each other via the host system calls, and they share the same process scheduling policy, the same free memory pages, and the same file system instances.
  5. There are two ways to implement OS compatibility layers. One way is to implement them in kernel space. A kernel-space implementation has the advantage that it has the flexibility to achieve binary compatibility. However, it is vulnerable to bugs in the OS compatibility layers. For example, the former Windows Subsystem for Linux had several bugs that could cause the blue screen of death of Windows. The other way is to implement them in user space. It has the advantage that bugs do not affect the stability of the operating system. However, user-space-only implementations are too inflexible to achieve full binary compatibility because they cannot trap system call instructions or manipulate page tables, unless we use binary modification. For example, Cygwin cannot implement the copy-on-write capability in the fork() system call.
  6. So, we propose a novel operating system compatibility architecture. In this architecture, we run each guest process in a separate VM without its OS kernel, and the process of the OS compatibility layer running on the host operating system manages the VM to emulate the execution environment for the guest application process. This architecture can achieve robustness because most of the OS compatibility layer can be implemented in user space and bugs in the layer do not cause kernel crashes. It can also achieve flexibility to realize full binary compatibility because the virtualization technology allows low-level event handling such as trapping system calls and page faults, manipulating page tables, and so on. We need host OS support to handle hardware-assisted virtualization technology, but fortunately recent operating systems provide standard virtualization interfaces and we can reuse them. So, we do not need to modify the OS kernels ourselves.
  7. Here is the overall design of our proposed architecture. Our system consists of three main components, that is, guest VMs, the VMM module, and monitor processes. We will explain each of them in the following slides.
  8. A guest VM is an instance of the guest execution environment. The application process inside the VM runs on the CPU directly without runtime overhead, so we do not need to modify the application binaries. The VM provides a virtual address space for the guest process, and the address space is managed by the page table in the host OS. The VM can trap necessary events, such as system calls and page faults. So, we can use the VM as a container to run the guest process. We assume that the CPU supports hardware-assisted virtualization functions such as Intel VT and AMD SVM so that we can easily create and manipulate VMs.
  9. The VMM module is a small driver of the hardware-assisted virtualization function of the CPU. It provides an API to manage VMs, such as creating and destroying VMs and virtual CPUs, reading and writing VM states, manipulating page tables, and managing control transfer. Since the VMM module runs inside the host OS kernel, its stability affects the robustness of the OS compatibility layer. However, we can assume that the VMM module is robust because it is small, it does not normally depend on other parts of the OS kernel, and recent operating systems support the VMM module as a standard function. For example, macOS provides Apple Hypervisor.framework, Windows provides Windows Hypervisor Platform, and Linux provides KVM. So, we can expect that the VMM module will become stable and mature easily, and it will be properly maintained in the future.
  10. The monitor process is the main component to implement the OS compatibility functions. It prepares the execution environment for the guest process through the VMM module, such as creating a virtual machine, setting up the page table, and loading the guest program. It also traps the system calls issued by the guest process and emulates them using the host system calls. When the guest process exits, it frees the allocated resources, destroys the VM, and terminates itself.
  11. This is the summary of the advantages of our architecture. It can achieve robustness because the monitor process is implemented in user space and bugs in it will not cause system crashes. It can also achieve flexibility to realize full binary compatibility and good performance with the copy-on-write capability. In addition, it inherits the advantages of using OS compatibility layers in general. For example, it can achieve low development cost because it can use the rich host OS functionalities such as system calls, useful libraries, and high-level languages such as Rust and Go. It also achieves seamlessness because there is only a single kernel and all system resources are managed by the kernel with a single management policy.
  12. Now, we explain the implementation. Our target is Linux 4.6 running on x86-64 processors with Intel VT-x support. We implemented a Linux compatibility layer for macOS called Noah, that is, we can run Linux binaries on macOS without modifications. This implementation is mature enough to run many Linux applications. For example, we can build Linux kernels and run several X11 applications on Noah. We also implemented a preliminary version for Windows that supports the copy-on-write capability so that we can confirm the advantage of our architecture. Unfortunately, it does not implement many system calls yet.
  13. We first explain the boot process and program loading. To start the guest process, the monitor process first creates a virtual machine through the VMM module. Then, it sets up the VM state to make the execution environment for the guest process. For example, it sets the CPU mode in the VM state so that it starts in long mode directly. It sets a flag in the exception bitmap so that system call instructions issued in the VM cause VM exits and can be trapped by the VMM module. Then, it loads the Linux ELF loader, that is, ld.so, so that it can further execute the ELF program with shared libraries. To do so, we implemented our original ELF loader using the internal version of mmap() so that the ELF file is mapped into the VM address space. Finally, it passes the control to the start address of the Linux ELF loader.
  14. As for memory management, there are two page tables in our architecture. That is, the guest page table in the VM and the nested page table in the VMM. To avoid handling two page tables, we should fix one page table and manipulate only the other. Which one to fix is a design choice. We chose to fix the guest page table and manipulate the nested page table for easy implementation and debugging. The VMM module provides the API to manipulate the nested page tables and the monitor process can map memory pages to the VM by specifying the virtual address of the monitor process. So, the monitor process can easily change the page mappings of the VMs. This approach has one limitation. Since current Intel CPUs support only up to 39-bit physical addresses, we cannot use the full 48-bit virtual address space. Fortunately, the only problem this causes in Linux is the default stack address, and we can safely move the stack to a lower address. We should note that there is no kernel in the VM, so we do not need the higher region of the guest virtual address space.
  15. As for process management, we used different approaches on macOS and Windows. On macOS, we can use the fork system call to fork the monitor process, and we implemented a subset of the clone system call. Unfortunately, Apple Hypervisor.framework does not support the fork of a process with a virtual machine. Therefore, we first save and destroy the VM before fork, and then restore the VM state after fork. On Windows, the monitor process cannot use fork because Windows does not support it. So, we implemented the fork functionality ourselves using shared memory and virtualization technology. That is, we created a memory region shared among the monitor processes, and the monitor processes trap the page faults and perform the copy-on-write. The implementation is a little bit complicated, but basically similar to the implementation in the OS kernel.
  16. Now, we show the results of the evaluation. We measured the performance of our implementation with a primitive benchmark, a micro benchmark, and a macro benchmark on macOS. We also compared the performance of a few OS compatibility layers available on Windows. We used these laptop computers for the evaluation.
  17. This is the result of the primitive benchmark. We measured the CPU cycles of the dup() system call. dup() is a simple system call but it actually calls the OS kernel, so we used it to measure the breakdown of a system call. As shown in the figure, VM enter and exit took around three hundred cycles, which is not very high. The system call itself took only around two thousand cycles. Unfortunately, the VMM module took extra CPU cycles to enter and exit the VM. We believe there is still room for optimization in this part and we can further reduce the overhead on system calls.
  18. This is the result of the micro benchmark using lmbench. This benchmark measured the performance related to the process and processor. From this figure, we found that the overhead of simple system calls was high, because the cost of context switches is high, but in complex system calls, the context switch cost became relatively lower. One interesting result is the performance of the exec system call. It became faster than macOS because the exec system call is mainly implemented by our system in a simple way, and the implementation in macOS may be very complicated due to its kernel structure.
  19. This is another result of lmbench. The result shows that Noah incurred up to 42 percent overhead on basic file-related system calls. Another interesting result is that page fault and protection fault handling was much faster in our system than in macOS. This is probably for the same reason as with the exec system call, because memory management is mainly implemented by our system.
  20. This is the result of the macro benchmark using Phoronix Test Suite. We can see from the graph that some applications became slower and some applications became faster depending on their characteristics. If the application issues many simple system calls, the overhead will become higher. If the application issues many complicated system calls, the overhead will become lower. If the application causes many page faults, it could become faster. Since one of our target applications is build environments, kernel build performance is a good benchmark. The overhead was 16 percent and we believe this is reasonable performance.
  21. Finally, we show the performance comparison of OS compatibility layers. We used the Windows version of our implementation, and compared it with Cygwin, which is a user-level implementation of an OS compatibility layer, and Windows Subsystem for Linux, shown as WSL1, as a kernel-level implementation. We can see that the dup2 system call is much slower in our system because this is a simple system call and the virtualization overhead is dominant. We can also see that write performance is comparable to the other two systems because this performance is mainly determined by the host I/O performance. Finally, we can confirm that the fork performance of our system is much faster than Cygwin, especially when the guest process has a large amount of data, because we support the copy-on-write capability.
  22. So, this is the conclusion. We proposed a novel OS compatibility architecture that exploits the OS-standard virtualization technology and achieved both robustness and flexibility. In our architecture, an OS compatibility layer consists of three components, that is, virtual machines to run guest processes, the VMM module to provide API for hardware virtualization technology, and monitor processes that implement the OS compatibility functions. Our implementations can run many Linux binaries on macOS and simple Linux binaries on Windows. The overhead of Linux kernel build time on Noah was 16 percent.
  23. You can get the source code of Noah from GitHub. So, that’s all. Thank you for listening.
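
Note 10 describes the monitor process trapping guest system calls and emulating them with host system calls. The sketch below models only that dispatch step, with no real VM involved. The names `emulate_syscall` and `run_guest` are hypothetical and are not taken from Noah; the system call numbers and the ENOSYS value are the Linux x86-64 ones.

```python
import os

# Linux x86-64 system call numbers (the host's numbering may differ).
LINUX_SYS_CLOSE = 3
LINUX_SYS_DUP = 32
LINUX_SYS_GETPID = 39
LINUX_ENOSYS = 38  # Linux errno for "function not implemented"


def emulate_syscall(number, args):
    """Emulate one trapped guest system call with host calls.

    Returns the value the monitor would write back into the guest's
    return register: a result on success, or -errno on failure, as
    Linux does at the system call boundary.
    """
    try:
        if number == LINUX_SYS_DUP:
            return os.dup(args[0])
        if number == LINUX_SYS_CLOSE:
            os.close(args[0])
            return 0
        if number == LINUX_SYS_GETPID:
            return os.getpid()
    except OSError as err:
        # Simplification for this sketch: a real layer would translate
        # host errno values to Linux ones, since the numbering can differ.
        return -err.errno
    return -LINUX_ENOSYS


def run_guest(trapped_calls):
    """Stand-in for the VM-exit loop: handle each trapped call in order."""
    return [emulate_syscall(number, args) for number, args in trapped_calls]
```

A call the layer does not implement comes back as `-LINUX_ENOSYS`, so in this model the guest sees an ordinary Linux error rather than a crash.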
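
Note 13 ends with the monitor passing control to the start address of the Linux ELF loader. As a rough illustration of where such an address comes from, the sketch below reads the entry point out of an ELF64 header. It is not Noah's loader: `read_elf_entry` is a hypothetical helper, it handles only little-endian x86-64 images, and it ignores program headers and the mapping step entirely.

```python
import struct

# ELF64 file header layout: e_ident, e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, then six 16-bit size/count fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
EM_X86_64 = 62


def read_elf_entry(image):
    """Return the entry address from an x86-64 ELF image (bytes)."""
    if len(image) < ELF64_HEADER.size:
        raise ValueError("image shorter than an ELF64 header")
    fields = ELF64_HEADER.unpack_from(image)
    ident, machine, entry = fields[0], fields[2], fields[4]
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad ELF magic")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a 64-bit little-endian image")
    if machine != EM_X86_64:
        raise ValueError("not an x86-64 binary")
    return entry
```

For a position-independent image such as a typical ld.so, the value returned here is relative to the load base, so a loader would add the address at which it mapped the file before jumping.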
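
Note 15 explains that NoahW implements fork() with copy-on-write by sharing memory among monitor processes and trapping page faults. The toy model below shows the copy-on-write bookkeeping only. `SharedPool` and `AddressSpace` are hypothetical classes invented for this sketch; real page faults, VM state, and separate processes are not modeled.

```python
class SharedPool:
    """Toy page pool standing in for memory shared among monitors."""

    def __init__(self):
        self.frames = {}    # frame id -> bytearray
        self.refs = {}      # frame id -> number of address spaces using it
        self.next_id = 0

    def alloc(self, data):
        frame = self.next_id
        self.next_id += 1
        self.frames[frame] = bytearray(data)
        self.refs[frame] = 1
        return frame


class AddressSpace:
    def __init__(self, pool):
        self.pool = pool
        self.pages = {}     # page number -> [frame id, writable flag]

    def map_new(self, page, data):
        self.pages[page] = [self.pool.alloc(data), True]

    def fork(self):
        child = AddressSpace(self.pool)
        for page, entry in self.pages.items():
            entry[1] = False                    # parent loses write access
            child.pages[page] = [entry[0], False]
            self.pool.refs[entry[0]] += 1       # frame is now shared
        return child

    def read(self, page, offset):
        return self.pool.frames[self.pages[page][0]][offset]

    def write(self, page, offset, value):
        entry = self.pages[page]
        if not entry[1]:
            self._write_fault(entry)            # models the trapped fault
        self.pool.frames[entry[0]][offset] = value

    def _write_fault(self, entry):
        frame = entry[0]
        if self.pool.refs[frame] > 1:
            # Still shared: copy the frame and drop our reference.
            self.pool.refs[frame] -= 1
            entry[0] = self.pool.alloc(self.pool.frames[frame])
        entry[1] = True
```

In this model `fork()` copies only page-table entries, and a page is duplicated the first time either side writes to it, which is why the cost of fork stays small when the parent holds a large array it never modifies.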