A look at the new enhancements to core storage in vSphere 6.5, including VMFS6, Automated UNMAP, I/O Filters, and much more, as delivered by Cormac Hogan and Cody Hosterman
2. Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
2#SER1143BU CONFIDENTIAL
7. vSphere 6.5 Scaling and Limits
Paths
• ESXi hosts now support up to 2000 paths
– Increase from the 1024 paths per host supported previously
Devices
• ESXi hosts now support up to 512 devices
– Increase from the 256 devices supported per host previously
– Multiple targets are required to address more than 256 devices
– This does not impact Virtual Volumes (aka VVols), which can address 16,383 VVols per PE (Protocol Endpoint)
8. vSphere 6.5 Scaling and Limits
• 512e Advanced Format Device Support
• Capacity limits are becoming an issue with the 512n (native) sector size currently used in disk drives
• New Advanced Format (AF) drives use a 4K native sector size for higher capacity
• These 4Kn devices are not yet supported on vSphere
• For legacy applications and operating systems that cannot support 4Kn drives, new 4K sector
size drives that run in 512-byte emulation (512e) mode are now available
– These drives have a physical sector size of 4K but a logical sector size of 512 bytes
• These drives are now supported on vSphere 6.5 for VMFS and RDM (Raw Device Mappings)
10. DSNRO
The setting “Disk.SchedNumReqOutstanding” (aka “No of outstanding IOs with
competing worlds”) has changed behavior
• DSNRO can be set to a maximum of:
– 6.0 and earlier: 256
– 6.5 and on: Whatever the HBA Device Queue Depth Limit is
• Allows for extreme levels of performance
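The relationship behind this change (as described in the speaker notes: DQLEN is the minimum of DSNRO and the HBA device queue depth limit) can be sketched as follows. This is an illustration only; the function name is not a vSphere API.

```python
# Illustrative sketch: the effective per-device queue depth (DQLEN) is the
# minimum of DSNRO and the HBA device queue depth limit, so capping DSNRO
# at 256 (pre-6.5) could throttle devices whose HBA limit was higher.
def effective_dqlen(dsnro: int, hba_queue_depth: int) -> int:
    return min(dsnro, hba_queue_depth)

# Pre-6.5: DSNRO capped at 256, so a 1024-deep HBA device is limited to 256.
assert effective_dqlen(256, 1024) == 256
# 6.5 and on: DSNRO may be raised up to the HBA limit itself.
assert effective_dqlen(1024, 1024) == 1024
```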
12. VMFS-6: On-disk Format Changes
• File System Resource Management - File Block Format
• VMFS-6 has two new “internal” block sizes, small file block (SFB) and large file block (LFB)
– The SFB size is set to 1MB; the LFB size is set to 512MB
– These are internal concepts for “files” only; the VMFS block size is still 1MB
• Thin disks created on VMFS-6 are initially backed with SFBs
• Thick disks created on VMFS-6 are allocated LFBs as much as possible
– For the portion of the thick disk which does not fit into an LFB, SFBs are allocated
• These enhancements should result in much faster file creation times
– Especially true with swap file creation so long as the swap file can be created with all LFBs
– Swap files are always thickly provisioned
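A rough sketch of the allocation rule above, assuming the stated SFB (1MB) and LFB (512MB) sizes. This illustrates the concept only; it is not VMware's allocator.

```python
SFB_MB = 1      # small file block size (stated above)
LFB_MB = 512    # large file block size (stated above)

def thick_allocation(size_mb: int) -> tuple:
    """Return (lfb_count, sfb_count) for a thick disk of size_mb megabytes:
    as many LFBs as possible, then SFBs for the portion that does not fit."""
    lfbs, remainder = divmod(size_mb, LFB_MB)
    sfbs = -(-remainder // SFB_MB)  # ceiling division for the tail
    return (lfbs, sfbs)

# A 1300 MB thick disk: 2 LFBs (1024 MB) plus 276 SFBs for the remainder.
assert thick_allocation(1300) == (2, 276)
# An exact multiple of the LFB size needs no SFBs at all.
assert thick_allocation(512) == (1, 0)
```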
13. VMFS-6: On-disk Format Changes
• Dynamic System Resource Files
• System resource files (.fdc.sf, .pbc.sf, .sbc.sf, .jbc.sf) are now extended dynamically
for VMFS-6
– Previously these were static in size
– These may show a much smaller size initially, when compared to previous versions of
VMFS, but they will grow over time
• If the filesystem exhausts any resources, the respective system resource file is
extended to create additional resources
• VMFS-6 can now support millions of files / pointer blocks / sub blocks (as long as
volume has free space)
15. vmkfstools – 500GB VMFS-6 volume
In VMFS-6, Sub Blocks are used for Pointer Blocks, which is why the Ptr Blocks max is shown as 0 here. (Large File Blocks are not displayed in the vmkfstools output.)
16. VMFS-6: On-disk Format Changes
• File System Resource Management - Journaling
• VMFS is a distributed journaling filesystem
• Journals are used on VMFS when performing metadata updates on the filesystem
• Previous versions of VMFS used regular file blocks as journal resource blocks
• In VMFS-6, journal blocks are tracked in a separate system resource file called .jbc.sf
• This was introduced to address VMFS journal-related issues on previous versions of VMFS,
caused by the use of regular file blocks as journal blocks and vice versa
– E.g. full file system, see VMware KB article 1010931
17. New Journal System File Resource
(Figure: comparison of the system resource files on a VMFS-5 volume vs a VMFS-6 volume.)
18. VMFS-6: VM-based Block Allocation Affinity
• Resources for VMs (blocks, file descriptors, etc.) on earlier VMFS versions were
allocated on a per host basis (host-based block allocation affinity)
• Host contention issues arose when a VM/VMDK was created on one host, and then
vMotion was used to migrate the VM to another host
• If additional blocks were allocated to the VM/VMDK by the new host at the same time
as the original host tried to allocate blocks for a different VM in the same resource
group, the different hosts could contend for resource locks on the same resource
• This change introduces VM-based block allocation affinity, which will decrease
resource lock contention
19. VMFS-6: Parallelism/Concurrency Improvements
• Some of the biggest delays on VMFS were in device scanning and filesystem probing
• vSphere 6.5 has new, highly parallel, device discovery and filesystem probing
mechanisms
– Previous versions of VMFS only allowed one transaction at a time per host on a given
filesystem; VMFS-6 supports multiple, concurrent transactions at a time per host
• These improvements are significant for failover events, and Site Recovery Manager
(SRM) should especially benefit
• They were also required to support the higher limits on the number of devices and paths in vSphere 6.5
20. Hot Extend Support
• Prior to ESXi 6.5, VMDKs on a powered on VM
could only be grown if size was less than 2TB
• If the size of a VMDK was 2TB or larger, or the
expand operation caused it to exceed 2TB, the hot
extend operation would fail
• This required administrators to typically shut down
the virtual machine to expand it beyond 2TB
• The behavior has been changed in vSphere 6.5
and hot extend no longer has this limitation
This is a vSphere 6.5 improvement, not specific to VMFS-6.
This will also work on VMFS-5 volumes.
21. “Upgrading” to VMFS-6
• No direct ‘in-place’ upgrade of filesystem to VMFS-6 available.
New datastores only.
• Customers upgrading to vSphere 6.5 release should continue to
use VMFS-5 datastores (or older) until they can create new
VMFS-6 datastores
• Use migration techniques such as Storage vMotion to move
VMs from the old datastore to the new VMFS-6 datastore
25. ATS Miscompare Handling (1 of 3)
• The heartbeat region of VMFS is used
for on-disk locking
• Every host that uses the VMFS volume
has its own heartbeat region
• This region is updated by the host on
every heartbeat
• The region that is updated is the time
stamp, which tells others that this host
is alive
• When the host is down, this region is
used to communicate lock state to
other hosts
26. ATS Miscompare Handling (2 of 3)
• In vSphere 5.5 U2, we started using ATS for maintaining the heartbeat
• ATS is the Atomic Test and Set primitive which is one of the VAAI primitives
• Prior to this release, we only used ATS when the heartbeat state changed
• For example, we would use ATS in the following cases:
– Acquire a heartbeat
– Clear a heartbeat
– Replay a heartbeat
– Reclaim a heartbeat
• We did not use ATS for maintaining the ‘liveness’ of a heartbeat
• This change to using ATS to maintain the ‘liveness’ of a heartbeat appears to have led to issues with
certain storage arrays
27. ATS Miscompare Handling (3 of 3)
• When an ATS Miscompare is received, all outstanding IO is aborted
• This led to additional stress and load being placed on the storage arrays
– In some cases, this led to the controllers crashing on the array
• In vSphere 6.5, there are new heuristics added so that when we get a miscompare event, we
retry the read and verify that there is a miscompare
• If the miscompare is real, then we do the same as before, i.e. abort outstanding I/O
• If the on-disk HB data has not changed, then this is a false miscompare
• In the event of a false miscompare:
– VMFS will not immediately abort IOs
– VMFS will re-attempt ATS HB after a short interval (usually less than 100ms)
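The heuristic above can be sketched as follows. This is an illustrative model of the described behavior, not VMware's code; the function and callback names are hypothetical.

```python
# Sketch of the vSphere 6.5 heuristic: on an ATS miscompare, re-read the
# on-disk heartbeat and only treat the miscompare as real if the on-disk
# data actually differs from what we expected.
def handle_ats_miscompare(read_heartbeat, expected_hb, abort_io, retry_ats):
    on_disk = read_heartbeat()
    if on_disk != expected_hb:
        abort_io()          # real miscompare: abort outstanding I/O as before
        return "real"
    retry_ats()             # false miscompare: re-attempt ATS HB after ~100ms
    return "false"

events = []
# On-disk data changed: a real miscompare, so I/O is aborted.
assert handle_ats_miscompare(lambda: b"A", b"B",
                             lambda: events.append("abort"),
                             lambda: events.append("retry")) == "real"
# On-disk data unchanged: a false miscompare, so the ATS is retried.
assert handle_ats_miscompare(lambda: b"A", b"A",
                             lambda: events.append("abort"),
                             lambda: events.append("retry")) == "false"
assert events == ["abort", "retry"]
```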
28. An Introduction to UNMAP
UNMAP via datastore
• VAAI UNMAP was introduced in vSphere 5.0
• Enables an ESXi host to inform the backing storage that files or VMs have been moved or
deleted from a Thin Provisioned VMFS datastore
• Allows the backing storage to reclaim the freed blocks
• There was no way of doing this previously, resulting in stranded space on Thin Provisioned
VMFS datastores
29. Automated UNMAP in vSphere 6.5
Introducing Automated UNMAP Space Reclamation
• In vSphere 6.5, there is now an automated UNMAP crawler mechanism for
reclaiming dead or stranded space on VMFS datastores
• Now UNMAP will run continuously in the background
• UNMAP granularity on the storage array
– The granularity of the reclaim is set to 1MB chunk
– Automatic UNMAP is not supported on arrays with UNMAP granularity greater
than 1MB
– Auto UNMAP feature support is footnoted in the VMware Hardware
Compatibility Guide (HCL)
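A minimal sketch of the support rule above; the names are illustrative and the values are in MB.

```python
# Automatic UNMAP reclaims in 1MB chunks, so arrays whose UNMAP granularity
# exceeds 1MB are not supported for automatic reclamation.
VMFS_RECLAIM_GRANULARITY_MB = 1

def auto_unmap_supported(array_unmap_granularity_mb: float) -> bool:
    return array_unmap_granularity_mb <= VMFS_RECLAIM_GRANULARITY_MB

assert auto_unmap_supported(1)        # 1MB granularity: supported
assert not auto_unmap_supported(16)   # 16MB granularity: not supported
```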
30. Some Considerations with Automated UNMAP
• UNMAP is only issued to datastores that are VMFS-6 and have powered-on VMs
• Can take 12-24 hours to fully reclaim
• The default behavior is on, but it can be turned off per host (that host won’t
participate) …
– via the EnableVMFS6Unmap setting
• …or per datastore (no hosts will reclaim it)
31. An Introduction to Guest OS UNMAP
UNMAP via Guest OS
• In vSphere 6.0, additional improvements to UNMAP facilitate the reclaiming of
stranded space from within a Guest OS
• Effectively, this gives a Guest OS in a thinly provisioned VM the ability to tell the backing
storage which blocks are no longer in use
• This allows the backing storage to reclaim this capacity, and shrink the size of the VMDK
32. Some Considerations with Guest OS UNMAP
TRIM Handling
• UNMAP works at certain block boundaries on VMFS, whereas TRIM does not have
such restrictions
• While this should be fine on VMFS-6, which is now 4K aligned, certain TRIMs
converted into UNMAPs may fail due to block alignment issues on previous
versions of VMFS
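The alignment issue above can be illustrated with a sketch: a TRIM range converted to UNMAP has to be clipped to VMFS block boundaries, and an unaligned request may shrink to nothing. The boundary value and function are assumptions for illustration only.

```python
ALIGN = 1 << 20  # 1 MiB VMFS unmap boundary (assumed for illustration)

def aligned_unmap_range(offset: int, length: int):
    """Clip [offset, offset+length) to ALIGN boundaries; None if nothing is left."""
    start = -(-offset // ALIGN) * ALIGN          # round the start up
    end = (offset + length) // ALIGN * ALIGN     # round the end down
    return (start, end - start) if end > start else None

# A 4 MiB trim starting mid-block loses its unaligned head and tail:
assert aligned_unmap_range(ALIGN // 2, 4 * ALIGN) == (ALIGN, 3 * ALIGN)
# A sub-MiB trim that straddles no full block is dropped entirely:
assert aligned_unmap_range(ALIGN // 4, ALIGN // 2) is None
```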
Linux Guest OS SPC-4 support
• Initially in-guest UNMAP support to reclaim in-guest dead space natively was
limited to Windows 2012 R2
• Linux distributions check the SCSI version, and unless it is version 5 or greater, they
do not send UNMAPs
• With SPC-4 support introduced in vSphere 6.5, Linux Guest OSes will now also
be able to issue UNMAPs
33. Automated UNMAP Limits and Considerations
Guest OS filesystem alignment
• VMDKs are aligned on 1 MB block boundaries
• However, misalignment may still occur within the guest OS filesystem
• This may also prevent UNMAP from working correctly
• A best practice is to align guest OS partitions to the 1MB granularity
boundary
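The best practice above amounts to a simple check on the partition start offset; the helper below is illustrative only.

```python
# A guest partition starting at a multiple of 1 MiB keeps guest-level frees
# aligned with the VMFS reclaim boundary.
def partition_aligned(start_offset_bytes: int, granularity: int = 1 << 20) -> bool:
    return start_offset_bytes % granularity == 0

assert partition_aligned(2048 * 512)   # modern default: sector 2048 = 1 MiB
assert not partition_aligned(63 * 512) # legacy 63-sector offset is misaligned
```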
34. Known Automated UNMAP Issues
• vSphere 6.5
– Tools in guest operating system might send unmap requests that are not aligned
to the VMFS unmap granularity.
– Such requests are not passed to the storage array for space reclamation.
– Further info in KB article 2148987
– This issue is addressed in vSphere 6.5 P01
• vSphere 6.5 P01
– Certain versions of Windows Guest OS running in a VM may appear
unresponsive if UNMAP is used.
– Further info in KB article 2150591.
– This issue is addressed in vSphere 6.5 U1.
38. The Storage Policy Based Management (SPBM) Paradigm
• SPBM is the foundation of
VMware's Software Defined
Storage vision
• Common framework to allow
storage and host related
capabilities to be consumed
via policies.
• Applies data services (e.g.
protection, encryption,
performance) on a per VM, or
even per VMDK level
39. Creating Policies via Rules and Rule Sets
• Rule
– A Rule references a combination of a metadata tag and a related value, indicating
the quality or quantity of the capability that is desired
– These two items act as a key and a value that, when referenced together through
a Rule, become a condition that must be met for compliance
• Rule Sets
– A Rule Set is comprised of one or more Rules
– A storage policy includes one or more Rule Sets that describe requirements for
virtual machine storage resources
– Multiple “Rule Sets” can be leveraged to allow a single storage policy to define
alternative selection parameters, even from several storage providers
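The Rule / Rule Set semantics above can be sketched as a small data model: every Rule in a Rule Set must be met, while multiple Rule Sets act as alternatives. The names are illustrative, not the actual SPBM API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    key: str     # capability metadata tag
    value: str   # desired quality or quantity

def rule_set_satisfied(rules, capabilities: dict) -> bool:
    # Every Rule in a Rule Set is a condition that must be met for compliance.
    return all(capabilities.get(r.key) == r.value for r in rules)

def policy_satisfied(rule_sets, capabilities: dict) -> bool:
    # Multiple Rule Sets define alternatives: any one matching set suffices.
    return any(rule_set_satisfied(rs, capabilities) for rs in rule_sets)

gold = [Rule("protection", "raid1"), Rule("encryption", "on")]
silver = [Rule("protection", "raid5")]
store = {"protection": "raid5", "encryption": "off"}
assert not rule_set_satisfied(gold, store)     # gold's rules are not all met
assert policy_satisfied([gold, silver], store) # but the silver alternative is
```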
41. SPBM and Common Rules for Data Services provided by hosts
• VM Encryption
• Storage I/O Control v2
42. Two new features are introduced with vSphere 6.5:
• Encryption
• Storage I/O Control v2
Implementation is done via I/O Filters
43. Storage I/O Control v2
• VM Storage Policies in vSphere 6.5 have a new option called “Common Rules”
• These are used for configuring data services provided by hosts, such as Storage I/O Control
and Encryption. It is the same mechanism used for VAIO/IO Filters
44. vSphere VM Encryption
• vSphere 6.5 introduces a new VM encryption mechanism
• It requires an external Key Management Server (KMS). Check the HCL for supported vendors
• This encryption mechanism is implemented in the hypervisor, making vSphere VM encryption
agnostic to the Guest OS
• This not only encrypts the VMDK, but it also encrypts some of the VM Home directory contents,
e.g. VMX file, metadata files, etc.
• Like SIOCv2, vSphere VM Encryption in vSphere 6.5 is policy driven
45. vSphere VM Encryption I/O Filter
• Common Rules must be enabled to add vSphere VM Encryption to a policy.
• The only setting in the custom encryption policy is whether to allow I/O filters before encryption.
49. NFS v4.1 Improvements
• Hardware Acceleration/VAAI-NAS Improvements
– NFS 4.1 client in vSphere 6.5 supports hardware acceleration by offloading certain
operations to the storage array.
– This comes in the form of a plugin to the ESXi host that is developed/provided by the
storage array partner.
– Refer to your NAS storage array vendor for further information.
• Kerberos IPv6 Support
– NFS v4.1 Kerberos adds IPV6 support in vSphere 6.5.
• Kerberos AES Encryption Support
– NFS v4.1 Kerberos adds Advanced Encryption Standards (AES) encryption support in
vSphere 6.5
51. iSCSI Enhancements
• iSCSI Routing and Port Binding
– ESXi 6.5 now supports having the iSCSI initiator and the iSCSI target residing in different
network subnets with port binding
• UEFI iSCSI Boot
– VMware now supports UEFI (Unified Extensible Firmware Interface) iSCSI Boot on Dell
13th generation servers with Intel x540 dual port Network Interface Card (NIC).
53. NVMe (1 of 2)
• Virtual NVMe Device
– New virtual storage HBA for all-flash
SAN/vSAN storage
– Newer operating systems can leverage
multiple queues with NVMe devices
• Virtual NVMe device allows VMs to take
advantage of such in-guest IO stack
improvements
– Improved performance compared to Virtual
SATA device on local PCIe SSD devices
• The virtual NVMe device provides 30-50%
lower CPU cost per I/O
• The virtual NVMe device achieves 30-80%
higher IOPS
54. NVMe (2 of 2)
• Supported configuration information of virtual NVMe device.
– Number of controllers per VM: 4 (enumerated as nvme0, …, nvme3)
– Number of namespaces per controller: 15 (each namespace is mapped to a virtual disk; enumerated as nvme0:0, …, nvme0:15)
– Maximum queues and interrupts: 16 (1 admin + 15 I/O queues)
– Maximum queue depth: 256 (4K in-flight commands per controller)
• Supports NVMe Specification v1.0e mandatory admin and I/O commands
• Interoperability with all existing vSphere features, except SMP-FT
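The "4K in-flight commands per controller" figure is consistent with the queue limits listed above, as a quick check shows:

```python
# 16 queues per controller (1 admin + 15 I/O), each up to 256 commands deep.
queues_per_controller = 1 + 15
max_queue_depth = 256
max_in_flight = queues_per_controller * max_queue_depth
assert max_in_flight == 4096  # matches the "4K in-flight commands" figure
```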
VAAI – vSphere API for Array Integration
SPBM – Storage Policy Based Management
The ability to address > 256 devices on a single target requires a new flat addressing scheme.
Sector Readiness
As part of future-proofing, all metadata on VMFS-6 is aligned on 4KB blocks.
VMFS-6 is ready to fully support the new, larger capacity, 4KN sector disk drives when vSphere supports them.
This change makes a lot of sense: DQLEN was always the minimum of DSNRO and the HBA Device Queue Depth Limit, so there was no reason to allow DSNRO to exceed that limit, nor to cap it at 256.
There is no way to select the block size; it is always 1MB. The SFB and LFB are internal concepts and are used automatically based on the type of VMDK. For example, LZT and EZT disks will always use LFBs as long as they are available for allocation. In addition, swap files also use LFBs. LFBs help reduce provisioning time and VM boot time (because swap file creation is faster for large VMs). Thin VMDKs always use SFBs.
The VMFS volume has to be reasonably big for LFBs to get allocated.
Idea here is that we can create volumes that are small in capacity, but do not consume a huge amount of overhead for metadata.
The other reason for this is so that we do not cap the maximum capacity of a volume with its initial formatting size, but grow it dynamically over time (future-proofing).
Note 1: VMFS-5
Note 2: 1MB file blocks
Note 3: Number of sub blocks created is 32000 (these are 8K in size) on VMFS-5
Note 1: VMFS file blocks are still 1MB on VMFS-6
Note 2: Large File Blocks are not displayed in the vmkfstools output
Note 3: Pointer blocks have been replaced with Sub Blocks
Note 4: Much fewer (about half) the number of sub blocks on this newly created VMFS-6 volume, which is the same size as the VMFS-5 volume created earlier.
Note 5: Sub-blocks went from 64K on VMFS-3 to 8K on VMFS-5 and back to 64K on VMFS-6
Note 6: Sub-blocks back a file initially, but when its size exceeds one sub-block, we switch to file blocks for backing - https://blogs.vmware.com/vsphere/2012/02/something-i-didnt-known-about-vmfs-sub-blocks.html
VMFS allocates space for journal when the file system is first accessed.
Note that the journal resource file can also be dynamically extended.
Tracking journal blocks separately in a new resource file reduces the risk of issues arising due to journal blocks being interpreted as regular file blocks.
Note: As I understand it, if a filesystem filled up, one could not even delete a file, as that would require allocating a journal block for the metadata operation. By moving journal blocks to their own resource file, one should now be able to allocate a journal block to enable the delete operation, making it much easier to deal with full filesystem issues.
In VMFS, every datastore gets its own hidden files to save the file-system structure.
.fbb – file blocks
.fdc – file descriptors
.pbc – pointer blocks
.pb2 – pointer blocks – second level of indirection
.sbc – sub-blocks
.vh – volume header
.sdd – system data directory
It is important to note that this does not require VMFS-6 or Virtual Machine HW version 13 to work.
VMFS-5 will also support this functionality as long as ESXi is version at 6.5.
ATS is a replacement lock mechanism for SCSI reservations on VMFS volumes when doing metadata updates. Basically, ATS locks can be considered a mechanism to modify a disk sector which, when successful, allows an ESXi host to do a metadata update on a VMFS. This includes allocating space to a VMDK during provisioning, as certain characteristics need to be updated in the metadata to reflect the new size of the file. ATS is a T10 standard and uses opcode 0x89 (COMPARE AND WRITE).
We always do ATS with 1 LBA, so each ATS command is 1K in total: 512 bytes for test data + 512 bytes for set data.
What are we checking?
We check the ATS data, which happens to be the entire HB data that is also in the Test-Image and Set-Image, not just one of the fields. If any field is different, then we need to consider doing an HB reclaim.
Main issues:
- The storage returning ATS MISCOMPARE incorrectly. In this scenario, we do not know why the storage returned the miscompare; it is a storage-side issue. One storage vendor said it was because the storage was overloaded; another said it was due to an existing reservation on the LUN, etc.
- VMFS detecting the miscompare incorrectly. In this case, an HB I/O (1) timed out and VMFS aborted it; however, the I/O had already made it to the disk before the abort. VMFS then retried the ATS using the same test image as (1) (because the previous command was aborted, the assumption was that the ATS had not made it to the disk), and since it had in fact made it to disk, the storage returned ATS MISCOMPARE.
- Some storage arrays wrote the ATS data, yet returned a miscompare, so we had to handle this case too.
Basically, when we get a miscompare on an ATS HB, we read the HB image from the disk and compare it against both the Test-Image and the Set-Image used for the ATS command that resulted in the miscompare. The comparison covers the entire HB slot (a memcmp of 512 bytes on VMFS-5 and 4K on VMFS-6).
TRIM is the ATA equivalent of SCSI UNMAP. A TRIM operation gets converted to UNMAP in the I/O stack, which is SCSI. However, there are some issues with TRIM getting converted into UNMAP.
Storage Policy-Based Management (SPBM) is the foundation of the VMware SDS Control Plane, and enables vSphere administrators to overcome upfront storage provisioning challenges, such as capacity planning, differentiated service levels and managing capacity headroom, whether using vSAN or Virtual Volumes (VVols) on external storage arrays. SPBM provides a single unified control plane across a broad range of data services and storage solutions. The framework helps align storage with the application demands of your virtual machines.
SPBM is about ease and agility. Traditional architectural models relied heavily on the capabilities of an independent storage system in order to meet the protection and performance requirements of workloads. Unfortunately, the traditional model was overly restrictive, in part because standalone hardware-based storage solutions were not VM aware and were limited in their ability to apply unique settings to various workloads. Storage Policy Based Management (SPBM) lets you define requirements for a VM or a collection of VMs. This SPBM framework is the same framework used for storage arrays supporting VVols. Therefore, a common approach to managing and protecting data can be employed, regardless of the backing storage.
----------------------------------
Overview:
Key to software defined storage (SDS) architectural model
SPBM is the common framework to abstract traditional storage related settings away from hardware, and into hypervisor
Applies storage related settings for protection and performance on a per VM, or even per VMDK level
----------------------------------
Common Rules – these come from I/O Filters on hosts (VMCrypt, SIOCv2, VAIO)
Rule Sets come from storage, either vSAN or VVols.
Available since vSphere 6.5.
This is before I added the I/O Accelerator from Infinio.
These are provided by default in vSphere.
When the policy has been created, it may be assigned to newly deployed VMs during provisioning, or to already existing VMs by assigning this new policy to the whole VM (or just an individual VMDK) by editing its settings.
One thing to note is that I/O Filter based IOPS limits do not look at the size of the I/O. For example, there is no normalization, so a 64K I/O is not treated as 2 x 32K I/Os. The limit is a fixed number of IOPS irrespective of the size of each I/O.
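The consequence of that can be made concrete with a quick, illustrative calculation: under a per-operation limit, the same amount of data takes longer to move with smaller I/Os.

```python
# An I/O-filter IOPS limit counts operations, not bytes, so one 64K I/O
# consumes the same budget as one 4K I/O.
def seconds_to_issue(num_ios: int, iops_limit: int) -> float:
    return num_ios / iops_limit

# Moving 64 MB under a 1000-IOPS limit:
assert seconds_to_issue(64 * 1024 // 64, 1000) == 1.024   # 1024 x 64K I/Os
assert seconds_to_issue(64 * 1024 // 4, 1000) == 16.384   # 16384 x 4K I/Os
```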
In this initial release of SIOC V2 in vSphere 6.5, there is no support for vSAN or Virtual Volumes. SIOC v2 is only supported with VMs that run on VMFS and NFS datastores.
SIOCV2 policies override SIOCV1 policies.
VAAI-NAS component has four primitives: Full file clone, Fast file clone, Extended Statistics and Reserve space.
The following Advanced Encryption Standards (AES) are now supported:
AES256-CTS-HMAC-SHA1-96
AES128-CTS-HMAC-SHA1-96
The DES-CBC-MD5 encryption type is not supported with NFSv4.1 in vSphere 6.5.
Kerberos Integrity (SEC_KRB5I) Support
vSphere 6.5 introduces Kerberos Integrity (SEC_KRB5I), a new feature that uses checksums to protect NFS data.
Supports the routing of iSCSI connections and sessions; leverage separate gateways per VMkernel interface; use port binding to reach targets in different subnets.
VMware now supports UEFI (Unified Extensible Firmware Interface) iSCSI Boot on Dell 13th generation servers with Intel x540 dual port Network Interface Card (NIC).
On the System BIOS select Network Settings, followed by UEFI iSCSI Device Settings.
In the Connection Settings, you need to populate initiator and target settings, as well as any appropriate VLAN and CHAP settings if required.
This device will now appear in the list of options in the UEFI Boot Settings.
The NIC Configuration must then have its Legacy Boot Protocol set to iSCSI Primary, and also be populated with initiator and target settings.
You can now install your ESXi image to any of the LUNs on the iSCSI target.
Subsequent reboots will boot from the ESXi image on the iSCSI LUN