3. What Is Machine Learning?
“Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.”
*Source: expertsystem.com
Diagram: training and inference.
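To make "learn and improve from experience without being explicitly programmed" concrete, and to show the split between training and inference, here is a minimal sketch in plain Python (not tied to any Intel, Apple, or Microsoft framework). The program is never given the rule y = 2x + 1; it estimates a slope and intercept from examples by gradient descent (training) and then applies them to a new input (inference). The data, learning rate, and function names are illustrative assumptions.

```python
from typing import List, Tuple

def train(xs: List[float], ys: List[float], lr: float = 0.05,
          epochs: int = 2000) -> Tuple[float, float]:
    """Training: fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def infer(params: Tuple[float, float], x: float) -> float:
    """Inference: apply the learned parameters to an unseen input."""
    w, b = params
    return w * x + b

# "Experience": example pairs generated by y = 2x + 1, which the code never sees directly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
params = train(xs, ys)
print(f"learned w={params[0]:.3f}, b={params[1]:.3f}; "
      f"prediction for x=10: {infer(params, 10.0):.3f}")
```

Training is the expensive, iterative step; inference is a single cheap evaluation, which is why the two are often placed on different hardware.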
7. End-to-End AI Compute
Diagram: AI compute spanning the datacenter, the gateway, and the edge.
• Datacenter: many-to-many, hyperscale, for stream and massive batch data processing
• Gateway: 1-to-many, with the majority of data streaming from devices
• Edge: 1-to-1, devices with lower power and often UX requirements
• Connectivity: Ethernet & wireless; wireless and non-IP wired protocols
• Requirements: secure, high throughput, real-time
• Intel hardware: Intel® Xeon® Processors, Intel® Core™ & Atom™ Processors, Intel® FPGA, Intel® Xeon Phi™ Processors*, Crest Family (Nervana ASIC)*, Intel® Processor Graphics, Movidius Myriad (VPU) for vision, Intel® GNA (IP)* for speech
11. DirectML
• Low-level API for machine learning (ML)
• Hardware-accelerated machine learning primitives (called operators) are the building blocks of DirectML
• Can be integrated into Direct3D 12 (D3D12) games and applications
• Metacommands
  • DirectML supports the Direct3D 12 metacommand feature, which allows hardware vendors to provide the most efficient implementation of a primitive for their underlying hardware
  • Achieves high hardware efficiency on Intel® hardware using metacommands
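Conceptually, a metacommand lets an operator run through a vendor-supplied implementation when the driver provides one, while a generic implementation remains available otherwise. The Python sketch below models only that dispatch idea. `OperatorDispatcher`, `register_metacommand`, `generic_gemm`, and `vendor_gemm` are hypothetical names for illustration; none of them are DirectML or Direct3D 12 APIs, and the real selection happens inside DirectML and the GPU driver.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]

def generic_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Portable fallback: straightforward triple-loop matrix multiply."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def vendor_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical stand-in for a vendor implementation: same result, different
    algorithm (transposes b once so each dot product walks two rows)."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

class OperatorDispatcher:
    """Runs named operators, preferring a vendor implementation when registered."""

    def __init__(self) -> None:
        self.generic: Dict[str, Callable[..., Matrix]] = {"GEMM": generic_gemm}
        self.vendor: Dict[str, Callable[..., Matrix]] = {}
        self.last_path = ""

    def register_metacommand(self, op: str, impl: Callable[..., Matrix]) -> None:
        # In the real stack the driver exposes this; here it is just a dict entry.
        self.vendor[op] = impl

    def run(self, op: str, *args: Matrix) -> Matrix:
        if op in self.vendor:
            self.last_path = "vendor"
            return self.vendor[op](*args)
        self.last_path = "generic"
        return self.generic[op](*args)

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
dispatcher = OperatorDispatcher()
print(dispatcher.run("GEMM", a, b), dispatcher.last_path)  # generic fallback
dispatcher.register_metacommand("GEMM", vendor_gemm)
print(dispatcher.run("GEMM", a, b), dispatcher.last_path)  # vendor path, same numbers
```

The application-facing call (`run("GEMM", ...)`) does not change when a faster implementation appears. That is the property that lets driver updates improve ML performance without application changes.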
15. Inference Architecture
Diagram: Inference Application 1 and Inference Application 2 on top of the Vision, Natural Language Processing, and GameplayKit frameworks and Core ML, which run on Accelerate and BNNS (CPU) and Metal Performance Shaders (iGPU).
• Core ML
  • Runs on CPU, GPU, and accelerators
  • Image analysis, natural language processing, audio-to-text conversion, and identifying sounds in audio
  • Built on top of low-level primitives such as Accelerate and BNNS, and Metal Performance Shaders (MPS)
• Metal Performance Shaders (MPS)
  • GPU only
  • Low-level primitive API (the MPS Graph API is also supported) covering ML, image processing, and ray tracing needs
  • Most efficient implementation for the underlying Intel® architecture
  • Can be integrated into Metal games and applications, and dispatched as part of the same GPU command buffer
17. Create ML
• ML models can now be created directly on a macOS device using Create ML
*Source: Apple.com
18. macOS ML Architecture with Training
Diagram: the inference stack (Inference Application 1 and 2; Vision, Natural Language Processing, and GameplayKit; Core ML; Accelerate and BNNS on the CPU; Metal Performance Shaders on the iGPU), extended with a training side in which Training Application 1 and Training Application 2 use Turi Create and Create ML.
21. Web Machine Learning with TensorFlow.js
Platform | TensorFlow.js (WebGL) (ms) | TensorFlow.js (WebML/MPS) (ms) | Speedup (x)
MBP 15" 2016 2.7GHz Intel Core i7 + Intel HD Graphics 530 1536MB | 130.810 | 18.371 | 7.120
MBP 15" 2016 2.7GHz Intel Core i7 + AMD Radeon Pro 455 1536MB | 46.756 | 19.362 | 2.415
MBP 13" 2017 3.5GHz Intel Core i7 + Intel Iris Plus Graphics 650 1536MB | 66.479 | 19.885 | 3.343
MBP 13" 2016 2.9GHz Intel Core i5 + Intel Iris Graphics 550 1536MB | 71.128 | 18.904 | 3.763
Disclaimer
• Platforms used for these numbers: MacBook Pro 13” and 15” with Intel HD Graphics 530, Intel Iris Graphics 550, Intel Iris Plus Graphics 650, and AMD Radeon Pro 455, running macOS High Sierra (10.13.4)
• All testing was performed at Intel. Numbers may differ depending on the actual hardware used and/or how the benchmark is written. Intel® makes no guarantee of the specific numbers, which are provided for reference only.
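The Speedup column in the TensorFlow.js table is the WebGL inference time divided by the WebML/MPS inference time. This small Python sketch recomputes it from the table's millisecond values as a consistency check; the shortened platform labels are only for readability.

```python
from typing import Dict, Tuple

# (WebGL ms, WebML/MPS ms) per platform, copied from the table above.
results: Dict[str, Tuple[float, float]] = {
    'MBP 15" 2016, Intel HD Graphics 530': (130.810, 18.371),
    'MBP 15" 2016, AMD Radeon Pro 455': (46.756, 19.362),
    'MBP 13" 2017, Intel Iris Plus Graphics 650': (66.479, 19.885),
    'MBP 13" 2016, Intel Iris Graphics 550': (71.128, 18.904),
}

def speedup(webgl_ms: float, mps_ms: float) -> float:
    """How many times faster the WebML/MPS path is than the WebGL path."""
    return webgl_ms / mps_ms

for platform, (webgl_ms, mps_ms) in results.items():
    print(f"{platform}: {speedup(webgl_ms, mps_ms):.3f}x")
```

Because both columns are times for the same benchmark, a ratio above 1 means the MPS-backed path is faster. The WebML/MPS times are similar across all four machines (about 18 to 20 ms), so most of the spread in speedup comes from differences in the WebGL baseline.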
23. WebML Using Metal Performance Shaders (MPS) vs. WebGL, WASM (Legacy)
Chart: WebML Chromium POC, inference time in msecs (lower is better) for MobileNet, SqueezeNet, and TensorFlow.js, comparing WASM, WebGL 2, and WebML with MPS.
Disclaimer
• Configurations used for test and performance data: MacBook Pro 13” with Intel Iris Graphics 550 and Intel HD Graphics 530, some runs with a fixed 850 MHz frequency and some with dynamic frequency
• All testing was performed at Intel® Folsom
• Numbers may differ depending on the actual hardware used and/or how the benchmark is written. Intel® makes no guarantee of the specific numbers, which are provided for reference only.
26. Windows Oct 2018 to Windows May 2019
Chart: Adobe Lightroom Enhance Details, % improvement from Windows Oct 2018 to Windows May 2019, for Canon 22 MP, Canon 50 MP, and Fuji 24 MP images.
Disclaimer
• Configurations used for test and performance data: latest Windows OS and Intel® Kaby Lake graphics
• All testing was performed at Intel. Numbers may differ depending on the actual hardware used and/or how the benchmark is written. Intel® makes no guarantee of the specific numbers, which are provided for reference only.
28. Photo Enhancement: Pixelmator Pro
Intel GPU on macOS using the Core ML AI framework
Professionally Enhance Your Photos without Time Consuming Manual Trial and Error
Images: the original (nice, but overexposed) and the result after ML Enhance in Pixelmator Pro.
30. Smart Retail: Cashier-less Store
Recognize who picks up what and how many, and add the goods to the user account’s shopping cart for payment.
Diagram components: kiosk, smart shelf with pressure sensor, IA edge computing workstation, and smart weighing station.
• Identify the customer and associate them with an account
• Recognize goods, how many, how much, and handle payment
• A camera on the shelf can also check whether goods are displayed in the right position
• Track where people stop and count their gender and age to generate a heat map
• Recognize people’s gender and age to push ads
Intel GPU on Linux using the OpenVINO AI SDK
31. Reinforcement Learning for Developing Agents in Games
Demonstrated on Intel graphics by Unity at the Game Developers Conference, March 2019
A real dog uses vision and other senses to orient itself and to decide where to go. Puppo follows the same methodology. It collects observations about the scene, such as its proximity to the target, the relative position between itself and the target, and the orientation of its own legs, so it can decide what action to take next. In Puppo’s case, the action describes how to rotate the joint motors in order to move.
After each action Puppo performs, we give a reward to the agent. The reward is composed of:
The dog learned to walk rather quickly, in about 1 minute. Then, as the training continued, the dog learned to run.
Demo video (courtesy Unity): https://blogs.unity3d.com/wp-content/uploads/2018/10/DogFetchTraining.mp4?_=1
Intel GPU on Windows using the DirectML AI framework
Save Developer Time to Deliver Game Agents; Improve Game Experience
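The observe, act, reward loop described for Puppo can be illustrated with tabular Q-learning on a toy one-dimensional "reach the target" task. This is a swapped-in textbook technique, not the method, environment, or reward that Unity used for Puppo. The corridor length, the rewards (+1 at the target, -0.01 per step), and the hyperparameters are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

N_STATES = 6           # positions 0..5 on a line
TARGET = N_STATES - 1  # the episode ends when the agent reaches position 5
ACTIONS = (-1, +1)     # index 0 = move left, index 1 = move right

def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy environment: apply an action, return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), TARGET)
    if nxt == TARGET:
        return nxt, 1.0, True   # reward for reaching the target
    return nxt, -0.01, False    # small per-step penalty favors shorter paths

def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Observe the state and pick an action (epsilon-greedy exploration).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Use the reward to update the value estimate of the action just taken.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

def greedy_policy(q: List[List[float]]) -> List[int]:
    """Best learned action for every non-terminal position."""
    return [ACTIONS[0] if q[s][0] > q[s][1] else ACTIONS[1] for s in range(TARGET)]

q_table = train()
print(greedy_policy(q_table))
```

Nothing in the code states "always move right". That behavior emerges from the rewards, which mirrors how Puppo learned to walk and then run from reward feedback rather than hand-written gait code.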
34. PoseNet
Real-time human pose estimation in the browser
Browser-based PoseNet using WebML on Intel GPU, with clDNN (Windows/Linux) and Metal Performance Shaders (macOS) backends
38. Intel® Processor Graphics Gen11
• 10 nm process
• 64 execution units (EUs), which increase the core compute capability by 2.67x¹ over Gen9
• Gen11 addresses the corresponding bandwidth needs by improving compression, increasing the L3 cache size, and increasing peak memory bandwidth
• ~1 TFLOPS FP32 and ~2 TFLOPS FP16 peak performance
• Improved Shared Local Memory (SLM) performance (~1/4 the latency of Gen9)
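The 2.67x compute figure matches the ratio of EU counts given in this deck's disclaimers (64 EUs on Gen11, 24 EUs on Gen9), and the ~1 TFLOPS FP32 figure follows from a standard peak-throughput product: EUs × FLOPs per EU per clock × clock. The Python sketch below works through that arithmetic. The per-EU rate (two 4-wide FP32 FMA units, so 2 × 4 × 2 = 16 FLOPs per clock), the doubled FP16 rate, and the 1 GHz clock are assumptions not stated on the slide.

```python
GEN9_EUS = 24
GEN11_EUS = 64

# Assumptions (not from the slide): 2 ALUs x 4 FP32 lanes x 2 ops per FMA = 16 FLOPs
# per EU per clock, FP16 at twice the FP32 rate, and a ~1 GHz GPU clock.
FP32_FLOPS_PER_EU_PER_CLOCK = 16
FP16_FLOPS_PER_EU_PER_CLOCK = 32
ASSUMED_CLOCK_GHZ = 1.0

def peak_tflops(eus: int, flops_per_eu_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak: EUs x FLOPs/EU/clock x clock in GHz gives GFLOPS; /1000 -> TFLOPS."""
    return eus * flops_per_eu_per_clock * clock_ghz / 1000.0

print(f"Gen11/Gen9 EU ratio: {GEN11_EUS / GEN9_EUS:.2f}x")
print(f"Gen11 FP32 peak: {peak_tflops(GEN11_EUS, FP32_FLOPS_PER_EU_PER_CLOCK, ASSUMED_CLOCK_GHZ):.3f} TFLOPS")
print(f"Gen11 FP16 peak: {peak_tflops(GEN11_EUS, FP16_FLOPS_PER_EU_PER_CLOCK, ASSUMED_CLOCK_GHZ):.3f} TFLOPS")
```

These are theoretical ceilings. Measured ML speedups (such as the 1.7-2.7x range later in the deck) depend on memory bandwidth, clocks, and how well a workload keeps the EUs busy.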
Block diagram: an Intel® Core Processor SoC with CPU cores, LLC cache slices, and a system agent (display controller, PCIe, memory controller) connected by the SoC ring interconnect to Intel® Processor Graphics Gen11. The graphics contain global assets (GTI, blitter, media fixed function, geometry, raster, HiZ/depth, pixel dispatch, pixel backend), an L3 cache, a slice common, and eight subslices. Each subslice has 8 EUs, an instruction cache and thread dispatch, a sampler, a media sampler, a texture cache, SLM, and a dataport (load/store).
39. ML Bench: Gen9 vs Gen11
Disclaimer
• Configurations used for test and performance data: Intel Gen9 graphics (24 EUs) and Intel Gen11 graphics (64 EUs), some runs with fixed frequency and some with dynamic frequency
• Numbers may differ depending on the actual hardware used and/or how the benchmark is written. Intel® makes no guarantee of the specific numbers, which are provided for reference only.
• All testing was performed at Intel® Folsom
Chart: ML Bench, x improvement Gen9 vs Gen11, for VGG16, VGG19, InceptionV3, and ResNet50 at batch sizes 1, 4, and 16 (_b01, _b04, _b16).
40. ISV Application Improvements
Disclaimer
• Configurations used for test and performance data: Intel Gen9 graphics (24 EUs) and Intel Gen11 graphics (64 EUs), some runs with fixed frequency and some with dynamic frequency
• Numbers may differ depending on the actual hardware used and/or how the benchmark is written. Intel® makes no guarantee of the specific numbers, which are provided for reference only.
• All testing was performed at Intel® Folsom
Chart: Adobe Lightroom Enhance Details, x improvement Gen9 vs Gen11, for Fuji 22 MP, Fuji 24 MP, Canon 22 MP, and Canon 50 MP images.
41. AI/ML Possibilities
Chart: time to stylize a 15-minute video with AI (1) and to enhance 250 images with ML (2), Gen9 vs Gen11. Values shown: 48 minutes, 30 minutes, 1.1 hours, and 42 minutes. Relative performance: Gen9 1.0x; Gen11 1.7-2.7x.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks
Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator. Any difference in system hardware or software design or configuration may affect actual performance.
System Configurations: ICL media performance is based on projections and subject to change. Gen9 performance is based on a KBL-R U42 system.
1. Stylize video using Cyberlink PowerDirector Style Transfer leveraging Intel OpenVINO
2. Enhance 250 22 MP images using WinML, Core ML, and Adobe Lightroom Classic and CC
42. Summary
• Machine learning is here on the edge!
• Use Intel® integrated graphics to accelerate your machine learning workloads
• Ships with most Windows and Mac platforms
• Intel optimized ML stack is enabled by default
• Automatic improvements delivered with OS and driver updates
• Large improvement with 11th Gen Intel® Processor Graphics
• Intel is continuously working with OSVs (Apple, Microsoft), ISVs, the open source community, and others to improve Intel® Graphics software and hardware for ML needs
44. Acknowledgements
• Aaftab Munshi
• Joseph Van De Water
• Sudhir Tonse
• Ningxin Hu
• Gokul N Tonpe
• Insoo Woo
• Ben Ashbaugh
• Murali Ramadoss
• Thanh-Kevin Dang
• Jay Patel
• Prashanth Palaniappan
• Xiaoqing Wu
• Sachin Sane
• Katen Shah
• Brian Jacobosky
• Arzhange Safdarzadeh
• Anthony Bernecky
• Leland E Martin
• Antal Tungler
• Damien Triolet
• Jacek Krol
• Jacek Nowak
• Kalyan Muthukumar