SlideShare a Scribd company logo
1 of 38
Download to read offline
Debugging GPU faults: QoL
tools for your driver
Danylo Piliaiev
2023-10-17
1
Who Am I?
Currently implementing Adreno 7XX GPU generation
in Turnip
My blog:
In the past
Worked on mobile video games
Debugging unruly games since 2018
At Igalia since November 2020
blogs.igalia.com/dpiliaiev
2
The Problem
"What if I was able to quickly edit this GPU packet?"
"What if I was able to dump this buffer here?"
Or "It would have been nice to print that shader
register!"
"I'll implement it later...."
3
Unrecoverable Hangs - Roadblocks
Computer completely locks up and has to be rebooted
Last few seconds of logs/anything else are lost
Existing tooling isn't of much use with such
constraints
GFR (Graphics Flight Recorder):
VK layer for breadcrumbs
Dumps command buffers with commands' status
4
Unrecoverable Hangs - Solution
More BREADCRUMBS!
GFR writes results to the disk
GFR logging is far behind what's actually runs on GPU
GFR could be too high level:
Blits/BeginRenderPass/EndRenderPass could
internally use a lot of different 2d and 3d blits
5
Unrecoverable Hangs - Solution
Observations
Unrecoverable hangs are rarely caused by sync issues
Cannot allow GPU to race ahead of the last know
breadcrumb
A hang may happen asynchronously to the GPU
packet that triggered it, e.g.
A job is scheduled to another GPU unit
That GPU unit hangs some time afterwards
There may not be a way to synchronize it
6
Unrecoverable Hangs - Solution
The current solution in Turnip is:
Breadcrumbs are inserted after each GPU command
GPU writes a breadcrumb and immediately waits for
this value to be acknowledged
CPU in a busy loop checks the breadcrumb value
If new one is found, it is sent over the network
The CPU acks the breadcrumb and GPU continues
execution
7
How Our Breadcrumbs Work
BR1 BR2 BR3
BR1
Send
GPU
CPU
GPU executes this
command and
hangs
Kernel BR1
Send over
network
Wait Wait Wait
Read Write
BR2
Send
BR2
Read Write
8
Asynchronous hangs?
Require explicit input in tty for each breadcrumb
GPU is on breadcrumb 18, continue?y
GPU is on breadcrumb 19, continue?y
GPU is on breadcrumb 20, continue?y
GPU is on breadcrumb 21, continue?
9
Breadcrumbs In Practice
Increase GPU hang timeout
Receive breadcrumbs on another machine via bash
spaghetti
nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | 
awk -Wposix '{printf("%u:%un", "0x" $0, a[$0]++)}
Launch workload with TU_BREADCRUMBS envvar
TU_BREADCRUMBS=$IP:$PORT,break=$BREAKPOINT:$BREAKPOI
Launch workload and break on the last known
10
Further Material
https://blogs.igalia.com/dpiliaiev/debugging-unrecoverable-hangs/
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15452
11
Faster Way To Debug Hangs
12
Breadcrumbs Shortcoming
Breadcrumbs are good for finding a command that
hangs
They cannot tell which part of the GPU state caused
it
They are useless for misrenderings
Some issues are not reproducible with breadcrumbs
13
Reproducing Hangs
Drivers are already able to capture command streams
And all used buffers
With this it is trivial to replay the submissions back
Ideally requires user-space specified GPU addresses
14
Replaying - Caveats
Multiple queues
When to re-upload memory?
Just force a single queue?
Timeline semaphores?
Recordings may be huge
15
Editing The Command Stream
Even the most minimalistic editing is useful:
/* pkt4: GRAS_2D_RESOLVE_CNTL_2 = { X = 63 | Y = 63 } */
pkt(cs, 4128831);
/* pkt4: RB_BLIT_SCISSOR_TL = { X = 0 | Y = 0 } */
pkt4(cs, 0x88d1, (2), 0);
/* pkt4: RB_BLIT_SCISSOR_BR = { X = 63 | Y = 63 } */
pkt(cs, 4128831);
pkt7(cs, CP_MEM_WRITE, 20);
/* { ADDR_LO = 0x1f7580 } */
pkt(cs, 2061696);
/* { ADDR_HI = 0x40 } */
pkt(cs, 64);
pkt(cs, 1216352390);
pkt(cs, 1107296256);
16
Editing Shaders
const char *source = R"(
shps #l37
getone #l37
cov.u32f32 r1.w, c504.z
cov.u32f32 r2.x, c504.w
cov.u32f32 r1.y, c504.x
....
end
)";
upload_shader(&ctx, 0x100200d80, source);
emit_shader_iova(&ctx, cs, 0x100200d80);
17
Replaying Edited Command Stream
The decompiler emits C code with raw commands
The replay tool takes original submissions capture:
Finds unused memory range
Emits edited command stream there
Overrides target submission
18
Dumping GPU Memory
Dumping GPU memory is simple to implement
Act of copying may disturb GPU caches
Kernel cooperation is needed to implement it
properly:
GPU interrupts execution e.g. by faulting
Now memory could be read undisturbed
19
Dumping Shader's Registers
print %tmp_regs, %src_reg
%tmp_regs - 3 consecutive free regs
For 64b address and 32b tmp offset
%src_reg - a single register to print
Shader Log Entries: 6
[0] 00000004 0.0000
[1] 00000000 0.0000
[2] 00000000 0.0000
[3] 00000000 0.0000
[4] beadc429 -0.3394
[5] beadc429 -0.3394
20
Dumping Shader's Registers
Want a nicer print? Just print $src_reg?
You still need to allocate temporary registers
What if there are no free regs?
Spilling regs may not be that easy at this stage
Too much trouble for a little gain...
21
Short Summary
A tool to replay command stream submissions
A tool to decompile a command stream into C code
An option to replay edit command stream
Helpers to dump GPU memory from the command
stream
Helpers to dump shader registers
22
Stale Regs In Command Stream
23
Debugging Stale Registers
It could be hard to spot stale reg usage:
It may appear as a random geometry flicker
Game hanging at a random moment
Rare CTS test failure
24
Stomping Registers - Caveats
Could be a bit tricky if a combination of regs causes
an issue
VK pipelines could be set outside a renderpass
Doesn't help if stale regs are between draw calls
Default invalid value may be valid for some registers
25
Stomping Registers
We mark each register with where it is used:
<reg32 offset="0x9600" name="VPC_DBG_ECO_CNTL" usage="cmd"/>
<reg32 offset="0xb987" name="HLSQ_CS_CNTL" variants="A6XX" usage="cmd"/>
<reg32 offset="0xb984" name="HLSQ_CONTROL_3_REG" variants="A6XX" usage="rp
<reg32 offset="0xb985" name="HLSQ_CONTROL_4_REG" variants="A6XX" usage="rp
To stomp register you need to specify:
export TU_DEBUG_STALE_REGS_RANGE=0x0c00,0xbe01
export TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass
26
Turnip Tooling - Summary
Unique tooling:
Driver breadcrumbs
Command stream replaying and editing
GPU memory dumping
Shader register dumping
Debug option to find stale reg usage
27
Other Drivers and Tooling
28
Generic
GFR - Graphics Flight Recorder
Instruments command buffers with completion tags
Uses VK_AMD_buffer_marker (nothing vendor
specific)
In vkd3d-proton:
Breadcrumbs
Shader printf
Descriptor debugging
29
Other Mesa Drivers
Feature toggles and debug flags
Shader assembly replacement for debugging
GPU submissions decoding
GPU crash dumps decoding
30
Radeon - UMR
GPU register dumps
SGPR / VGPR shader register dumps
Shader wavefront Debugging
Shader disassembly around the crash site
See Maister's blog post for it in action
https://themaister.net/blog/2023/08/20/hardcor
e-vulkan-debugging-digging-deep-on-linux-amd
gpu/
31
Unreleased - Radeon - Shader Debugger
32
Proprietary - Radeon™ GPU Detective
Postmortem analysis of GPU crashes
Information about page faults
Breadcrumbs reflecting done and in-progress GPU
work
Command Buffer ID: 0x107c
=========================
[>] "Frame 1040 CL0"
├─[X] "Depth + Normal + Motion Vector PrePass"
├─[>] "DownSamplePS"
│ ├─[X] Draw
│ ├─[>] Draw
└─[>] "Bloom"
├─[>] "BlurPS"
│ ├─[>] Draw
│ └─[>] Draw
├─[ ] Draw
├─[ ] "BlurPS"
33
Proprietary - NVIDIA Aftermath
34
Proprietary - NVIDIA Aftermath
Collects GPU “mini-dumps”
Visualizes GPU state at the moment of crash
Collects breadcrumbs
Shows crashing shader and it registers
35
Q&A
Any good tools I haven't mentioned?
Maybe you tried something before?
Maybe you have an idea for a tool to implement?
We're hiring!
igalia.com/jobs/
36
37
Debugging GPU faults: QoL tools for your driver – XDC 2023

More Related Content

Similar to Debugging GPU faults: QoL tools for your driver – XDC 2023

eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceSUSE Labs Taipei
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingAnne Nicolas
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemCyber Security Alliance
 
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat Security Conference
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010John Holden
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardJian-Hong Pan
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rFerdinand Jamitzky
 
Make ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKMake ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKSaumil Shah
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Dobrica Pavlinušić
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance AMD
 
Code Red Security
Code Red SecurityCode Red Security
Code Red SecurityAmr Ali
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkmarkdgray
 
Exploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET ImplementationExploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET Implementationnkslides
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLinaro
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness Peter Griffin
 

Similar to Debugging GPU faults: QoL tools for your driver – XDC 2023 (20)

eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
XS Boston 2008 Debugging Xen
XS Boston 2008 Debugging XenXS Boston 2008 Debugging Xen
XS Boston 2008 Debugging Xen
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande Modem
 
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with r
 
Make ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKMake ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEK
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance
 
Code Red Security
Code Red SecurityCode Red Security
Code Red Security
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdk
 
Exploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET ImplementationExploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET Implementation
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
Writing bios
Writing biosWriting bios
Writing bios
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel Awareness
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness
 

More from Igalia

Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JITIgalia
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!Igalia
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerIgalia
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in MesaIgalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIgalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera LinuxIgalia
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVMIgalia
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesIgalia
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSIgalia
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webIgalia
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersIgalia
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...Igalia
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on RaspberryIgalia
 
Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Igalia
 
Async page flip in DRM atomic API
Async page flip in DRM  atomic APIAsync page flip in DRM  atomic API
Async page flip in DRM atomic APIIgalia
 
From the proposal to ECMAScript – Step by Step
From the proposal to ECMAScript – Step by StepFrom the proposal to ECMAScript – Step by Step
From the proposal to ECMAScript – Step by StepIgalia
 
Migrating Babel from CommonJS to ESM
Migrating Babel     from CommonJS to ESMMigrating Babel     from CommonJS to ESM
Migrating Babel from CommonJS to ESMIgalia
 
The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...
The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...
The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...Igalia
 
Freedreno on Android – XDC 2023
Freedreno on Android          – XDC 2023Freedreno on Android          – XDC 2023
Freedreno on Android – XDC 2023Igalia
 
On-going challenges in the Raspberry Pi driver stack – XDC 2023
On-going challenges in the Raspberry Pi driver stack – XDC 2023On-going challenges in the Raspberry Pi driver stack – XDC 2023
On-going challenges in the Raspberry Pi driver stack – XDC 2023Igalia
 

More from Igalia (20)

Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 
Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...
 
Async page flip in DRM atomic API
Async page flip in DRM  atomic APIAsync page flip in DRM  atomic API
Async page flip in DRM atomic API
 
From the proposal to ECMAScript – Step by Step
From the proposal to ECMAScript – Step by StepFrom the proposal to ECMAScript – Step by Step
From the proposal to ECMAScript – Step by Step
 
Migrating Babel from CommonJS to ESM
Migrating Babel     from CommonJS to ESMMigrating Babel     from CommonJS to ESM
Migrating Babel from CommonJS to ESM
 
The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...
The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...
The rainbow treasure map: Advanced color management on Linux with AMD/Steam D...
 
Freedreno on Android – XDC 2023
Freedreno on Android          – XDC 2023Freedreno on Android          – XDC 2023
Freedreno on Android – XDC 2023
 
On-going challenges in the Raspberry Pi driver stack – XDC 2023
On-going challenges in the Raspberry Pi driver stack – XDC 2023On-going challenges in the Raspberry Pi driver stack – XDC 2023
On-going challenges in the Raspberry Pi driver stack – XDC 2023
 

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Debugging GPU faults: QoL tools for your driver – XDC 2023

  • 1. Debugging GPU faults: QoL tools for your driver Danylo Piliaiev 2023-10-17 1
  • 2. Who Am I? Currently implementing Adreno 7XX GPU generation in Turnip My blog: In the past Worked on mobile video games Debugging unruly games since 2018 At Igalia since November 2020 blogs.igalia.com/dpiliaiev 2
  • 3. The Problem "What if I was able to quickly edit this GPU packet?" "What if I was able to dump this buffer here?" Or "It would have been nice to print that shader register!" "I'll implement it later...." 3
  • 4. Unrecoverable Hangs - Roadblocks Computer completely locks up and has to be rebooted Last few seconds of logs/anything else are lost Existing tooling isn't of much use with such constraints GFR (Graphics Flight Recorder): VK layer for breadcrumbs Dumps command buffers with commands' status 4
  • 5. Unrecoverable Hangs - Solution More BREADCRUMBS! GFR writes results to the disk GFR logging is far behind what's actually runs on GPU GFR could be too high level: Blits/BeginRenderPass/EndRenderPass could internally use a lot of different 2d and 3d blits 5
  • 6. Unrecoverable Hangs - Solution Observations Unrecoverable hangs are rarely caused by sync issues Cannot allow GPU to race ahead of the last know breadcrumb A hang may happen asynchronously to the GPU packet that triggered it, e.g. A job is scheduled to another GPU unit That GPU unit hangs some time afterwards There may not be a way to synchronize it 6
  • 7. Unrecoverable Hangs - Solution The current solution in Turnip is: Breadcrumbs are inserted after each GPU command GPU writes a breadcrumb and immediately waits for this value to be acknowledged CPU in a busy loop checks the breadcrumb value If new one is found, it is sent over the network The CPU acks the breadcrumb and GPU continues execution 7
  • 8. How Our Breadcrumbs Work BR1 BR2 BR3 BR1 Send GPU CPU GPU executes this command and hangs Kernel BR1 Send over network Wait Wait Wait Read Write BR2 Send BR2 Read Write 8
  • 9. Asynchronous hangs? Require explicit input in tty for each breadcrumb GPU is on breadcrumb 18, continue?y GPU is on breadcrumb 19, continue?y GPU is on breadcrumb 20, continue?y GPU is on breadcrumb 21, continue? 9
  • 10. Breadcrumbs In Practice Increase GPU hang timeout Receive breadcrumbs on another machine via bash spaghetti nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | awk -Wposix '{printf("%u:%un", "0x" $0, a[$0]++)} Launch workload with TU_BREADCRUMBS envvar TU_BREADCRUMBS=$IP:$PORT,break=$BREAKPOINT:$BREAKPOI Launch workload and break on the last known 10
  • 12. Faster Way To Debug Hangs 12
  • 13. Breadcrumbs Shortcoming Breadcrumbs are good for finding a command that hangs They cannot tell which part of the GPU state caused it They are useless for misrenderings Some issues are not reproducible with breadcrumbs 13
  • 14. Reproducing Hangs Drivers are already able to capture command streams And all used buffers With this it is trivial to replay the submissions back Ideally requires user-space specified GPU addresses 14
  • 15. Replaying - Caveats Multiple queues When to re-upload memory? Just force a single queue? Timeline semaphores? Recordings may be huge 15
  • 16. Editing The Command Stream Even the most minimalistic editing is useful: /* pkt4: GRAS_2D_RESOLVE_CNTL_2 = { X = 63 | Y = 63 } */ pkt(cs, 4128831); /* pkt4: RB_BLIT_SCISSOR_TL = { X = 0 | Y = 0 } */ pkt4(cs, 0x88d1, (2), 0); /* pkt4: RB_BLIT_SCISSOR_BR = { X = 63 | Y = 63 } */ pkt(cs, 4128831); pkt7(cs, CP_MEM_WRITE, 20); /* { ADDR_LO = 0x1f7580 } */ pkt(cs, 2061696); /* { ADDR_HI = 0x40 } */ pkt(cs, 64); pkt(cs, 1216352390); pkt(cs, 1107296256); 16
  • 17. Editing Shaders const char *source = R"( shps #l37 getone #l37 cov.u32f32 r1.w, c504.z cov.u32f32 r2.x, c504.w cov.u32f32 r1.y, c504.x .... end )"; upload_shader(&ctx, 0x100200d80, source); emit_shader_iova(&ctx, cs, 0x100200d80); 17
  • 18. Replaying Edited Command Stream The decompiler emits C code with raw commands The replay tool takes original submissions capture: Finds unused memory range Emits edited command stream there Overrides target submission 18
  • 19. Dumping GPU Memory Dumping GPU memory is simple to implement Act of copying may disturb GPU caches Kernel cooperation is needed to implement it properly: GPU interrupts execution e.g. by faulting Now memory could be read undisturbed 19
  • 20. Dumping Shader's Registers print %tmp_regs, %src_reg %tmp_regs - 3 consecutive free regs For 64b address and 32b tmp offset %src_reg - a single register to print Shader Log Entries: 6 [0] 00000004 0.0000 [1] 00000000 0.0000 [2] 00000000 0.0000 [3] 00000000 0.0000 [4] beadc429 -0.3394 [5] beadc429 -0.3394 20
  • 21. Dumping Shader's Registers Want a nicer print? Just print $src_reg? You still need to allocate temporary registers What if there are no free regs? Spilling regs may not be that easy at this stage Too much trouble for a little gain... 21
  • 22. Short Summary A tool to replay command stream submissions A tool to decompile a command stream into C code An option to replay edit command stream Helpers to dump GPU memory from the command stream Helpers to dump shader registers 22
  • 23. Stale Regs In Command Stream 23
  • 24. Debugging Stale Registers It could be hard to spot stale reg usage: It may appear as a random geometry flicker Game hanging at a random moment Rare CTS test failure 24
  • 25. Stomping Registers - Caveats Could be a bit tricky if a combination of regs causes an issue VK pipelines could be set outside a renderpass Doesn't help if stale regs are between draw calls Default invalid value may be valid for some registers 25
  • 26. Stomping Registers We mark each register with where it is used: <reg32 offset="0x9600" name="VPC_DBG_ECO_CNTL" usage="cmd"/> <reg32 offset="0xb987" name="HLSQ_CS_CNTL" variants="A6XX" usage="cmd"/> <reg32 offset="0xb984" name="HLSQ_CONTROL_3_REG" variants="A6XX" usage="rp <reg32 offset="0xb985" name="HLSQ_CONTROL_4_REG" variants="A6XX" usage="rp To stomp register you need to specify: export TU_DEBUG_STALE_REGS_RANGE=0x0c00,0xbe01 export TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass 26
  • 27. Turnip Tooling - Summary Unique tooling: Driver breadcrumbs Command stream replaying and editing GPU memory dumping Shader register dumping Debug option to find stale reg usage 27
  • 28. Other Drivers and Tooling 28
  • 29. Generic GFR - Graphics Flight Recorder Instruments command buffers with completion tags Uses VK_AMD_buffer_marker (nothing vendor specific) In vkd3d-proton: Breadcrumbs Shader printf Descriptor debugging 29
  • 30. Other Mesa Drivers Feature toggles and debug flags Shader assembly replacement for debugging GPU submissions decoding GPU crash dumps decoding 30
  • 31. Radeon - UMR GPU register dumps SGPR / VGPR shader register dumps Shader wavefront Debugging Shader disassembly around the crash site See Maister's blog post for it in action https://themaister.net/blog/2023/08/20/hardcor e-vulkan-debugging-digging-deep-on-linux-amd gpu/ 31
  • 32. Unreleased - Radeon - Shader Debugger 32
  • 33. Proprietary - Radeon™ GPU Detective Postmortem analysis of GPU crashes Information about page faults Breadcrumbs reflecting done and in-progress GPU work Command Buffer ID: 0x107c ========================= [>] "Frame 1040 CL0" ├─[X] "Depth + Normal + Motion Vector PrePass" ├─[>] "DownSamplePS" │ ├─[X] Draw │ ├─[>] Draw └─[>] "Bloom" ├─[>] "BlurPS" │ ├─[>] Draw │ └─[>] Draw ├─[ ] Draw ├─[ ] "BlurPS" 33
  • 34. Proprietary - NVIDIA Aftermath 34
  • 35. Proprietary - NVIDIA Aftermath Collects GPU “mini-dumps” Visualizes GPU state at the moment of crash Collects breadcrumbs Shows crashing shader and it registers 35
  • 36. Q&A Any good tools I haven't mentioned? Maybe you tried something before? Maybe you have an idea for a tool to implement? We're hiring! igalia.com/jobs/ 36
  • 37. 37