SlideShare a Scribd company logo
1 of 29
Gimli:Server Process Monitoring& Fault Analysis
Agenda What problem does Gimli solve? How does it work? How can I leverage Gimli? Where can I get Gimli? How can I contribute to Gimli?
The Setting You have software deployed in Production Production: has stability and/or downtime requirements May prohibit gdb and interactive debugging tools for security purposes May be at the other end of a VPN and/or be subject to draconian access policies
The Problem Your software is manifesting a crash/hang problem in Production Can’t use gdb: Takes too long to poke around May not be able to run gdb Getting a usable core file is hard: Big server processes leave big cores Can’t debug the core on the box Can’t get the core back to your box How can you effectively get a bead on the issue?
Gimli – a solution in 2 parts Monitor: supervises your target process Optional watchdog support via heartbeat If a fault is detected (crash, hang) invokes the analysis component, then re-spawns the target process Glider (the analyzer): Pstack on steroids DWARF-3 assisted stack unwinding Full backtrace and deep structure printing Optional extension modules to augment trace Runs on: Linux, Solaris, Darwin OS/X. i386, x86_64/amd64. Incomplete support for Solaris Sparc (mostly there!)
The Gimli Monitor A bit like “init”, but focused on a single process Detaches from the terminal and establishes a new session Spawns the child process and monitors: Did the child exit abnormally (signalled)? Did the child get STOP’d? If the child opted in to hearbeat, did its heart stop beating? If one of these conditions triggers, and the process is still running, the Glider is invoked The process is then respawned (and throttled if respawning too fast)
Monitor options All options can be command line, environmental or in a config file Watchdog interval – heart must beat at least once in this many seconds Detach, setstid: alter daemonizing behavior Trace-dir – where to record trace files Pidfile – where to store pidfile Uid, gid – setuid/setgid to the specified user before spawning the child Respawn-frequency – brake on respawning to avoid hammering the system
Heart beat Monitor assumes that the child does not support heartbeat until it beats at least once Two mechanisms for heart beat: Simple: kill(getppid(), SIGUSR1) Easy for scripts to implement Shared Memory via libgimli: hb = gimli_heartbeat_attach() gimli_heartbeat_set(hb, GIMLI_HB_RUNNING);
Effective Heart beat Intended to issue one heart beat per iteration of the main loop in your app If it stops ticking over at least once every watchdog interval (60 seconds by default), it is deemed to have hanged Generally not good to heart beat inside utility functions; better to keep it at the higher level in the app to avoid false negatives Exception to this rule might be an infrequent but long running function
Trapping Faults If you don’t install a signal handler, the Monitor won’t be able to trap a fault and generate a trace libgimli provides a gimli_establish_signal_handlers() to set up suitable fault trapping Handler: Clears signal handler so a double fault will kill us SIGSTOP’s self; monitor is notified via SIGCHLD When it resumes, calls an application supplied shutdown hook Sends the faulting signal to itself so that it terminates abnormally
Tracing When the Monitor decides that the child is dead, it initiates tracing Creates a file named appname.pid.trc in the specified trace dir (default: /tmp) Emits a header describing the reason for the fault and when it happened Spawns Glider and has its output append to the trace file
Trace Header This is a trace file generated by Gimli. Process: pid=6704 /path/to/faulty-app Traced because: fault detected Time of trace: (1278702746) Fri Jul  9 15:12:26 2010 Invoking trace program: /opt/msys/gimli/bin/glider
The Gimli Glider Name comes from a combination of well known Dwarf and Elf fantasy literature and an incident in Manitoba where an aircraft ran out of fuel Fundamental purpose is to get a stack trace of all threads in the target process in a human readableform Uses DWARF-3 debugging information gcc -gdwarf-2 -g3 Extracts siginfo from signal trampolines to show more detail about the fault
Example partial trace Thread 0 (LWP 6704) #0  0x00000035ad69a1a1 /lib64/libc-2.5.so`nanosleep+41 #1  0x00000035ad699fc3 /lib64/libc-2.5.so`sleep+93 #2  0x00002b029434bde6 /opt/msys/3rdParty/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so`Perl_pp_sleep+56 (pp_sys.c:4525) PerlInterpreter *my_perl = 0x6ff2010 (/opt/msys/3rdParty/bin/perl`0x6ff2010)  [deref'ingmy_perl] PerlInterpreter  @ 0x6ff2010 = {       SV **Istack_sp = 0x71938c8 (/opt/msys/3rdParty/bin/perl`0x71938c8)       OP *Iop = 0x7cd8f30 (/opt/msys/3rdParty/bin/perl`0x7cd8f30)       SV **Icurpad = 0x7019fc0 (/opt/msys/3rdParty/bin/perl`0x7019fc0)       SV **Istack_base = 0x71938c0 (/opt/msys/3rdParty/bin/perl`0x71938c0)       SV **Istack_max = 0x71958a8 (/opt/msys/3rdParty/bin/perl`0x71958a8)       I32 *Iscopestack = 0x76331c0 (/opt/msys/3rdParty/bin/perl`0x76331c0)       I32 Iscopestack_ix = 5 (0x5)       I32 Iscopestack_max = 108 (0x6c)
Glider vspstack Pstack is terse, does not include file:line info Glider is verbose (full backtrace) Glider is extensible
Extending Glider It is often necessary to get more than just a stack trace out of a faulting application Want a readable snapshot of important datastructures like: Circular log buffer Stats State machine dumps Memory utilization Glider provides an API for extracting this type of data from your process
Gimli Modules Gimli modules are shared objects that are loaded and executed by the glider process, not the target, faulting process Glider examines the mapped objects in the target process and locates corresponding gimli modules. Example: if I link against libfoo.so, gimli looks for a gimli_libfoo.so If a module in the target has a “gimli_tracer_module_name” symbol, it is interpreted as the name of a module to load in the glider instead
Glider Modules Gimli modules export a function named “gimli_ana_init” which is invoked when the module is loaded into the Glider It is expected to return a Gimli module structure that allows the module to hook into aspects of the tracing When all the modules have been loaded, the trace begins Can intercept each threads trace and each frame of the trace Can either emit additional info or suppress the item After the stack trace, modules have their generic tracing hook invoked to allow arbitrary other data to be emitted
Glider API: libgimli_ana.h Provides functions that: resolve symbols Reverse map an address to a symbol Copy memory from the target process Copy a string from the target process Get source (file:line) information for an address Examine DWARF parameter information for a named parameter in a given stack frame
Extracting data from globals Given the following global data in the target, how can I print it out in my Gimli module? char *my_str_ptr = “hello”; intmy_int; char my_str[1024]; structfoo { int value; } my_foo; structfoo *my_foo_ptr = &my_foo;
static void perform_trace(conststructgimli_ana_api *api, const char *object) { 	// First the string pointer 	char *s = api->get_string_symbol(object,		“my_str_ptr”); printf(“%s”, s ? s : “null”);
// Now the int. The 0 below means don’t deref intival; if (api->copy_from_symbol(object, “my_int”, 			0, &ival, sizeof(ival)) { printf(“my_int: %d”, ival); }
// Now the string buffer char buf[1024]; if (api->copy_from_symbol(object, “my_str”, 			0, &buf, sizeof(buf)) { printf(“my_str: %s”, buf); }
// Now the foostruct; you need to #include its // definition, or re-declare it here structfoof; if (api->copy_from_symbol(object, “my_foo”, 			0, &f, sizeof(f)) { printf(“foo.value: %d”, f.value); } if (api->copy_from_symbol(object, “my_foo_ptr”, 			1, &f, sizeof(f)) { // 1 = deref once printf(“foo_ptr->value: %d”, f.value); }
Making an address human readable // Some address I plucked from the target void *addr = ???; // What is the closest symbol? char buf[1024]; printf(“%p = %s”, addr, api->sym_name(addr, buf, sizeof(buf)); // Gives: 0x00000035ad699fc3 = /lib64/libc-2.5.so`sleep+93
File:line info // if it is code, what’s the file and line? int line; if (api->get_source_info(addr, buf, sizeof(buf), 				&line)) { printf(“%p => %s:%d”, addr, buf, line); }
Keep in mind Gimli modules are automated debuggers There is no direct access to pointers You need to copy the memory space from the target process before you can work with it naturally in C The target faulted; its pointers may be bogus and point to invalid memory The read routines will indicate an error or short read if the address is invalid or if the address range is only partially valid If you crash the glider, you may leave the target in limbo
Future Directions Gimli modules for Lua, Perl etc. to render the script stack trace interspersed with the C stack trace Lua (and/or other script) bindings that leverage DWARF info and allow scripted analyzer modules Analysis across traces? Integration with a system like Mozilla Socorro?  One monitor process for multiple children and/or “init” integration?
Where can I get Gimli? Available from https://bitbucket.org/wez/gimli/ OpenSource: 3-clause BSD license Interested in contributing? Docs, code, testing, bug reports and test cases are all welcome! Email me: wez@messagesystems.com

More Related Content

Viewers also liked

Role of school college education
Role of school college educationRole of school college education
Role of school college educationDhruva Trivedy
 
Convergencia Nº 1 Septiembre 2005
Convergencia Nº 1 Septiembre 2005Convergencia Nº 1 Septiembre 2005
Convergencia Nº 1 Septiembre 2005revistaconvergencia
 
The 2008 Museum Landscape
The 2008 Museum LandscapeThe 2008 Museum Landscape
The 2008 Museum LandscapeMike Ellis
 
Hot Chocolate: You got cocoa in my PHP
Hot Chocolate: You got cocoa in my PHPHot Chocolate: You got cocoa in my PHP
Hot Chocolate: You got cocoa in my PHPWez Furlong
 

Viewers also liked (7)

Role of school college education
Role of school college educationRole of school college education
Role of school college education
 
PHP and COM
PHP and COMPHP and COM
PHP and COM
 
Convergencia Nº 1 Septiembre 2005
Convergencia Nº 1 Septiembre 2005Convergencia Nº 1 Septiembre 2005
Convergencia Nº 1 Septiembre 2005
 
PresentacióN1popopo
PresentacióN1popopoPresentacióN1popopo
PresentacióN1popopo
 
The 2008 Museum Landscape
The 2008 Museum LandscapeThe 2008 Museum Landscape
The 2008 Museum Landscape
 
Hot Chocolate: You got cocoa in my PHP
Hot Chocolate: You got cocoa in my PHPHot Chocolate: You got cocoa in my PHP
Hot Chocolate: You got cocoa in my PHP
 
Getting It Done
Getting It DoneGetting It Done
Getting It Done
 

Similar to Gimli: Server Process Monitoring and Fault Analysis

How to recognise that the user has just uninstalled your android app droidc...
How to recognise that the user has just uninstalled your android app   droidc...How to recognise that the user has just uninstalled your android app   droidc...
How to recognise that the user has just uninstalled your android app droidc...Przemek Jakubczyk
 
How to recognise that the user has just uninstalled your android app
How to recognise that the user has just uninstalled your android appHow to recognise that the user has just uninstalled your android app
How to recognise that the user has just uninstalled your android appPrzemek Jakubczyk
 
Brillo/Weave Part 2: Deep Dive
Brillo/Weave Part 2: Deep DiveBrillo/Weave Part 2: Deep Dive
Brillo/Weave Part 2: Deep DiveJalal Rohani
 
How to recognise that the user has just uninstalled your app
How to recognise that the user has just uninstalled your appHow to recognise that the user has just uninstalled your app
How to recognise that the user has just uninstalled your appAleksander Piotrowski
 
Linux Security APIs and the Chromium Sandbox
Linux Security APIs and the Chromium SandboxLinux Security APIs and the Chromium Sandbox
Linux Security APIs and the Chromium SandboxPatricia Aas
 
Working with core dump
Working with core dumpWorking with core dump
Working with core dumpThierry Gayet
 
Linux Security and How Web Browser Sandboxes Really Work (NDC Oslo 2017)
Linux Security  and How Web Browser Sandboxes Really Work (NDC Oslo 2017)Linux Security  and How Web Browser Sandboxes Really Work (NDC Oslo 2017)
Linux Security and How Web Browser Sandboxes Really Work (NDC Oslo 2017)Patricia Aas
 
Implementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphoresImplementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphoresGowtham Reddy
 
Lecture2 process structure and programming
Lecture2   process structure and programmingLecture2   process structure and programming
Lecture2 process structure and programmingMohammed Farrag
 
Desymfony 2011 - Habemus Bundles
Desymfony 2011 - Habemus BundlesDesymfony 2011 - Habemus Bundles
Desymfony 2011 - Habemus BundlesAlbert Jessurum
 
Chromium Sandbox on Linux (BlackHoodie 2018)
Chromium Sandbox on Linux (BlackHoodie 2018)Chromium Sandbox on Linux (BlackHoodie 2018)
Chromium Sandbox on Linux (BlackHoodie 2018)Patricia Aas
 
SBA Security Meetup: I want to break free - The attacker inside a Container
SBA Security Meetup: I want to break free - The attacker inside a ContainerSBA Security Meetup: I want to break free - The attacker inside a Container
SBA Security Meetup: I want to break free - The attacker inside a ContainerSBA Research
 
Android 5.0 Lollipop platform change investigation report
Android 5.0 Lollipop platform change investigation reportAndroid 5.0 Lollipop platform change investigation report
Android 5.0 Lollipop platform change investigation reporthidenorly
 
Chromium Sandbox on Linux (NDC Security 2019)
Chromium Sandbox on Linux (NDC Security 2019)Chromium Sandbox on Linux (NDC Security 2019)
Chromium Sandbox on Linux (NDC Security 2019)Patricia Aas
 
Finding target for hacking on internet is now easier
Finding target for hacking on internet is now easierFinding target for hacking on internet is now easier
Finding target for hacking on internet is now easierDavid Thomas
 
5 - ch3.pptx
5 - ch3.pptx5 - ch3.pptx
5 - ch3.pptxTemp35704
 
Jailbreak Detector Detector
Jailbreak Detector DetectorJailbreak Detector Detector
Jailbreak Detector DetectorNick Mooney
 

Similar to Gimli: Server Process Monitoring and Fault Analysis (20)

How to recognise that the user has just uninstalled your android app droidc...
How to recognise that the user has just uninstalled your android app   droidc...How to recognise that the user has just uninstalled your android app   droidc...
How to recognise that the user has just uninstalled your android app droidc...
 
How to recognise that the user has just uninstalled your android app
How to recognise that the user has just uninstalled your android appHow to recognise that the user has just uninstalled your android app
How to recognise that the user has just uninstalled your android app
 
Brillo/Weave Part 2: Deep Dive
Brillo/Weave Part 2: Deep DiveBrillo/Weave Part 2: Deep Dive
Brillo/Weave Part 2: Deep Dive
 
How to recognise that the user has just uninstalled your app
How to recognise that the user has just uninstalled your appHow to recognise that the user has just uninstalled your app
How to recognise that the user has just uninstalled your app
 
Linux Security APIs and the Chromium Sandbox
Linux Security APIs and the Chromium SandboxLinux Security APIs and the Chromium Sandbox
Linux Security APIs and the Chromium Sandbox
 
Working with core dump
Working with core dumpWorking with core dump
Working with core dump
 
Linux Security and How Web Browser Sandboxes Really Work (NDC Oslo 2017)
Linux Security  and How Web Browser Sandboxes Really Work (NDC Oslo 2017)Linux Security  and How Web Browser Sandboxes Really Work (NDC Oslo 2017)
Linux Security and How Web Browser Sandboxes Really Work (NDC Oslo 2017)
 
Implementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphoresImplementing of classical synchronization problem by using semaphores
Implementing of classical synchronization problem by using semaphores
 
test
testtest
test
 
Lecture2 process structure and programming
Lecture2   process structure and programmingLecture2   process structure and programming
Lecture2 process structure and programming
 
Desymfony 2011 - Habemus Bundles
Desymfony 2011 - Habemus BundlesDesymfony 2011 - Habemus Bundles
Desymfony 2011 - Habemus Bundles
 
Chromium Sandbox on Linux (BlackHoodie 2018)
Chromium Sandbox on Linux (BlackHoodie 2018)Chromium Sandbox on Linux (BlackHoodie 2018)
Chromium Sandbox on Linux (BlackHoodie 2018)
 
SBA Security Meetup: I want to break free - The attacker inside a Container
SBA Security Meetup: I want to break free - The attacker inside a ContainerSBA Security Meetup: I want to break free - The attacker inside a Container
SBA Security Meetup: I want to break free - The attacker inside a Container
 
Android 5.0 Lollipop platform change investigation report
Android 5.0 Lollipop platform change investigation reportAndroid 5.0 Lollipop platform change investigation report
Android 5.0 Lollipop platform change investigation report
 
Chromium Sandbox on Linux (NDC Security 2019)
Chromium Sandbox on Linux (NDC Security 2019)Chromium Sandbox on Linux (NDC Security 2019)
Chromium Sandbox on Linux (NDC Security 2019)
 
Scaling django
Scaling djangoScaling django
Scaling django
 
Pyramid Framework
Pyramid FrameworkPyramid Framework
Pyramid Framework
 
Finding target for hacking on internet is now easier
Finding target for hacking on internet is now easierFinding target for hacking on internet is now easier
Finding target for hacking on internet is now easier
 
5 - ch3.pptx
5 - ch3.pptx5 - ch3.pptx
5 - ch3.pptx
 
Jailbreak Detector Detector
Jailbreak Detector DetectorJailbreak Detector Detector
Jailbreak Detector Detector
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Gimli: Server Process Monitoring and Fault Analysis

  • 2. Agenda What problem does Gimli solve? How does it work? How can I leverage Gimli? Where can I get Gimli? How can I contribute to Gimli?
  • 3. The Setting You have software deployed in Production Production: has stability and/or downtime requirements May prohibit gdb and interactive debugging tools for security purposes May be at the other end of a VPN and/or be subject to draconian access policies
  • 4. The Problem Your software is manifesting a crash/hang problem in Production Can’t use gdb: Takes too long to poke around May not be able to run gdb Getting a usable core file is hard: Big server processes leave big cores Can’t debug the core on the box Can’t get the core back to your box How can you effectively get a bead on the issue?
  • 5. Gimli – a solution in 2 parts Monitor: supervises your target process Optional watchdog support via heartbeat If a fault is detected (crash, hang) invokes the analysis component, then re-spawns the target process Glider (the analyzer): Pstack on steroids DWARF-3 assisted stack unwinding Full backtrace and deep structure printing Optional extension modules to augment trace Runs on: Linux, Solaris, Darwin OS/X. i386, x86_64/amd64. Incomplete support for Solaris Sparc (mostly there!)
  • 6. The Gimli Monitor A bit like “init”, but focused on a single process Detaches from the terminal and establishes a new session Spawns the child process and monitors: Did the child exit abnormally (signalled)? Did the child get STOP’d? If the child opted in to hearbeat, did its heart stop beating? If one of these conditions triggers, and the process is still running, the Glider is invoked The process is then respawned (and throttled if respawning too fast)
  • 7. Monitor options All options can be command line, environmental or in a config file Watchdog interval – heart must beat at least once in this many seconds Detach, setstid: alter daemonizing behavior Trace-dir – where to record trace files Pidfile – where to store pidfile Uid, gid – setuid/setgid to the specified user before spawning the child Respawn-frequency – brake on respawning to avoid hammering the system
  • 8. Heart beat Monitor assumes that the child does not support heartbeat until it beats at least once Two mechanisms for heart beat: Simple: kill(getppid(), SIGUSR1) Easy for scripts to implement Shared Memory via libgimli: hb = gimli_heartbeat_attach() gimli_heartbeat_set(hb, GIMLI_HB_RUNNING);
  • 9. Effective Heart beat Intended to issue one heart beat per iteration of the main loop in your app If it stops ticking over at least once every watchdog interval (60 seconds by default), it is deemed to have hanged Generally not good to heart beat inside utility functions; better to keep it at the higher level in the app to avoid false negatives Exception to this rule might be an infrequent but long running function
  • 10. Trapping Faults If you don’t install a signal handler, the Monitor won’t be able to trap a fault and generate a trace libgimli provides a gimli_establish_signal_handlers() to set up suitable fault trapping Handler: Clears signal handler so a double fault will kill us SIGSTOP’s self; monitor is notified via SIGCHLD When it resumes, calls an application supplied shutdown hook Sends the faulting signal to itself so that it terminates abnormally
  • 11. Tracing When the Monitor decides that the child is dead, it initiates tracing Creates a file named appname.pid.trc in the specified trace dir (default: /tmp) Emits a header describing the reason for the fault and when it happened Spawns Glider and has its output append to the trace file
  • 12. Trace Header This is a trace file generated by Gimli. Process: pid=6704 /path/to/faulty-app Traced because: fault detected Time of trace: (1278702746) Fri Jul 9 15:12:26 2010 Invoking trace program: /opt/msys/gimli/bin/glider
  • 13. The Gimli Glider Name comes from a combination of well known Dwarf and Elf fantasy literature and an incident in Manitoba where an aircraft ran out of fuel Fundamental purpose is to get a stack trace of all threads in the target process in a human readableform Uses DWARF-3 debugging information gcc -gdwarf-2 -g3 Extracts siginfo from signal trampolines to show more detail about the fault
  • 14. Example partial trace Thread 0 (LWP 6704) #0 0x00000035ad69a1a1 /lib64/libc-2.5.so`nanosleep+41 #1 0x00000035ad699fc3 /lib64/libc-2.5.so`sleep+93 #2 0x00002b029434bde6 /opt/msys/3rdParty/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE/libperl.so`Perl_pp_sleep+56 (pp_sys.c:4525) PerlInterpreter *my_perl = 0x6ff2010 (/opt/msys/3rdParty/bin/perl`0x6ff2010) [deref'ingmy_perl] PerlInterpreter @ 0x6ff2010 = { SV **Istack_sp = 0x71938c8 (/opt/msys/3rdParty/bin/perl`0x71938c8) OP *Iop = 0x7cd8f30 (/opt/msys/3rdParty/bin/perl`0x7cd8f30) SV **Icurpad = 0x7019fc0 (/opt/msys/3rdParty/bin/perl`0x7019fc0) SV **Istack_base = 0x71938c0 (/opt/msys/3rdParty/bin/perl`0x71938c0) SV **Istack_max = 0x71958a8 (/opt/msys/3rdParty/bin/perl`0x71958a8) I32 *Iscopestack = 0x76331c0 (/opt/msys/3rdParty/bin/perl`0x76331c0) I32 Iscopestack_ix = 5 (0x5) I32 Iscopestack_max = 108 (0x6c)
  • 15. Glider vspstack Pstack is terse, does not include file:line info Glider is verbose (full backtrace) Glider is extensible
  • 16. Extending Glider It is often necessary to get more than just a stack trace out of a faulting application Want a readable snapshot of important datastructures like: Circular log buffer Stats State machine dumps Memory utilization Glider provides an API for extracting this type of data from your process
  • 17. Gimli Modules Gimli modules are shared objects that are loaded and executed by the glider process, not the target, faulting process Glider examines the mapped objects in the target process and locates corresponding gimli modules. Example: if I link against libfoo.so, gimli looks for a gimli_libfoo.so If a module in the target has a “gimli_tracer_module_name” symbol, it is interpreted as the name of a module to load in the glider instead
  • 18. Glider Modules Gimli modules export a function named “gimli_ana_init” which is invoked when the module is loaded into the Glider It is expected to return a Gimli module structure that allows the module to hook into aspects of the tracing When all the modules have been loaded, the trace begins Can intercept each threads trace and each frame of the trace Can either emit additional info or suppress the item After the stack trace, modules have their generic tracing hook invoked to allow arbitrary other data to be emitted
  • 19. Glider API: libgimli_ana.h Provides functions that: resolve symbols Reverse map an address to a symbol Copy memory from the target process Copy a string from the target process Get source (file:line) information for an address Examine DWARF parameter information for a named parameter in a given stack frame
  • 20. Extracting data from globals Given the following global data in the target, how can I print it out in my Gimli module? char *my_str_ptr = “hello”; intmy_int; char my_str[1024]; structfoo { int value; } my_foo; structfoo *my_foo_ptr = &my_foo;
  • 21. static void perform_trace(conststructgimli_ana_api *api, const char *object) { // First the string pointer char *s = api->get_string_symbol(object, “my_str_ptr”); printf(“%s”, s ? s : “null”);
  • 22. // Now the int. The 0 below means don’t deref intival; if (api->copy_from_symbol(object, “my_int”, 0, &ival, sizeof(ival)) { printf(“my_int: %d”, ival); }
  • 23. // Now the string buffer char buf[1024]; if (api->copy_from_symbol(object, “my_str”, 0, &buf, sizeof(buf)) { printf(“my_str: %s”, buf); }
  • 24. // Now the foostruct; you need to #include its // definition, or re-declare it here structfoof; if (api->copy_from_symbol(object, “my_foo”, 0, &f, sizeof(f)) { printf(“foo.value: %d”, f.value); } if (api->copy_from_symbol(object, “my_foo_ptr”, 1, &f, sizeof(f)) { // 1 = deref once printf(“foo_ptr->value: %d”, f.value); }
  • 25. Making an address human readable // Some address I plucked from the target void *addr = ???; // What is the closest symbol? char buf[1024]; printf(“%p = %s”, addr, api->sym_name(addr, buf, sizeof(buf)); // Gives: 0x00000035ad699fc3 = /lib64/libc-2.5.so`sleep+93
  • 26. File:line info // if it is code, what’s the file and line? int line; if (api->get_source_info(addr, buf, sizeof(buf), &line)) { printf(“%p => %s:%d”, addr, buf, line); }
  • 27. Keep in mind Gimli modules are automated debuggers There is no direct access to pointers You need to copy the memory space from the target process before you can work with it naturally in C The target faulted; its pointers may be bogus and point to invalid memory The read routines will indicate an error or short read if the address is invalid or if the address range is only partially valid If you crash the glider, you may leave the target in limbo
  • 28. Future Directions Gimli modules for Lua, Perl etc. to render the script stack trace interspersed with the C stack trace Lua (and/or other script) bindings that leverage DWARF info and allow scripted analyzer modules Analysis across traces? Integration with a system like Mozilla Socorro? One monitor process for multiple children and/or “init” integration?
  • 29. Where can I get Gimli? Available from https://bitbucket.org/wez/gimli/ OpenSource: 3-clause BSD license Interested in contributing? Docs, code, testing, bug reports and test cases are all welcome! Email me: wez@messagesystems.com