Nicholas Scott's presentation on advanced analytics in Nagios.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
2. Disclaimer
Math may occur later.
I apologize in advance.
2012 2
3. Abstract
Introduction
Capacity Planning Component
Features
Different Forecasting Methods
When to use
RRD Analysis Tool
Statistics Pillow Talk
4. Introduction
Nagios Data Gathering Attributes
SO MUCH DATA (TOO MUCH?)
Generally noisy
Sources usually not simple
How many factors are affecting service X on a given host Y?
We have data showing X is like this but why?
5. Capacity Planning Terminology
Residuals – Variation that remains after fitting (observed value minus fitted value)
Period – A frame of time in which a pattern cycles through a complete iteration
Example:
9. Capacity Planning
Least Squares
Better for simple trending, obviously
Finds the trend line that minimizes the sum of the squared residuals
Less computationally expensive than Holt-Winters (HW)
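A minimal sketch of the least-squares trend line in plain Python, using the textbook closed-form slope and intercept. The function names (`least_squares_line`, `sum_squared_residuals`, `extrapolate`) are illustrative and are not taken from the Capacity Planning component's code.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    # A residual is the observed value minus the fitted value.
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def extrapolate(slope, intercept, future_xs):
    # Project the fitted trend line to future x values (e.g. future timestamps).
    return [slope * x + intercept for x in future_xs]
```

Given timestamps and perfdata values, `extrapolate` projects the trend forward; as the notes say, read the result as a direction and rough level, not a precise prediction.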
10. Capacity Planning
Good choice for noisy data
Possible future mean value
11. Capacity Planning
Linear Algebra is fun
Linear Algebra is grindy
Linear Algebra is a great way to really think about algorithms
RRD Python abstraction class is available
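The same fit can be phrased as linear algebra by solving the normal equations $(A^T A)\beta = A^T y$. This sketch assumes numpy (already a dependency of the RRD Analysis Tool); it illustrates the technique and is not the component's implementation.

```python
import numpy as np


def least_squares_normal_equations(xs, ys):
    """Fit y = slope * x + intercept by solving (A^T A) beta = A^T y."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    # Design matrix: a column of ones (intercept term) next to the x values (slope term).
    A = np.column_stack([np.ones_like(x), x])
    # Normal equations: the gradient of ||A beta - y||^2 is zero at the minimum.
    beta = np.linalg.solve(A.T @ A, A.T @ y)
    intercept, slope = beta
    return slope, intercept
```

For real RRD timestamps (epoch seconds), it is worth centering the x values first or using `np.linalg.lstsq`, since forming $A^T A$ squares the condition number of $A$.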
12. Capacity Planning
Quadratic/Cubic Fit
Naive, experimental
Fits a polynomial of a given order to data
13. Capacity Planning
For quadratic or cubic datasets
User decision
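A sketch of the quadratic/cubic fit using numpy's `polyfit`, which performs a least-squares polynomial fit of a given degree; `order` stands in for the user decision on the slide. The function name `polynomial_forecast` is hypothetical.

```python
import numpy as np


def polynomial_forecast(xs, ys, order, future_xs):
    """Least-squares polynomial fit of the chosen order, evaluated at future x values."""
    if order not in (2, 3):
        raise ValueError("order should be 2 (quadratic) or 3 (cubic)")
    # polyfit minimizes the sum of squared residuals; coefficients come highest power first.
    coeffs = np.polyfit(xs, ys, order)
    return np.polyval(coeffs, future_xs)
```

As with every method here, more history gives a steadier fit; a cubic extrapolated far past the data can swing wildly.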
14. RRD Analysis Tool
Goals
General stats: mean, variance, etc.
Derivatives, including higher-order derivatives
Bivariate correlation
Dependencies:
Python >= 2.4
numpy, rrdtool, scipy, matplotlib, mako
15. RRD Analysis Tool
Example of running this thing:
./analyze.py -H localhost -S Current_Load -s
16. RRD Analysis Tool
Why do you want to smooth your stuff?
Noise noise noise
Comedy Option: Pretty graphs
Mean
Stddev
Variance
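A minimal sketch of the summary statistics and one simple smoother. The tool's actual smoothing method isn't stated here, so the trailing moving average below is an assumption chosen for illustration, as is the use of population (divide-by-n) variance.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance: average squared distance from the mean.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def stddev(values):
    return math.sqrt(variance(values))


def moving_average(values, window):
    # Trailing moving average; the output has len(values) - window + 1 points.
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

For a hypothetical series with a single spike to 5, `moving_average([1, 1, 5, 1, 1], 3)` spreads the spike across the window, so every smoothed value is about 2.33 instead of one point at 5.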
17. RRD Analysis Tool
Derivatives
Quick refresher: $\frac{\Delta y}{\Delta x}$
Actual form we'll use:
$$\frac{y_t - y_{t-1}}{t_t - t_{t-1}} = \frac{y_t - y_{t-1}}{\text{RRD Resolution}}$$
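The slide's finite-difference form, $(y_t - y_{t-1})$ divided by the RRD resolution, translates directly into code. This sketch assumes evenly spaced samples, takes `step` as the RRD resolution in seconds, and simply drops the first sample, which has no $y_{t-1}$; the function name is hypothetical.

```python
def first_derivative(values, step):
    """Finite-difference rate of change, (y_t - y_{t-1}) / step.

    values: consecutive RRD samples, oldest first.
    step: RRD resolution in seconds (t_t - t_{t-1}), e.g. 300 for 5-minute data.
    Returns one fewer element than values, since the first sample has no y_{t-1}.
    """
    return [(values[i] - values[i - 1]) / step for i in range(1, len(values))]
```

For example, a byte counter reading 1000, 4000, 10000 at 300-second steps gives `[10.0, 20.0]`, i.e. bytes per second.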
18. RRD Analysis Tool
Uses?
Relatable to physics?
Position
Velocity
Acceleration
Jerk (seriously)
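Velocity, acceleration and jerk correspond to applying the finite difference one, two and three times to the "position" series. A sketch using `numpy.diff`, assuming evenly spaced samples at the RRD step; `nth_derivative` is a hypothetical helper name, and each pass shortens the series by one point.

```python
import numpy as np


def nth_derivative(values, step, order):
    """Apply the finite-difference derivative `order` times.

    Treating the RRD values as position: order 1 ~ velocity,
    order 2 ~ acceleration, order 3 ~ jerk. Each pass drops one sample.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    result = np.asarray(values, dtype=float)
    for _ in range(order):
        result = np.diff(result) / step
    return result
```

For positions $t^3$ sampled once per second, the third pass (jerk) is the constant 6, matching $\frac{d^3}{dt^3}t^3 = 6$.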
19. RRD Analysis Tool
Example: first derivative of CPU load:
./analyze.py -H localhost -S Current_Load -d 1
22. RRD Analysis Tool
Bivariate Analysis
Compare two possibly related variables
Define a relationship
Graph them on the same graph
Find Pearson's Correlation Coefficient
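A sketch of Pearson's correlation coefficient, $r = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum (x_i - \bar x)^2}\sqrt{\sum (y_i - \bar y)^2}}$, plus a labeling helper using the strength thresholds from the talk's notes (exactly where the boundaries at .1, .3 and .5 fall is an assumption). Both series are assumed to be aligned on the same timestamps. Since scipy is already a dependency, `scipy.stats.pearsonr` returns the same r along with a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length, aligned series."""
    if len(xs) != len(ys):
        raise ValueError("series must be the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def describe_strength(r):
    # Thresholds from the notes: 0-.09 none, .1-.3 small, .3-.5 medium, else strong.
    a = abs(r)
    if a < 0.1:
        return "none"
    if a < 0.3:
        return "small"
    if a < 0.5:
        return "medium"
    return "strong"
```

The sign of r carries the direction: a negative value means one series tends to go down when the other goes up.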
I'll try to keep this applicable to real life, since this is the Nagios World Conference; I just like the math portion of it. If you're looking for hardcore application, Wittenberg is presenting right now and that talk is very applied. However, I will foray into implementation a bit, and since I like programming, I'll give some tips on what I learned when implementing these. Statistics: I like it, and perhaps there are some things I overlooked. Haile story.
Cover the new Capacity Planning (CP) component for Nagios XI: some of the features (dates, extrapolation, RRD data validity exclusions), sprinkled with the how and why behind what's going on. RRD Data Analysis tool: derivatives, bivariate comparisons, correlation. It's free; I put it together for fun. Contact me if you want it, whether for a project or for personal use, whatevs.
Nagios collects data every 5 minutes, and, god help us, our uptime... Each service is a complex function: how would you write a function to represent all the factors that affect a service's perfdata? After thinking about that, are you sure? The financial sector deals with this every day. The goal is to make this data usable; that is the heart of forecasting and analysis. Understanding the numbers better seems abstract at first, and it takes time.
The capacity planning component was designed so that you don't have to know much to get some forecasting going.
Periods: the time over which a pattern may repeat itself. Extrapolation is limited to 4 * period. Methods: a few more are in development, but the current set is a 'good start'. All are self-projecting, rather than cause-and-effect.
Without going through the formula (well, kind of): Smoothed value – exponentially weighted. Trend value – represents variations of the time series that happen at a lower frequency. Seasonal value – represents items that occur across trends; it could be construed as the trend of the trend. The initial trend is calculated by splitting the data into the two known periods, taking (second period_t – first period_t) / L for each t, summing those, and then dividing that sum by L.
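A minimal additive Holt-Winters sketch. The initial trend follows the note, $b_0 = \frac{1}{L}\sum_{i=1}^{L}\frac{y_{L+i} - y_i}{L}$; everything else (additive rather than multiplicative seasonality, the seasonal initialization, the starting level, and the smoothing parameters `alpha`, `beta`, `gamma`) is an assumption based on the textbook method, not a description of the Capacity Planning component.

```python
def initial_trend(series, L):
    """Average slope between the first two periods: (1/L) * sum((y[L+i] - y[i]) / L)."""
    return sum((series[L + i] - series[i]) / L for i in range(L)) / L


def initial_seasonals(series, L):
    """Average deviation of each position in the period from its period's mean."""
    n_periods = len(series) // L
    period_means = [sum(series[j * L:(j + 1) * L]) / L for j in range(n_periods)]
    return [
        sum(series[j * L + i] - period_means[j] for j in range(n_periods)) / n_periods
        for i in range(L)
    ]


def holt_winters_additive(series, L, alpha, beta, gamma, n_preds):
    """Triple exponential smoothing with additive seasonality; returns n_preds forecasts."""
    if len(series) < 2 * L:
        raise ValueError("need at least two full periods of data")
    seasonals = initial_seasonals(series, L)
    level = series[0]
    trend = initial_trend(series, L)
    for t, value in enumerate(series):
        last_level = level
        # Smoothed value: exponentially weighted, with the seasonal effect removed.
        level = alpha * (value - seasonals[t % L]) + (1 - alpha) * (level + trend)
        # Trend value: smoothed change in level between steps.
        trend = beta * (level - last_level) + (1 - beta) * trend
        # Seasonal value: smoothed deviation of this point from the level.
        seasonals[t % L] = gamma * (value - level) + (1 - gamma) * seasonals[t % L]
    n = len(series)
    # Forecast m steps past the last sample using the latest seasonal for that position.
    return [level + m * trend + seasonals[(n - 1 + m) % L] for m in range(1, n_preds + 1)]
```

Because the trend and seasonal terms feed back into each update, an outlier in either of the first two periods propagates into every forecast, which is the weakness described in the next note.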
It feeds back on itself: if the difference from period 1 to period 2 contained some strange outlier, it will be represented, and exaggerated, in the next steps. That is a shortcoming of Holt-Winters: outliers can destroy it. However, there is something satisfying about having a somewhat educated guess as to what a stat is going to be in several weeks or months. Smoothing may be necessary or preferred; it is not currently implemented but is on the to-do list for a future release, and it presents its own issues. I would like to discuss the implementation, as it's fascinating, but we'll move on since it's also time consuming.
Should not be used to predict exact future values, but to predict future direction. It should be treated as more of a "this should be around this level at this time." It will, however, be wrong when dealing with an exponential or quadratic dataset, though that wouldn't be noticeable if the extrapolation period were short enough (e.g., derivatives).
Good for noisy data, as it only trends the mean. The actual graph line shows where the least-squares line (minimum sum of squared residuals) will be in the future. Aside: it's fun to implement. If you're interested in linear algebra, you'll have a blast.
Do it if you like linear algebra, or just want to hone your programming prowess; doing any sort of matrix operations will make you better at algorithms. Don't look for a pot of gold at the end: it's hard to do clever stuff that severely reduces the time complexity of basic matrix operations. The RRD abstraction class is available through the stats tool I wrote about; it takes less thought to get info out of the RRD.
Much like least squares, it fits a polynomial that has the minimum sum of the squared residuals. It is geared more towards items where you would expect accelerating (quadratic or cubic) growth. Given that it's for accelerating growth, it can be very touchy: the more data you have to compare with, the better it will be. That goes for every one of these, but this one in particular.
Once again, this is for datasets where you anticipate accelerating growth. User decision: are you expecting quadratic or cubic growth or decay, or do you want to plan for it?
Looking to take a crack at some general stats with an eye on Nagios. Analysis tools have been around a while; I'm just looking to make something specific to Nagios and RRDs, and to look at what these definitions actually mean to a network operation, or the usual Nagios setup. If you want to use it, or help develop it, feel free.
Weird random stuff happens, and this weird random stuff throws off statistical analysis. That's kind of strange if you think about it philosophically; however, this isn't philosophy, this is math, and there are rules. Would you have wanted that spike to 5 to register as a critical? That speaks to the noise, as we'll see when we go into the derivatives. Stddev – helps to understand the outliers and to set up normal distributions for calculating the odds of what future values may be. Variance – can help identify a multiplicative trend when the mean and variance are both increasing with some period.
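The idea of using the standard deviation to set up a normal distribution can be made concrete. Assuming values are roughly normally distributed with mean `mu` and standard deviation `sigma` (a strong assumption for noisy perfdata), the chance of exceeding a threshold comes from the normal tail; `prob_exceeds` is a hypothetical helper.

```python
import math


def prob_exceeds(threshold, mu, sigma):
    """P(X > threshold) for X ~ Normal(mu, sigma), via the complementary error function."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (threshold - mu) / sigma  # how many standard deviations above the mean
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A threshold about two standard deviations above the mean is exceeded roughly 2.3% of the time under this assumption; a larger stddev makes the same threshold more likely to be crossed.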
Our use case is that y = the RRD data, with t being the time those values occurred. Since we're not in math class, there's no need for the "as h approaches 0" business. This actually makes our job pretty easy; obviously we'll need a y_t-1 value, which we'll just leave as 0 as we
Every day. Every single time you see a bytes/sec reading, that's a delta, and that's all this is trying to do. Why is the current byte count useless to us? Do our brains not keep its state? Probably. Can we apply that to other metrics? Would it be useful? When would it not be useful? The byte count is always increasing; CPU load is not. Can we relate this to physics? If we can, we can use its entire wealth of information, though the nature may be different.
Do you care what the rate of change of your CPU load is per 300 seconds? What does the mean actually symbolize here? Or any of them? Interpret: Mean – the CPU load was slowly growing. Max – the magnitude of the highest rate of increase, and we can see the time that it happened: not when the load peaked, but when it started its rise to the peak. Min – same thing, for the fastest decrease.
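A sketch of that interpretation: compute the rate of change, then report its mean, plus its max and min with the time each occurred. Each rate is labeled with the start of its interval, which is why the max points at when the rise began rather than when the value peaked. The function name and the evenly spaced timestamps are assumptions.

```python
def summarize_rate_of_change(timestamps, values):
    """Mean, max and min of the first derivative, with the time each extreme began.

    timestamps: evenly spaced sample times in seconds (the RRD step apart).
    Each rate (values[i] - values[i-1]) / step is labeled with timestamps[i-1],
    the start of the interval over which the change happened.
    """
    step = timestamps[1] - timestamps[0]
    rates = [(values[i] - values[i - 1]) / step for i in range(1, len(values))]
    starts = timestamps[:-1]
    i_max = max(range(len(rates)), key=rates.__getitem__)
    i_min = min(range(len(rates)), key=rates.__getitem__)
    return {
        "mean": sum(rates) / len(rates),
        "max": (rates[i_max], starts[i_max]),
        "min": (rates[i_min], starts[i_min]),
    }
```

A small positive mean says the load drifted upward over the window; the max and min say when the fastest rise and fall started.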
Root partition on the Nagios test box, obviously a very active Nagios box. Obviously not an active hard drive, and these values are nothing to worry about. Keep in mind that peaks of the actual bytes happen where the derivative crosses zero going from positive to negative. This helps isolate the actual times of events.
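The "positive to negative at zero" rule can be sketched as a sign-change scan over the first differences. This is an illustration with a hypothetical helper name; a plateau at the top is reported at its first sample, and noisy data may need smoothing first to avoid many tiny peaks.

```python
def peak_times(timestamps, values):
    """Times where the first derivative changes from positive to zero or negative.

    Only the sign of the change matters, so dividing by the RRD step is skipped.
    """
    deltas = [values[i] - values[i - 1] for i in range(1, len(values))]
    peaks = []
    for i in range(1, len(deltas)):
        # deltas[i-1] > 0: rising into sample i; deltas[i] <= 0: not rising after it.
        if deltas[i - 1] > 0 and deltas[i] <= 0:
            peaks.append(timestamps[i])
    return peaks
```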
Now we get back to the second derivative, which, if you remember, is similar to acceleration. How fast was the rate of change changing? What does this mean? Where it crosses zero, the velocity (first derivative) is at a local max/min. The cycle is back as far as timing goes: d(d(cos)) = -cos. F = ma: is there something we could assign to be m and F? That might show the relative magnitude of an impulse.
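Worked out, the cos example: differentiating twice returns the original curve with its sign flipped, so the second derivative's extremes line up in time with the original's, while the first derivative is a quarter cycle out of phase.

$$\frac{d}{dt}\cos t = -\sin t = \cos\left(t + \frac{\pi}{2}\right), \qquad \frac{d^2}{dt^2}\cos t = -\cos t$$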
Correlation: we have all these services/hosts; are they related? We can postulate, but we don't know for sure. If there are lags we wouldn't really know, but let's start simple: graph 'em, find Pearson's.
We can see that there is definitely a relationship: two different checks that are both checking local ping, but are getting slightly different results. It transcends that, though. We can imagine a line on that graph that would do a pretty good job of representing those points. Interpreting |r|: 0 - .09: none; .1 - .3: small; .3 - .5: medium; else strong.
It's hard to pull the relationship out of this graph. R shows a medium NEGATIVE correlation, meaning that when one goes up, the other goes down. That would've been hard to pull out without a little help. Interpreting |r|: 0 - .09: none; .1 - .3: small; .3 - .5: medium; else strong.