This document discusses kernel density estimation (KDE), a non-parametric method for estimating the probability density function of a random variable. KDE works by placing a kernel (such as a Gaussian) over each data point and summing the kernels to estimate the density. The bandwidth parameter controls the width of each kernel and hence the smoothness of the estimated density function. Different kernel functions, such as uniform, triangular, and normal, can be used. Compared to histograms, KDE provides a smooth, continuous density estimate and converges faster to the true density for continuous variables.
2. Parametric vs. Non-parametric Estimation
Parametric probability density estimation involves selecting a
common distribution and estimating the parameters of the
density function from a data sample.
Nonparametric probability density estimation involves using a
technique to fit a model to the arbitrary distribution of the data
without assuming its functional form, such as kernel density estimation.
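As a minimal sketch of the parametric route (assuming a Gaussian form; the sample values are the six-point example used later in these slides):

```python
import numpy as np
from scipy.stats import norm

# Data sample (the six-point example from the KDE-Example slides)
sample = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])

# Parametric route: assume the density is Gaussian, so only its
# two parameters (mean and standard deviation) must be estimated
mu_hat = sample.mean()
sigma_hat = sample.std(ddof=1)

# The fitted density can then be evaluated at any new point
print(norm.pdf(0.0, loc=mu_hat, scale=sigma_hat))
```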
3. • Perhaps the most common nonparametric approach for
estimating the probability density function of a
continuous random variable is called kernel smoothing, or
kernel density estimation, KDE for short.
• Kernel Density Estimation: Nonparametric method for
using a dataset to estimate probabilities for new points
(a usage sketch follows below).
Kernel Density Estimation (KDE)
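A minimal usage sketch of this idea with SciPy's gaussian_kde (the data values are the six-point example used later in these slides):

```python
import numpy as np
from scipy.stats import gaussian_kde

sample = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])

# Nonparametric route: no distributional form is assumed; a Gaussian
# kernel is placed on every observation (bandwidth chosen automatically)
kde = gaussian_kde(sample)

# Estimate the density at new points
print(kde(np.array([-1.0, 0.0, 3.0])))
```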
• The kernel function weights the contribution of observations from a
data sample based on their distance to the point whose probability is
being estimated.
• A parameter, called the smoothing parameter or the bandwidth,
controls the scope, or window of observations, from the data sample
that contributes to estimating the probability for a given sample. As
such, kernel density estimation is sometimes referred to as a Parzen-
Rosenblatt window, or simply a Parzen window, after the developers of
the method.
• Smoothing Parameter (bandwidth): Parameter that controls the
number of samples or window of samples used to estimate the
probability for a new point.
Kernel Density Estimation (KDE)
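To make the bandwidth's effect concrete, here is a sketch using scikit-learn's KernelDensity (the bandwidth values 0.3, 1.0, and 3.0 are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

sample = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]).reshape(-1, 1)
query = np.array([[0.0]])

# A small bandwidth lets only nearby samples contribute (spiky estimate);
# a large bandwidth widens the window over the sample (smooth estimate)
for h in [0.3, 1.0, 3.0]:
    kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(sample)
    density = np.exp(kde.score_samples(query))  # score_samples is log-density
    print(f"h={h}: estimated density at 0 is {density[0]:.4f}")
```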
5. The contribution of samples within the window can be shaped
using different functions, sometimes referred to as basis
functions, e.g. uniform, normal, etc., with different effects on
the smoothness of the resulting density function.
Basis Function (kernel): The function chosen to control
the contribution of samples in the dataset toward estimating
the probability of a new point. Here we consider the Gaussian
kernel (a sketch of several kernel shapes follows below).
Kernel Density Estimation (KDE)
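To illustrate how the choice of basis function shapes each sample's contribution, a small sketch defining three common kernels directly (u is the distance scaled by the bandwidth):

```python
import numpy as np

# Three common kernels; each is non-negative and integrates to 1
def uniform(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)        # constant weight in the window

def triangular(u):
    return np.maximum(1 - np.abs(u), 0)              # weight decays linearly

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # smooth, never exactly zero

# Weight given to a sample at scaled distance u from the query point
for u in [0.0, 0.5, 1.5]:
    print(u, uniform(u), triangular(u), gaussian(u))
```

The uniform and triangular kernels give exactly zero weight outside the window, while the Gaussian kernel yields a smooth estimate because every sample always contributes a little.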
6. Kernel Density Estimation (KDE)
A range of kernel functions is commonly used: uniform, triangular, biweight, triweight,
normal, and others.
Let (x1, x2, …, xn) be independent and identically distributed samples drawn from some
univariate distribution with an unknown density ƒ at any given point x.
We are interested in estimating the shape of this function ƒ. Its kernel density
estimator is

$$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where K is the kernel (a non-negative function which integrates to 1) and h > 0
is a smoothing parameter called the bandwidth.
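A direct translation of this estimator into NumPy (a sketch; the Gaussian kernel and the bandwidth h = 2.25 match the worked example on the next slides):

```python
import numpy as np

def kde_gaussian(x, data, h):
    # f_hat(x) = (1 / (n*h)) * sum_i K((x - x_i) / h), K = standard normal pdf
    u = (x - data[:, None]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.sum(axis=0) / (len(data) * h)

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])
print(kde_gaussian(np.array([0.0, 3.0]), data, h=2.25))
```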
7. KDE-Example
Sample:    1     2     3     4     5     6
Value:   -2.1  -1.3  -0.4   1.9   5.1   6.2
Kernel density estimates are closely related to histograms, but can
be endowed with properties such as smoothness or continuity by
using a suitable kernel. The table above contains six data points.
For the histogram, first the horizontal axis is divided into sub-intervals or bins which cover
the range of the data: in this case, six bins each of width 2. Whenever a data point falls
inside a bin, a box of height 1/(6 × 2) = 1/12 ≈ 0.083 is placed there. If more than one data
point falls inside the same bin, the boxes are stacked on top of each other.
8. Sample:    1     2     3     4     5     6
   Value:   -2.1  -1.3  -0.4   1.9   5.1   6.2
Bin size = 2 units
So the six bins are (-4 to -2), (-2 to 0), (0 to 2), (2 to 4), (4 to 6),
and (6 to 8).
KDE-Example
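The same histogram can be reproduced with NumPy (a sketch using the bin edges listed above):

```python
import numpy as np

values = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])
edges = [-4, -2, 0, 2, 4, 6, 8]  # six bins of width 2

# density=True normalizes so that the total bar area is 1, giving
# a height of 1/(6 * 2) = 1/12 ~ 0.083 per data point in a bin
heights, _ = np.histogram(values, bins=edges, density=True)
print(heights)  # [0.0833 0.1667 0.0833 0.     0.0833 0.0833]
```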
9. KDE-Example
For the kernel density estimate, normal kernels with standard deviation 2.25
(the red dashed curves in the accompanying figure) are placed on each of the data points xi.
The kernels are then summed to make the kernel density estimate (the solid blue curve).
The smoothness of the kernel density estimate (compared to the discreteness of
the histogram) illustrates how kernel density estimates converge faster to the
true underlying density for continuous random variables.
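This construction can be reproduced directly (a sketch; the x-grid range is an arbitrary choice for evaluation):

```python
import numpy as np
from scipy.stats import norm

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])
x = np.linspace(-8, 12, 401)

# One normal kernel with standard deviation 2.25 centered on each point
kernels = norm.pdf(x[None, :], loc=data[:, None], scale=2.25)

# Averaging the kernels gives the kernel density estimate (the smooth curve)
f_hat = kernels.mean(axis=0)
print(f_hat.max())  # peak height of the estimated density
```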
11. Different Estimators
We can build classifiers when the underlying densities are known;
Bayesian Decision Theory introduced the general formulation.
In most situations, however, the true distributions are unknown and must be
estimated from data.
Parametric Estimation: Assume a particular form for the density (e.g.,
Gaussian), so only the parameters (e.g., mean and variance) need to be
estimated.
Examples:
Maximum Likelihood Estimation (MLE) (see the sketch after this list)
Maximum A Posteriori (MAP) Estimation
Bayesian Estimation
Non-parametric Density Estimation: Assume NO knowledge about the
density
Example:
Kernel Density Estimation
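As a brief sketch of the parametric case listed above, the Gaussian MLE has a closed form: the sample mean and the 1/n (biased) sample variance:

```python
import numpy as np

sample = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])

# Gaussian maximum likelihood estimates of the two parameters
mu_mle = sample.mean()
var_mle = sample.var(ddof=0)  # MLE variance divides by n, not n - 1
print(mu_mle, var_mle)
```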