Markov Chain Monte Carlo
→ Gibbs Sampling → Metropolis–Hastings
→ Hamiltonian Monte Carlo → Reversible-Jump MCMC
Francesco Casalegno
Outline
1. Motivation
2. Basic Principles of MCMC
3. Gibbs Sampling
4. Metropolis–Hastings
5. Hamiltonian Monte Carlo
6. Reversible-Jump Markov Chain Monte Carlo
7. Conclusion
Motivation
1. Motivation
● “Computing in Science and Engineering” put MCMC among the Top 10 Algorithms of the 20th Century (together with the Fast Fourier Transform, Quicksort, the Simplex Method, …)
● MCMC methods are used to draw samples from complex distributions: X1 … XM ~ p(x)
○ Why complex distributions? → If not complex, use Inverse Transform Sampling or Rejection Sampling
→ p(x) high-dimensional, multi-modal, known only up to a constant, ...
○ Why Markov Chain? → Samples are drawn in a sequence!
○ Why Monte Carlo? → Samples are used to approximate the pdf or to compute mean/variance/...
● OK, but where do we find these complex distributions?
○ Typical scenario: sample from a posterior, i.e. use observations Y = (y1 … yN) to make inference on θ
○ So we want to sample θ ~ p(θ|Y) = p(Y|θ) p(θ) / p(Y) ∝ p(Y|θ) p(θ)
■ θ is often high-dimensional
■ The normalization constant, i.e. the evidence p(Y) = ∫ p(Y|θ) p(θ) dθ, is computationally intractable
○ MCMC methods are an essential tool for inference with Bayesian Networks
● In the next slides we see Bayesian Networks where MCMC can be applied with success!
1. Motivation
● Bayesian Networks (aka Belief Networks) are powerful models representing variables and their conditional dependencies in a DAG (directed acyclic graph).
● Notation (in the network diagrams): observed quantities, unobserved variables, fixed parameters
● Inference on unobserved variables is done by computing posterior distributions
● Posterior distributions are computed using the following tools
○ Law of Total Probability
○ Chain Rule (aka Product Rule)
○ Bayes’ Theorem
○ MCMC methods
1. Example: Hierarchical Regression
● Observations
○ Different countries c = 1 … 4, with a different number of samples Nc each
○ Each sample is xi = mother longevity, yi = child longevity
● Model
○ Linear regression, but the samples are given per country...
■ No Pool: treat each country independently, fit 4 independent θc
■ Complete Pool: forget about the country, fit one θ on all samples together
■ Hierarchical Regression: there are 4 different θc, but they are related!
→ Best approach, in particular for countries with few samples (c=3)!
● How can we make inference on θ1 … θC, μθ, and σθ? (See the sketch below.)
○ Note: in Bayesian Nets, all parameters of interest have priors!
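To make the hierarchy concrete, here is a minimal sketch of the model’s unnormalized log-posterior in Python. The slide does not specify the priors, so we assume θc ~ Normal(μθ, σθ) and a fixed observation noise sigma_y; all names are illustrative.

```python
import numpy as np
from scipy import stats

def log_posterior(theta, mu_theta, sigma_theta, xs, ys, sigma_y=1.0):
    """Unnormalized log-posterior of a hierarchical linear regression.

    theta  : (C, 2) array with per-country intercept and slope
    xs, ys : lists of C arrays holding each country's samples
    """
    lp = 0.0
    for c in range(len(xs)):
        y_hat = theta[c, 0] + theta[c, 1] * xs[c]
        lp += stats.norm.logpdf(ys[c], y_hat, sigma_y).sum()            # likelihood
        lp += stats.norm.logpdf(theta[c], mu_theta, sigma_theta).sum()  # prior theta_c ~ N(mu, sigma)
    # (hyper-)priors on mu_theta and sigma_theta would be added here
    return lp
```

The shared prior term is what ties the per-country θc together: countries with few samples are pulled towards the common μθ.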
1. Example: Mine Fatality (Change-Point Model)
● Observations
○ Number of coal mine fatalities yt during year t = 1900 ... 1960
● Model
○ The number of fatalities follows a Poisson law
○ The fatality rate changed (e.g. due to a new OSH law) at some point
● How can we compute posteriors for ν, λ1, λ2?
○ Year ν when the rate changed
○ Fatality rate λ1 before the changepoint
○ Fatality rate λ2 after the changepoint
1. Example: Latent Dirichlet Allocation
● Observations: words from D documents
● Model
○ Assume there are T topics in total
○ Assume Bag-Of-Words (only word counts matter, not order)
○ φt = distribution of words in topic t ∊ {1 … T}
○ θd = distribution of topics in document d ∊ {1 … D}
○ zd,n = topic of word n ∊ {1 … Nd} within document d ∊ {1 … D}
○ wd,n = word appearing at position n ∊ {1 … Nd} of doc d ∊ {1 … D}
● How can we automatically discover (i.e. infer the posterior distribution of)
○ the content of each topic, in terms of the words associated with it?
○ the content of each document, in terms of its topic distribution?
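As an illustration, a short numpy sketch of LDA’s generative process. The Dirichlet hyperparameters alpha and beta (and all sizes) are assumptions for illustration, not values from the slide; inference would then recover φ and θ from the observed words.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V, D, N_d = 3, 1000, 5, 50          # topics, vocabulary size, docs, words per doc
alpha, beta = 0.1, 0.01                # assumed symmetric Dirichlet hyperparameters

phi = rng.dirichlet(beta * np.ones(V), size=T)     # phi_t: word distribution of topic t
theta = rng.dirichlet(alpha * np.ones(T), size=D)  # theta_d: topic distribution of doc d

docs = []
for d in range(D):
    z = rng.choice(T, size=N_d, p=theta[d])             # z_{d,n}: topic of each word
    w = np.array([rng.choice(V, p=phi[t]) for t in z])  # w_{d,n}: observed words
    docs.append(w)
```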
Basic Principles of MCMC
2. Basic Principles of MCMC
● As we have seen, to use Bayesian Networks we need to sample from θ ~ p(θ|Y).
But MCMC methods are generic: in the following we just talk about sampling from p.
● MCMC methods build a Markov Chain of samples X1, X2, … converging to p. We need:
1. An initial sample x0
2. A simple way to draw a new Xn+1 given Xn = xn (i.e. the Markov process)
3. A mathematical proof that, for n large enough, the process generates samples Xn ~ p
We will see how the different MCMC methods differ in the way they draw Xn+1 given Xn = xn.
● In this way, we draw X1, X2, … ~ p and then compute Monte Carlo approximations like
IM = (1/M) Σm f(Xm) ≈ E[f(X)]
And the error in IM? X1, X2, … are correlated, so it is worse than in standard Monte Carlo:
Var(IM) ≈ Var(f(X)) / M*, where M* = M / (1 + 2 Σk ρk)
M* is known as the effective sample size, and ρk is the k-lag autocorrelation of X1, X2, …
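As a rough sketch, M* can be estimated from a chain by plugging empirical lag-k autocorrelations into the formula above, truncating the sum at the first non-positive value (one common heuristic; the function below is illustrative).

```python
import numpy as np

def effective_sample_size(chain):
    """Estimate M* = M / (1 + 2 * sum_k rho_k) from a 1-d chain."""
    x = np.asarray(chain) - np.mean(chain)
    M = len(x)
    # empirical autocorrelation at lags 0 .. M-1
    acf = np.correlate(x, x, mode="full")[M - 1:] / (np.arange(M, 0, -1) * x.var())
    rho_sum = 0.0
    for k in range(1, M):
        if acf[k] <= 0:        # truncate at the first non-positive autocorrelation
            break
        rho_sum += acf[k]
    return M / (1 + 2 * rho_sum)
```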
2. Basic Principles of MCMC
X1, X2, … drawn i.i.d. from p
● looks like noise
● pdf approximation is quite good
● no autocorrelation
X1, X2, … generated by MCMC
● wait for convergence to p: discard the burn-in!
● pdf approximation is worse (effective sample size M*!)
● strong autocorrelation of samples
2. Basic Principles of MCMC
● But how do we draw a new Xn+1 given Xn = xn?
○ There is no single solution: depending on the situation we can use a different MCMC method!
○ Each MCMC method has its own way of drawing a new Xn+1 given Xn = xn
● In these slides we present the most important MCMC methods
○ Gibbs Sampling
○ Metropolis–Hastings
○ Hamiltonian Monte Carlo
○ Reversible-Jump MCMC
Gibbs Sampling
3. Gibbs Sampling
● Gibbs sampling is an MCMC method used to draw from a multivariate distribution p when
1. sampling from the joint distribution p(x) = p(x1 … xD) is difficult → so we need MCMC!
2. sampling from the (univariate) conditionals p(x1|x2 … xD), p(x2|x1, x3 … xD), …, p(xD|x1 … xD-1) is easy
● Algorithm
→ Choose an initial point x0 and find a way to draw from the conditionals (e.g. by inverse sampling)
→ For n = 0, ...
→ Draw x1^(n+1) ~ p(x1 | x2^(n) … xD^(n))
→ Draw x2^(n+1) ~ p(x2 | x1^(n+1), x3^(n) … xD^(n))
…
→ Draw xD^(n+1) ~ p(xD | x1^(n+1) … xD-1^(n+1))
→ Set x^(n+1) = (x1^(n+1) … xD^(n+1))
3. Gibbs Sampling: Example
● Let us sample from a bivariate normal distribution.
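A minimal sketch of this example, assuming a standard bivariate normal target with correlation ρ, whose conditionals are x1 | x2 ~ Normal(ρ·x2, 1−ρ²) and symmetrically for x2.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, M = 0.8, 5000
sd = np.sqrt(1 - rho**2)        # conditional std dev of a standard bivariate normal

x = np.zeros(2)                 # initial point x0
samples = np.empty((M, 2))
for n in range(M):
    # draw each component from its conditional, given the latest values
    x[0] = rng.normal(rho * x[1], sd)   # x1 ~ p(x1 | x2)
    x[1] = rng.normal(rho * x[0], sd)   # x2 ~ p(x2 | x1)
    samples[n] = x                      # every new point is accepted
```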
3. Gibbs Sampling: Mine Fatality
● The joint posterior probability is p(λ1, λ2, ν | y) ∝ p(y | λ1, λ2, ν) p(λ1) p(λ2) p(ν)
● Direct sampling from this multivariate, mixed (discrete-continuous) distribution would be too hard!
→ Use Gibbs sampling: draw from the conditional posteriors!
○ The conditional posterior for λ1 is a Gamma distribution (with a conjugate Gamma prior)
○ The conditional posterior for λ2 is a Gamma distribution (with a conjugate Gamma prior)
○ The conditional posterior for ν is a discrete distribution with weights known up to a constant, from which we can draw using Inverse Transform Sampling.
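A hedged sketch of this sampler. The slide’s exact priors are not reproduced here, so we assume conjugate Gamma(a, b) priors on the rates and a uniform prior on ν; under those assumptions the rate conditionals are Gamma and ν is drawn from its discrete conditional.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_changepoint(y, a=1.0, b=1.0, M=5000):
    """Gibbs sampler for the Poisson change-point model.
    Assumes lambda_i ~ Gamma(a, b) priors and nu ~ Uniform over years."""
    T = len(y)
    nu = T // 2                                    # initial change-point
    out = np.empty((M, 3))
    cum = np.concatenate([[0.0], np.cumsum(y)])    # cum[t] = sum of y before year t
    for m in range(M):
        # conjugate Gamma conditionals for the two rates
        lam1 = rng.gamma(a + cum[nu], 1.0 / (b + nu))
        lam2 = rng.gamma(a + cum[T] - cum[nu], 1.0 / (b + T - nu))
        # discrete conditional for nu, known up to a constant
        ts = np.arange(1, T)                       # candidate change-points
        logw = (cum[ts] * np.log(lam1) - ts * lam1
                + (cum[T] - cum[ts]) * np.log(lam2) - (T - ts) * lam2)
        w = np.exp(logw - logw.max())
        nu = rng.choice(ts, p=w / w.sum())         # inverse-transform draw
        out[m] = lam1, lam2, nu
    return out

# usage: out = gibbs_changepoint(np.array(yearly_counts))  # observed y_t per year
```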
Francesco Casalegno – Markov Chain Monte Carlo
1. Unlike other MCMC methods, xn+1
is always
accepted as next step (no rejection)
2. Useful to treat very highly dimensional
problems
3. Useful if we have both continuous and
discrete components, to work with fully
discrete/continuous separate conditionals
1. All the conditionals must be known
→ often known only up to normalizing const!
2. Must know how to sample from conditionals
→ if it is hard, sample from conditionals with
another MCMC method such as
“Metropolis-within-Gibbs”
3. If components are strongly correlated, the
Markov chain converges slowly and has
highly auto-correlated samples
17
3. Gibbs Sampling: Pros and Cons
Metropolis–Hastings
4. Metropolis–Hastings
● Metropolis–Hastings generates a chain of samples from p by using the following ideas.
○ Draw a new candidate x*n+1 for Xn+1 given Xn = xn using some proposal distribution Q(x*n+1|xn)
○ Accept the candidate (xn+1 = x*n+1) with some acceptance probability A(x*n+1, xn), otherwise reject it.
● Algorithm
→ Choose an initial point x0 and a proposal distribution Q(x*n+1|xn)
→ For n = 0, ...
→ Draw a new candidate x*n+1 ~ Q(x*n+1|xn)
→ Compute the acceptance probability A(x*n+1, xn) = min{1, [p(x*n+1) Q(xn|x*n+1)] / [p(xn) Q(x*n+1|xn)]}
→ Accept the candidate with probability A(x*n+1, xn); if the candidate is rejected, set xn+1 = xn.
Note: to compute the acceptance probability we only need to know p up to a multiplicative constant → typical Bayesian posterior!
● How do we choose the proposal distribution Q(x*n+1|xn)?
○ A common choice is x*n+1 ~ Normal(xn, σ2)
○ This is called Random Walk Metropolis, as we can also write x*n+1 = xn + ε with ε ~ Normal(0, σ2)
○ Large σ2 → low auto-correlation between samples (“big jumps”), but a high rejection rate
○ Small σ2 → high auto-correlation between samples (“small jumps”), but a low rejection rate
○ If we use a symmetric proposal distribution (e.g. Q = Normal), we have A(x*n+1, xn) = min{1, p(x*n+1) / p(xn)}
○ This is called the Metropolis method, historically invented before Metropolis–Hastings
○ But sometimes we need an asymmetric proposal, e.g. for 1-tailed target distributions (e.g. p = Gamma(α, β))
4. Metropolis–Hastings: Example
● Let us sample from a bivariate normal distribution using a Normal proposal distribution.
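A minimal Random Walk Metropolis sketch for this example. Note that only log p up to an additive constant is needed; ρ and the proposal scale σ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma, M = 0.8, 0.5, 5000

def log_p(x):
    # standard bivariate normal with correlation rho, up to an additive constant
    return -(x[0]**2 - 2 * rho * x[0] * x[1] + x[1]**2) / (2 * (1 - rho**2))

x = np.zeros(2)
samples = np.empty((M, 2))
for n in range(M):
    x_star = x + rng.normal(0.0, sigma, size=2)   # symmetric Normal proposal
    # symmetric proposal => A = min(1, p(x*) / p(x))
    if np.log(rng.uniform()) < log_p(x_star) - log_p(x):
        x = x_star                                # accept the candidate
    samples[n] = x                                # on rejection, keep x_n
```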
4. Metropolis–Hastings: Covariation Model
● The posterior distribution p(ρ|y1:N) does not look like anything familiar.
● Using Inverse Transform or Rejection Sampling would be difficult in this case. So we use Metropolis–Hastings.
○ p(ρ|y1:N) is known up to a constant, and that is OK
○ Q(ρ*n+1|ρn) cannot be a Normal, as our domain is bounded
→ Cannot use Random Walk Metropolis!
○ Use Q(ρ*n+1|ρn) = TruncatedNormal on [-1, +1]: an asymmetric proposal
→ The pdf of the TruncatedNormal on [-1, +1] will appear in the acceptance probability!
4. Metropolis–Hastings: Pros and Cons
Pros
1. Works even if we know p only up to a multiplicative constant
○ Can sample from a Bayesian posterior without calculating the evidence p(Y)
2. Can be used within Gibbs sampling:
○ Gibbs splits the joint into conditionals
○ Sample x1^(n+1) ~ p(x1|x2^(n)), x2^(n+1) ~ p(x2|x1^(n+1)) using Metropolis–Hastings
3. Can be used when it is not practical to derive all the conditional posteriors
Cons
1. How to choose the best proposal distribution?
2. How to choose the variance of the proposal distribution?
○ too small → high autocorrelation
○ too large → high rejection rate
Hamiltonian Monte Carlo
5. Hamiltonian Monte Carlo
● Hamiltonian Monte Carlo has two advantages with respect to other MCMC methods
○ Little or no autocorrelation of the samples
○ Fast mixing, i.e. the chain converges quickly to the distribution p
● Hamiltonian Monte Carlo is based on the Hamiltonian (total energy) H(x, v) = U(x) + K(v)
○ Imagine a ball in a space with potential energy U(x) = -log p(x), and put the ball at initial position xn
○ Give the ball an initial random velocity v ~ q and define its kinetic energy K(v) = -log q(v)
○ Compute the trajectory for a time T, then take the final position: x(T) = xn+1
● Algorithm
→ Choose an initial point x0 and a velocity distribution q(v)
→ For n = 0, ...
→ Set the initial position to x(t=0) = xn
→ Draw a new random initial velocity v(t=0) ~ q(v)
→ Numerically integrate the trajectory with total energy H(x, v) = -log p(x) - log q(v) for a time T
→ Set xn+1 = x(t=T)
● How do we choose the distribution for the velocity q(v)?
○ A common choice is v ~ Normal(0, Σ), so that the kinetic energy reads K(v) = ½ vᵀ Σ⁻¹ v
○ If we have an understanding of p(x) we can choose Σ in a smart way; otherwise just set Σ = σ2 I
5. Hamiltonian Monte Carlo: Trajectories
● In a system with energy H(x, v) = U(x) + K(v), position x and velocity v evolve according to Hamilton’s equations
dx/dt = ∂K/∂v, dv/dt = -∂U/∂x
● In most cases these equations cannot be solved exactly, so we use a numerical scheme
○ Choose a discrete time step τ
○ Compute the numerical solution using the Leapfrog Method (or another symplectic method)
○ The energy H(x, v) = U(x) + K(v) should be preserved over time, but we use a numerical discretization…
○ Symplectic methods are good because they preserve H(x, v) up to O(τ^s), with s=2 for the Leapfrog Method
○ When using numerical methods to compute trajectories, accept xn+1 = x(t=T) with acceptance probability
A = min{1, exp(H(xn, vn) - H(xn+1, vn+1))}
Notice that H(xn+1, vn+1) = H(xn, vn) + O(τ^s), so the acceptance probability is ≈ 1 for τ small enough.
5. Hamiltonian Monte Carlo: Example
● Let us sample from a bivariate normal distribution.
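A minimal HMC sketch for this example, with v ~ Normal(0, I) and a hand-rolled leapfrog integrator; the step size tau and the number of leapfrog steps L are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
P = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))  # precision of the target

def U(x):      return 0.5 * x @ P @ x                  # U(x) = -log p(x) + const
def grad_U(x): return P @ x

def hmc_step(x, tau=0.1, L=20):
    v = rng.normal(size=2)                 # v ~ Normal(0, I), so K(v) = ||v||^2 / 2
    x_new, v_new = x.copy(), v.copy()
    v_new -= 0.5 * tau * grad_U(x_new)     # leapfrog: half step in v
    for _ in range(L - 1):
        x_new += tau * v_new               # full step in x
        v_new -= tau * grad_U(x_new)       # full step in v
    x_new += tau * v_new
    v_new -= 0.5 * tau * grad_U(x_new)     # final half step in v
    # accept with probability min(1, exp(H(x, v) - H(x_new, v_new)))
    dH = (U(x) + 0.5 * v @ v) - (U(x_new) + 0.5 * v_new @ v_new)
    return x_new if np.log(rng.uniform()) < dH else x

x = np.zeros(2)
samples = np.empty((2000, 2))
for n in range(2000):
    x = hmc_step(x)
    samples[n] = x
```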
Francesco Casalegno – Markov Chain Monte Carlo
1. Best method for continuous distributions
2. Samples have almost 0 autocorrelation
3. Only requires to know only p up to a const.
4. Can be extended to have velocity
depending on the location, q = q(v|x), but
than K = K(x, v)
1. Choice of symplectic integrator and τ?
○ τ too small → slow integration
○ τ too large → higher rejection rate
→ adaptive methods automatically choose τ
2. Choice of q(v) ?
If q(v) = Normal(0, Σ), choice of Σ?
3. Choice of integration time T?
○ T too small → may have correlation
○ T too large → Hamiltonian trajectories
are closed, so time waste
→ NUTS method automatically chooses T!
4. Must evaluate derivatives p’(x) and q’(v)
5. Works only for continuous distributions 27
5. Hamiltonian Monte Carlo: Pros and Cons
Reversible-Jump MCMC
6. Reversible-Jump MCMC
● Reversible-Jump MCMC extends MCMC methods to the case where the variable space has an unknown/variable number of dimensions.
○ Hierarchical Regression. In the example we fit lines, i.e. we used θ ∊ ℝ2. We could also decide to use polynomials of another degree k, so that θ ∊ ℝk+1 → how do we choose k?
○ Change-Point Model. In the example we assumed that the mine fatality rate changed at some point. We could also assume that the rate changed k times, so we need inference on the change-points ν1 … νk as well as on the rates λ1 … λk+1 → how do we choose k?
● Reversible-Jump MCMC is a powerful method for model selection!
○ It also works for multiple hyper-parameters k1 … km
6. Reversible-Jump MCMC
● Consider the meta-space of pairs (k, x) with x ∊ ℝ^dk, where k is the model index and dk is the dimension of that model’s space
○ k is treated as just another variable in the meta-space
○ For our change-point model, k = number of change points, dk = 2k+1
● How do we jump from dimension dk to dk’?
○ Sample an extra random variable u ~ Q(u)
○ If dk < dk’ it is called a “birth” — if dk > dk’ it is called a “death”
● Algorithm (a toy sketch follows below)
a. draw a jump u ~ Q(u)
b. compute the proposal x*n+1 = g(xn, u)
c. compute the reverse jump u* s.t. xn = g(x*n+1, u*)
d. accept the proposal with acceptance probability A = min{1, [p(x*n+1) Q(u*)] / [p(xn) Q(u)] · |det ∂g(xn, u)/∂(xn, u)|}
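Reversible-jump moves are model-specific, so the following is only a toy sketch: the meta-space has models k ∊ {1, 2} with x ∊ ℝ^k and target π(k, x) ∝ wk · Normal(x; 0, Ik); a birth embeds x as (x, u) with u ~ Normal(0, 1), so g is the identity embedding and the Jacobian is 1. The weights and all names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
w = {1: 0.3, 2: 0.7}                       # toy model weights (assumed)

def log_target(k, x):                      # pi(k, x), up to a constant
    return np.log(w[k]) + stats.norm.logpdf(x).sum()

k, x = 1, np.zeros(1)
counts = {1: 0, 2: 0}
for n in range(20000):
    if k == 1:                             # birth: u ~ Q = Normal(0, 1), x* = (x, u)
        u = rng.normal()
        k_star, x_star = 2, np.append(x, u)
        # g is the identity embedding, so |Jacobian| = 1
        log_A = log_target(k_star, x_star) - log_target(k, x) - stats.norm.logpdf(u)
    else:                                  # death: drop the last coordinate, u* = dropped value
        u_star = x[-1]
        k_star, x_star = 1, x[:-1]
        log_A = log_target(k_star, x_star) - log_target(k, x) + stats.norm.logpdf(u_star)
    if np.log(rng.uniform()) < log_A:
        k, x = k_star, x_star
    counts[k] += 1
# the visit counts of k should be roughly proportional to the weights w
```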
Conclusions
Conclusions
1. Bayesian Networks are a powerful tool of Machine Learning and Statistical Modelling.
2. Bayesian Networks use MCMC to sample from computationally intractable posteriors.
3. Gibbs Sampling reduces drawing from a hard joint posterior to drawing from easy conditionals.
4. Metropolis–Hastings is useful when the posterior has no closed form / is known up to a constant.
5. Hamiltonian Monte Carlo is the best choice for the continuous case: low correlation, low rejection.
6. Reversible-Jump MCMC is an extension used when the number of parameters is unknown/variable.