Introduction to Probabilistic Programming Languages (PPL)
Anton Andreev, 14/02/2018
Research engineer (Ingénieur d'études), CNRS
gipsa-lab
Something simple
int add(int a, int b) { return a + b; }
add(3, 2)
5
A deterministic program is a very precise model: the same input always produces the same output.
Deterministic programs are not interesting
because they always give the same result
Some statistics
• probabilistic (stochastic) model/program – the opposite of a deterministic program
• stochastic process/random process – represents the evolution of some system of random values over time (again, the opposite of a deterministic process)
• programs – if, else, for, while (usually executed line by line)
• distribution – gives the probability that a random variable is exactly equal to some value
  ◦ distributions have parameters (see the sketch below)
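A minimal sketch (ours, in Church, the language used later in this deck) of a parameterized distribution: flip is the Bernoulli primitive, and 0.7 is its parameter, the probability of true.
(flip 0.7) ;one Bernoulli(0.7) sample: true with probability 0.7
(hist (repeat 1000 (lambda () (flip 0.7))) "Bernoulli(0.7)") ;the histogram approaches 70%/30%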
Motivation
Probabilistic models:
• incredibly powerful (machine learning/AI)
• the tools for creating them are:
  ◦ a complete mess
  ◦ incredibly heterogeneous (math, English, diagrams, pictures)
• bigger models get really hard to write down
What is PPL? (1)
http://probabilistic-programming.org
Probabilistic programming languages simplify the development of probabilistic models by allowing programmers to specify a stochastic process using the syntax of general-purpose programs. Probabilistic programs generate samples from the modeled joint distribution, and inference is performed automatically given the specification (the model).
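For illustration (a hedged sketch of ours, not from the slides), a three-line Church program that specifies a joint distribution over two dependent random variables and draws one sample from it:
(define x (flip 0.5)) ;a latent cause
(define y (if x (flip 0.9) (flip 0.1))) ;an effect that depends on x
(list x y) ;one sample from the joint distribution P(x, y)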
What is PPL? (2)
[Diagram: the input is a program plus provided data, which together form a generative model; the labels Parameters, Program (random variables), Observations and Output mark the flow. Probabilistic programming is presented as the combination of computer science and statistics.]
What is PPL? (3)
• We would like to construct a
model in a way similar to a
computer program
• The model is built to generate
the observations
• A built-in inference engine
takes the observations and
returns the distributions (over
the settings) of the
parameters that could have
generated the observations
• The built-in inference engine
is part of the “compiler”.
[Diagram: Parameters → Program (random variables) → Observations]
Clear separation between model and inference algorithms
[Diagram: the program (the probabilistic model) is what the user writes; its execution plus the inference algorithms (the built-in inference engine) are provided by the compiler.]
Bayes net (or Bayesian network)
Graph: TB → cough, flu → cough, flu → sneeze
P(TB=t) = 0.1    P(flu=t) = 0.2
TB  flu | cough=t
t   t   | 0.9
t   f   | 0.8
f   t   | 0.75
f   f   | 0.1
flu | sneeze=t
t   | 0.8
f   | 0.2
Bayes net
• Probabilistic graphical model (directed and
acyclic)
• Represents a set of random variables
• Shows the conditional dependencies between
the random variables
• Representation of a distribution
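The last bullet can be made concrete: the net factorizes the joint distribution into the small tables above, so any full assignment is scored by multiplying four numbers. A worked example (our addition) using the CPTs above:
P(TB, flu, cough, sneeze) = P(TB) · P(flu) · P(cough | TB, flu) · P(sneeze | flu)
P(TB=t, flu=f, cough=t, sneeze=f) = 0.1 × 0.8 × 0.8 × 0.8 = 0.0512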
Same Bayes net converted to PPL (Church)
(define samples
  (mh-query 100 100
    (define TB (flip 0.1)) ;not a fixed constant value
    (define flu (flip 0.2))
    (define cough (or (and TB (flip 0.33)) (and flu (flip 0.54))))
    (define sneeze (and flu (flip 0.8)))
    TB ;query (what is the probability of tuberculosis?)
    (and cough flu))) ;conditions
(hist samples "chances of TB")
Objectives of PPL
• To benefit from automatic inference over models
  ◦ new inference methods have been developed
  ◦ computers are now powerful enough
• Generative model as code
  ◦ more intuitive
  ◦ simplification: less math, a lower technical barrier for developing new models
  ◦ models can be shared and stored in public repositories (just like code)
  ◦ faster development of cognitive models can boost AI research
List of PPLs (over 20)
• Church – extends Scheme (Lisp) with probabilistic semantics
• Figaro – integrated with Scala, runs on the JVM (Java Virtual Machine)
• Probabilistic C#
• Anglican – integrated with the Clojure language, runs on the JVM
• Infer.NET – integrated with C#, runs on .NET, developed by Microsoft Research, provides many examples
• WebPPL – from the creators of Church, JavaScript-based
• Stan
• BUGS
• Pyro – Uber AI Labs (deep learning + Bayesian modeling)
  https://eng.uber.com/pyro
Classification of PPLs
• A PPL can be:
  ◦ a new language
  ◦ a host language + a library
• How much time to learn the host language?
• Slow IDE, slow compiler, slow execution?
• Good documentation? Support forum?
• Easy to incorporate in a commercial project?
• Restrictive license?
Church PPL
• Named after Alonzo Church
• Designed for expressive description of generative models
• Based on functional programming (Scheme)
• Can be executed in the browser
• Every computable distribution can be represented in Church
• Website: http://projects.csail.mit.edu/church/wiki/Church
• Interactive tutorial book: https://v1.probmods.org
“Hello world” in Church (1)
Sampling example
;all comments are green; “flip” is a primitive that gives us a 50%/50% true/false
(define A (if (flip) 1 0))
(define B (if (flip) 1 0))
(define C (if (flip) 1 0))
(define D (+ A B C))
D ;we ask for a possible value of the sum of A, B and C, just once
Result: 2
• “2” is just one sample, one of 4 possible answers (0, 1, 2, 3)
• We are simply running the evaluation process “forward” (i.e. simulating the process)
• This is a probabilistic program
“Hello world” in Church (2)
Sampling example
(define (take-sample)
  (define A (if (flip) 1 0))
  (define B (if (flip) 1 0))
  (define C (if (flip) 1 0))
  (define D (+ A B C))
  D)
(hist (repeat 100 take-sample))
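For reference (our addition), D = A + B + C is Binomial(3, 0.5), so with enough samples the histogram should approach:
P(D=0) = 1/8, P(D=1) = 3/8, P(D=2) = 3/8, P(D=3) = 1/8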
Two execution strategies
• Forward chaining: run the PPL program (Church) to produce samples; we write a distribution.
• Backward inference: give the PPL program (Church) observations; we ask a question.
Queries template
(query ;church primitive
generative-model ;some defines to build our model
what-we-want-to-ask ;select the random variable that we are interested in
what-we-know) ;give a list of conditions/observations
Example of “rejection-query”
(define (take-sample) ;name of our program/function
  (rejection-query ;implemented for us using rejection sampling
    (define A (if (flip) 1 0))
    (define B (if (flip) 1 0))
    (define C (if (flip) 1 0))
    (define D (+ A B C))
    A ;the random variable of interest
    (condition (equal? D 3)))) ;constraints on our model
(hist (repeat 100 take-sample) "Value of A, given that D is 3")
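A hand-rolled sketch of what rejection-query does conceptually (the name my-rejection-sample is ours, not a Church primitive): sample the model forward and keep the sample only if the condition holds, retrying otherwise. Note that conditioning on D = 3 forces A = B = C = 1, so this particular histogram puts all its mass on 1.
(define (my-rejection-sample) ;hypothetical helper, not part of Church
  (define A (if (flip) 1 0))
  (define B (if (flip) 1 0))
  (define C (if (flip) 1 0))
  (define D (+ A B C))
  (if (equal? D 3)
      A ;the condition holds: keep this sample of A
      (my-rejection-sample))) ;otherwise reject and try again
(hist (repeat 100 my-rejection-sample) "Hand-rolled rejection sampling")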
Example of “mh-query”
(define samples
  (mh-query ;we ask/search/infer for something
    100 100 ;number of samples; lag
    ;we define our model
    (define A (if (flip) 1 0))
    (define B (if (flip) 1 0))
    (define C (if (flip) 1 0))
    A ;the random variable of interest
    (condition (>= (+ A B C) 2)))) ;constraints on our model
(hist samples "Value of A, given that the sum is greater than or equal to 2")
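A quick sanity check by enumeration (our addition): the four equally likely worlds with A + B + C >= 2 are (1,1,0), (1,0,1), (0,1,1) and (1,1,1), and A = 1 in three of them, so the histogram should approach P(A=1 | sum >= 2) = 3/4.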
Explaining away
(Same Bayes net and CPTs as before: TB → cough, flu → cough, flu → sneeze.)
P(TB) = 0.1
P(TB | flu) = 0.1
P(TB | cough) = 0.293 ≈ 30%
P(TB | cough, flu) = 0.128 ≈ 13%
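The same effect can be reproduced with a query. A hedged sketch of ours, in the style of the earlier mh-query slide (the CPT is encoded with nested ifs; the numbers will vary between runs because this is sampling-based):
(define samples
  (mh-query 1000 10
    (define TB (flip 0.1))
    (define flu (flip 0.2))
    (define cough (flip (if TB (if flu 0.9 0.8) (if flu 0.75 0.1))))
    TB
    (condition cough))) ;swap in (condition (and cough flu)) to watch P(TB) drop
(hist samples "TB given cough")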
Cognitive example (1)
Learning about coins
A friend gives you a coin and you observe a certain number of consecutive heads. The question is: is it a fair coin or a trick coin?
• Is 5 x H normal?
• Does 10 x H look suspicious?
• What about after 15 x H?
Our model:
Let's consider only two hypotheses:
• a fair coin
• a trick coin that produces heads 95% of the time
The prior probability of seeing a trick coin is 1 in 1000, versus 999 in 1000 for a fair coin.
Cognitive example (2)
Learning about coins
[Diagram: the model combines the a priori information with the observations (H x 15); the question/query is: is it a fair coin?]
Cognitive example (3)
Learning about coins
(define observed-data '(h h h h h)) ;configuring the observations
(define num-flips (length observed-data))
(define samples
  (mh-query 1000 10
    (define fair-prior 0.999) ;setting the a priori information
    (define fair-coin? (flip fair-prior))
    (define make-coin (lambda (weight) (lambda () (if (flip weight) 'h 't)))) ;we apply the a priori information
    (define coin (make-coin (if fair-coin? 0.5 0.95)))
    fair-coin? ;query
    (equal? observed-data (repeat num-flips coin)))) ;we set the observed data as the condition for the query
(hist samples "Fair coin?")
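The previous slide framed the question with H x 15 observed; to try that case in the sketch above, only the data line changes:
(define observed-data '(h h h h h h h h h h h h h h h)) ;15 consecutive heads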
Cognitive example (4)
Learning about coins
[Three result histograms of fair-coin?: prior P(trick) = 1/1000 with H x 5; prior P(trick) = 1/1000 with H x 10; prior P(trick) = 50% with H x 5]
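The qualitative pattern follows from Bayes' rule. A worked check (our addition, using the 0.95 trick-coin weight and the 0.999 fair prior from the code):
odds(trick : fair | n heads) = (0.001 / 0.999) × (0.95 / 0.5)^n ≈ 1.9^n / 999
n = 5 → ≈ 0.025 (the coin still looks fair)
n = 10 → ≈ 0.6 (suspicion grows)
n = 15 → ≈ 15 (the trick hypothesis now dominates)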
Example – Hidden Markov model (1)
Components of an HMM:
• A – the state transition function
• B – the state-to-observation transition function
• initialization – the initial state distribution
Example – Hidden Markov model (2)
(define states '(s1 s2 s3 s4 s5 s6 s7 s8 stop)) ;list of hidden states
(define vocabulary '(chef omelet soup eat work bake)) ;list of possible observations
(define state->observation-model ;generate the observation transition probabilities (B)
  (mem (lambda (state) (dirichlet (make-list (length vocabulary) 1)))))
(define (observation state) ;use B
  (multinomial vocabulary (state->observation-model state)))
(define state->transition-model ;generate the state transition probabilities (A)
  (mem (lambda (state) (dirichlet (make-list (length states) 1)))))
(define (transition state) ;use A
  (multinomial states (state->transition-model state)))
(define (sample-words last-state) ;returns the next observation using the state and observation models
  (if (equal? last-state 'stop)
      '()
      (pair (observation last-state) (sample-words (transition last-state)))))
(sample-words 'start) ;generate a list of observations
Possible output: (work omelet omelet work work soup) ;a possible observation sequence
More examples in Church
https://v1.probmods.org
• Probabilistic Context-free Grammars (PCFG)
• Goal inference
• Communication and Language
• Planning
• Learning a shared prototype
• One-shot learning of visual categories
• Mixture models
• Categorical Perception of Speech Sounds
Classical example
(The classical cloudy / sprinkler / wet-grass Bayes net, which the next slides encode in three PPLs.)
One example encoded in 3 PPLs
• Infer.NET
• Figaro
• Probabilistic C#
Code remarks 1:
• Model: positive ~ Bernoulli(0.7)
• Two types of variables:
  ◦ Variable<bool> sprinkler;
  ◦ bool IsSprinklerOn
• Bernoulli (Infer.NET / Probabilistic C#) = Flip (Figaro)
• Discrete vs continuous
  ◦ bool/int vs double
  ◦ sampling the continuous normal distribution for an int value might not be trivial
Code remarks 2:
var SprinklerDist = from c in CloudyDist
                    from sd in BernoulliF(Prob(c ? 0.1 : 0.5))
                    select sd;
Not the same as:
var SprinklerDist = from c in CloudyDist
                    select BernoulliF(Prob(c ? 0.1 : 0.5)).ToSampleDist().Sample();
(In the first case sd is drawn inside the monadic model, conditioned on c; in the second, a fresh Bernoulli object is created and sampled for every sample of c, so the result is not a single conditioned distribution.)
Code remarks 3:
How do we set evidence?
We set the variables:
WetGrass.observe(true) (Figaro/Scala)
wetGrass.ObservedValue = true; (Infer.NET/C#)
We construct a new model in Probabilistic C#:
FiniteDist<SprinklerModel> givenGrassWet =
    sprinklerModel.ConditionHard(e => e.GrassWet == true);
Different Inference Algorithms
• Factored inference algorithms:
  ◦ Variable Elimination (VE)
  ◦ Belief Propagation (BP)
• Sampling algorithms:
  ◦ rejection sampling
  ◦ importance sampling (a type of forward sampling)
  ◦ Markov chain Monte Carlo (MCMC) sampling
In Church terms, rejection-query uses rejection sampling and mh-query uses MCMC (Metropolis-Hastings).
Image Reconstruction
[Figure: a grid of pixels; a few pixels are labeled T or F – the observed evidence.]
The prior probability that a single pixel is powered on is 0.4. It is twice as likely for two pixels to be both powered on or both powered off. The T and F labels are the evidence.
Most probable explanation (MPE)
• select an algorithm
• for each pixel we calculate:
  ◦ algorithm.mostLikelyValue(pixels(i)(j))
• considering:
  ◦ the a priori information (0.4)
  ◦ the two-pixel rule (factor of 2)
  ◦ the T and F evidence
Probability of Evidence (PoE)
• Model a normal system
• Each time, you provide:
  ◦ the current state (the evidence)
  ◦ a query (what is the probability of the evidence?)
• val prob = ProbEvidenceSampler.computeProbEvidence(evidence, 1000)
• If the probability is too low (we use a threshold):
  ◦ we raise an alert (as in anomaly detection)
  ◦ we can express surprise: “Oh, I thought you would be tired”
PPL + Natural Language Processing (NLP)
• PPLs are a framework for working with uncertain data
• Human responses are also uncertain
Example: How do you feel today?
var GoodMood = Flip(X)
[Figure: answers mapped to the parameter X: Terrible → X = 0.1, A little bit ill → X = 0.3, So so → X = 0.5, Awesome → X = 0.9]
Microsoft Infer.NET – a probabilistic programming language
Real-world examples from the industry:
• TrueSkill® matchmaking system for Xbox LIVE
It ranks gamers by starting with a standard distribution for new players, and then updating it as the player wins or loses games.
• Predicting click-through rates on Bing
To optimize the user experience, search-engine revenue, and advertiser revenue, the search engine needs to display the results that the user is most likely to click.
More on: http://research.microsoft.com/en-us/um/cambridge/projects/infernet
Sources/Citations
• Dr. Noah Goodman, Assistant Professor of Linguistics and Computer Science, Stanford University
• Dr. Frank Wood, Associate Professor, Dept. of Engineering Science, University of Oxford
• A Revealing Introduction to Hidden Markov Models, Mark Stamp
Links
• http://probabilistic-programming.org
• http://v1.probmods.org
• https://www.cra.com/work/case-studies/figaro
• http://infernet.azurewebsites.net
• https://github.com/joashc/csharp-probability-monad
Editor's Notes
  1. This is a deterministic program.
  2. Although we spend hours (or days) debugging to make them deterministic.
  3. In a stochastic or random process there is some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve. Example: stock market fluctuations can be modeled as several stochastic processes. Discrete distributions are usually modeled with a PMF (probability mass function). A PDF is a function that describes the relative likelihood that a continuous random variable takes on a given value. The normal distribution is modeled by a PDF. A PDF is usually represented by a formula (that includes the parameters of the distribution). Distribution – modeled as a PDF/PMF – which is in turn defined by parameters.
  4. Examples: probabilistic context-free grammars, factor graphs (built from random variables and factors). What we need is some universal way to define PMs (Probabilistic Models).
  5. PPL allows you to encode a probabilistic model to represent a stochastic process using tools that we already know (programming languages that we already know). So what you get at the end of your program is a joint distribution.
  6. Statistics: could be a linear model y = ax + b, or it could be a distribution. X – what parameters of these distributions may have given rise to this data? Parameters describe these distributions; for example, a Gaussian is defined by (mean, std). Probabilistic programming: we can estimate the parameters ourselves and use a program to describe our distribution, and then generate samples.
  7. So PPLs let us focus on the model and not on the inference. Why? Because implementing inference algorithms is not easy. We can do it ourselves, but why reinvent the wheel each time and not leave this to the compiler? Also, in many cases our custom implementation will contain bugs, so it is better to leave this job to researchers who specialize in probabilistic inference. The inference engine can be built into the compiler or it might come as an extra library that provides the “probabilistic primitives”. Inference algorithms: variable elimination, dynamic programming, message passing, Monte Carlo.
  8. A Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (PGM, a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Bayes nets are representations of a distribution. The tables can become enormous: just adding a single dependency or variable can increase the number of tables and rows a lot. CPT – conditional probability table.
  9. Here we get this compact representation.
  10. Functional languages seem appropriate for probabilistic programs. A problem in web search: inferring the relevance of documents from the sequence of clicks made by users on the results page.
  11. Figaro / Infer.NET / Probabilistic C# are easier to incorporate in commercial projects. Infer.NET is license-restricted. Church, Figaro and Infer.NET have good documentation with examples. Probabilistic C# has a tutorial, but it is not well documented.
  12. So the code in the next section will look like Scheme. Alonzo Church – mathematical logic.
  13. Next run of the program might give us another answer. Dice example.
  14. The result is a distribution. Dice example.
  15. Flip gives us a distribution of 50/50
  16. Example of rejection-query. Rejection query is very slow.
  17. We could look at (>= (+ A B C) 2) as another random variable, which is unnamed. We can more or less put anything as a condition, and this shows the power of Church, although with variable efficiency. mh-query is based on the Metropolis-Hastings (MH) algorithm.
  18. TB and flu are the causes. Coughing and sneezing are the symptoms or observations. A model that goes from top to bottom: the top is the source or the cause and the bottom are the observations. So let's reason about TB, which is actually reasoning backwards. TB and flu are independent a priori, but become conditionally dependent given coughing. So first our degree of belief goes up, and then by adding something it actually goes down. So probabilities are useful. We get non-monotonic reasoning: something non-linear happens. This does not happen if we simply use classical logic.
  19. This “learning curve” reflects a highly systematic and rational process of conditional inference. These probabilities determine the weight passed to make-coin. A weight of 0.5 indicates that the coin will give 50% heads, so it is a fair coin.
  20. This is a discrete example.
  21. So we are searching for which coin produced the observed result, keeping in mind the a priori information.
  22. If a priori the chance of meeting a biased coin is 1/1000 and we have seen 5 heads. If a priori the chance of meeting a biased coin is 1/1000 and we have seen 10 heads. If a priori the chance of meeting a biased coin is 50% and we have seen 5 heads.
  23. Hidden states; observed states; 2 transition functions.
  24. The idea here is to show how intuitive the implementation of a Hidden Markov Model is. Hidden states: '(s1 s2 s3 s4 s5 s6 s7 s8 stop). Observed states: '(chef omelet soup eat work bake). 2 transition functions, but they use a hidden state to transition. (define states '(s1 s2)) (multinomial states '(0.4 .2))
  25. The host types (from C#/Java/Scala) bool, int, double describe the values the probabilistic variable can take and they tell us if the distribution is continuous or discrete.
  26. In the first case we create an instance of the distribution and we sample it, conditioning it on c (which was sampled from CloudyDist). In the second case, for each sample c we create a corresponding Bernoulli distribution object and we sample it. But we need to get our samples from a single object – a single Bernoulli distribution.
  27. Factored algorithms (FA) work by operating on data structures called factors. Factors capture the probabilistic model being reasoned about. VE is an exact algorithm, so it's great, but it can be very slow. BP is an approximation algorithm: it can be fast, and most of the time it returns a result that is close to the right one, but not always. So there is a tradeoff between accuracy and speed. Sampling algorithms work by creating examples of possible worlds from the probability distribution and using those examples to answer queries. Sampling is an approximate inference approach. Sampling can be used for continuous variables and is perhaps more useful for such variables. The Importance Sampling algorithm is a type of forward sampling algorithm. With MCMC the idea is that rather than sampling from a distribution directly, you define a sampling process that eventually converges to the true distribution. The estimated distribution you get from every run of the program will usually be different and might not be the one that you described in the probabilistic program. Sometimes we might not be able to use factored algorithms, for example when we have many variables and each variable can take on many values. More samples intuitively means that we get closer to the real distribution (the variance of the sampling process decreases with more samples). Also, it is easy to answer probability queries using sample generation. Importance Sampling is similar to Rejection Sampling, where samples that do not satisfy the evidence or the conditions are rejected.
  28. Show code here.
  29. Show code here
  30. You model a normal system. Each time, you provide the current state (the evidence) and a query (what is the probability of the evidence?).
  31. A ranking system is actually a hard thing to build. TrueSkill is a Bayesian ranking algorithm; it is a probabilistic model. TrueSkill can converge on a stable ranking in as few as three games, depending on the number of players. In the TrueSkill ranking system, skill is characterized by two numbers: the average skill of the gamer (μ) and the degree of uncertainty in the gamer's skill (σ). So we start with a normal distribution for each player (with some default values). The ranking system maintains a belief in every gamer's skill using these two numbers. If the uncertainty is still high, the ranking system does not yet know exactly the skill of the gamer. In contrast, if the uncertainty is small, the ranking system has a strong belief that the skill of the gamer is close to the average skill. μ always increases after a win and always decreases after a loss. The extent of the actual updates depends on each player's σ and on how “surprising” the outcome is to the system. Unbalanced games, for example, result in either negligible updates when the favorite wins, or huge updates when the favorite loses surprisingly.