Probability introduction for non-math people
1. Probability Distribution for Non-math People
People who don’t have advanced mathematical knowledge still
need to understand the results of AI/BI, machine learning, and data
analysis so that they can use those results.
Author: Dr. Guang Yang
email: guangyang@btconnect.com
2. Head on Probability and Distribution
• How much confidence can you place in results from BI/AI,
analytical tools, and machine learning? How does that
confidence change when the environment changes?
• Probability distributions are fundamental to understanding
statistics, AI/BI, machine learning, and exploratory analysis
results for probabilistic (soft) classification or regression.
• Understand probability and distributions as an observer,
without going into the mathematical details.
• The basic concepts: probability, conditional probability, joint
probability, and their distributions.
3. Example to Explain Probability Distribution
The Glasgow office has 30 male and 10 female staff. Among them there are 3 male
managers and 4 female managers.
We can define 2 events according to these characteristics: event 1, staff gender:
male or female; event 2, staff role: manager or worker. Imagine a visitor knocks on
the door and every member of staff is equally likely to answer. What are the
probabilities?
probability: the chance that a male member of staff opens the door:
P(gender=male) = 30/40 = 0.75 and P(gender != male) = 10/40 = 0.25;
conditional probability: if a woman opened the door, what is the chance that she is
a manager? P(role=manager|gender=female) = 4/10 = 0.4
joint probability: what is the chance that the person who opens the door is a female
manager? P(gender=female, role=manager) = 0.4 x 0.25 = 0.1
probability distribution: for some reason (working from home, etc.) the office is
never full. What are the probabilities that 18, 19, 20 … of the staff present are
male when 30 staff are in the office?
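The three probabilities in this example can be checked directly from the head counts. The following is a minimal sketch, assuming every member of staff is equally likely to open the door; the worker counts (27 male, 6 female) are derived from the totals, and the function names are illustrative only.

```python
from fractions import Fraction

# Head counts from the Glasgow office example.
# Worker counts are derived: 30 - 3 male workers, 10 - 4 female workers.
counts = {
    ("male", "manager"): 3,
    ("male", "worker"): 27,
    ("female", "manager"): 4,
    ("female", "worker"): 6,
}
total = sum(counts.values())  # 40 staff in all


def marginal_gender(gender):
    """P(gender): share of all staff with this gender."""
    matching = sum(n for (g, _), n in counts.items() if g == gender)
    return Fraction(matching, total)


def joint(gender, role):
    """P(gender, role): share of all staff with both characteristics."""
    return Fraction(counts[(gender, role)], total)


def conditional_role_given_gender(role, gender):
    """P(role | gender) = P(gender, role) / P(gender)."""
    return joint(gender, role) / marginal_gender(gender)


print(float(marginal_gender("male")))                             # 0.75
print(float(conditional_role_given_gender("manager", "female")))  # 0.4
print(float(joint("female", "manager")))                          # 0.1
```

Note that the joint probability can be read either as 4/40 directly or as the product 0.4 x 0.25 used on the slide; both give 0.1.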
4. Generally Looking at Probabilities
• Understand events and their outcomes: the event gender has outcomes
{male, female}, and the event role has outcomes {manager, worker}. Be
aware of each event domain, i.e. the scope of its sample space.
• Experiment: repeat the trial and count how many times each outcome
occurs.
• Compute probabilities by dividing the counted numbers by the size of
the corresponding event domain.
• Define the relationships between events: conditional, joint, or both.
• Conditional probability: given that the outcome of the first event is
known, what is the chance of a particular outcome of the second event?
• Joint probability: what is the chance that two event outcomes occur
together?
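The "experiment, count, compute" steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: it reuses the 40 staff from the office example, picks one of them uniformly at random per trial, and uses a fixed seed and a hypothetical helper named `estimate`.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: the 40 staff from the office example.
staff = (
    [("male", "manager")] * 3
    + [("male", "worker")] * 27
    + [("female", "manager")] * 4
    + [("female", "worker")] * 6
)


def estimate(trials):
    """Repeat the trial, count outcomes, and turn counts into probabilities."""
    n_female = 0
    n_female_manager = 0
    for _ in range(trials):
        gender, role = random.choice(staff)  # one visitor knocks, one person answers
        if gender == "female":
            n_female += 1
            if role == "manager":
                n_female_manager += 1
    return {
        # counts divided by the whole domain (all trials)
        "P(female)": n_female / trials,
        "P(female, manager)": n_female_manager / trials,
        # counts divided by the restricted domain (trials where a woman answered)
        "P(manager | female)": n_female_manager / n_female if n_female else float("nan"),
    }


estimates = estimate(100_000)
for name, value in estimates.items():
    print(f"{name} is approximately {value:.3f}")
```

The only difference between the joint and the conditional estimate is the domain used as the denominator, which is why the slide warns about the scope of the sample space.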
5. Probability Distribution Introduction
• An event is a set of outcomes of an experiment to which a probability
is assigned.
• A probability distribution is a description of a random phenomenon in
terms of the probabilities of events.
• Given a probability, we want to know how it changes when the
environment or the parameters change.
• Choose an appropriate probability distribution for the calculation
according to the characteristics of the outcomes and events.
• The most common assumption is that observations are independent and
identically distributed (i.i.d.).
• There are hundreds of probability distributions, but 15 are in common
use, and their relationships are shown on the next slide.
6. Understand Probability Distribution Parameters
• Identify random variables and distribution parameters.
• Common distribution parameters include mean, variance, and size
of domain.
• The mean is a weighted average of the possible values that a random
variable can take. It is also called the expected value of the random
variable.
• Variance measures the spread or variability of a distribution. It
indicates the likely range of variability around the mean. Its square
root is called the standard deviation.
• Identify the source of the distribution parameters, i.e. whether they
come from a sample or from the whole population.
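As a rough illustration of these parameters, the sketch below computes the mean, variance, and standard deviation of a fair six-sided die (an assumed example, not from the slides), and then contrasts population and sample variance on a small made-up list of observations.

```python
import statistics

# A discrete distribution written as {value: probability}.
# Assumed example: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}


def expected_value(dist):
    """Mean: each possible value weighted by its probability."""
    return sum(value * p for value, p in dist.items())


def variance(dist):
    """Variance: probability-weighted squared distance from the mean."""
    mu = expected_value(dist)
    return sum(p * (value - mu) ** 2 for value, p in dist.items())


mu = expected_value(die)   # 3.5
var = variance(die)        # 35/12, roughly 2.92
sd = var ** 0.5            # standard deviation is the square root of the variance

# Made-up observations, used only to show the sample/population distinction.
observations = [2, 5, 3, 6, 1, 4, 4, 2]
population_var = statistics.pvariance(observations)  # treats data as the whole population (divides by n)
sample_var = statistics.variance(observations)       # treats data as a sample (divides by n - 1)

print(mu, var, sd)
print(population_var, sample_var)
```

The sample variance is always slightly larger than the population variance computed on the same numbers, which is why it matters where the parameters come from.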
8. Choosing Probability Distributions—1
• Bernoulli and Uniform are both single-trial distributions. Bernoulli
has two outcomes, one with probability p (not necessarily 0.5)
and the other with probability 1-p. Uniform has n outcomes,
and each outcome has probability 1/n.
• Binomial distribution: repeated Bernoulli trials, where the
trials are independent of each other.
• Hypergeometric distribution: similar to the Binomial distribution
except that the trials are NOT independent of each other, such
as picking coloured balls from an urn without replacement.
• Poisson distribution: counts events when each trial has a binary
outcome, the probability p is small, and the number of trials n
is large, with λ = np. For example, a river that flooded 3 times
within 100 years has λ = 3 per 100 years.
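The distributions on this slide can be written down in a few lines each. The sketch below is an illustration, not a reference implementation: it reuses the office example from slide 3 for the hypergeometric case (assuming the 30 staff present are a random subset of the 40) and the flood example for the Poisson case (assuming one trial per year with p = 0.03).

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(k successes in n independent Bernoulli trials); n = 1 gives Bernoulli."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, population, successes, draws):
    """P(k successes in `draws` picks without replacement)."""
    if k < 0 or k > draws:
        return 0.0
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)


def poisson_pmf(k, lam):
    """P(k events) when the expected number of events is lam."""
    return exp(-lam) * lam**k / factorial(k)


# Office example: 40 staff, 30 of them male, 30 present on a given day.
for males in (18, 19, 20, 25, 30):
    print(males, hypergeometric_pmf(males, population=40, successes=30, draws=30))

# Flood example: 100 yearly trials with p = 0.03, so lambda = n * p = 3.
no_flood_binomial = binomial_pmf(0, 100, 0.03)
no_flood_poisson = poisson_pmf(0, 3.0)
print(no_flood_binomial, no_flood_poisson)
```

Under this reading of the office example, 18 or 19 males present has probability zero, because only 10 of the 40 staff are female and so at least 20 of any 30 present must be male. The flood figures show the Poisson value tracking the Binomial one closely when n is large and p is small.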
9. Choosing Probability Distribution — 2
• Geometric distribution: answers “how many failures until a
success”, compared with the Binomial distribution, which answers
“how many successes”.
• Negative binomial distribution: a simple generalisation of the
geometric distribution with r successes instead of 1 success, i.e.
“how many failures until r successes”.
• Exponential distribution: a continuous distribution, typically used
to answer “how long until an event”, in comparison with Poisson,
which answers “how many events per unit of time”.
• Weibull distribution: a generalisation of the exponential
distribution used to describe time-to-failure. Its failure rate can
vary over time, whereas the exponential distribution has a
constant rate.
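The "generalisation" relationships on this slide can be checked numerically. The following sketch uses the standard textbook formulas with parameter names chosen here for illustration (`shape` and `scale` for Weibull, `rate` for exponential); the numeric inputs are arbitrary.

```python
from math import comb, exp


def geometric_pmf(failures, p):
    """P(exactly `failures` failures before the first success)."""
    return (1 - p) ** failures * p


def negative_binomial_pmf(failures, r, p):
    """P(exactly `failures` failures before the r-th success)."""
    return comb(failures + r - 1, failures) * p**r * (1 - p) ** failures


def exponential_cdf(t, rate):
    """P(the event has happened by time t) with a constant rate."""
    return 1 - exp(-rate * t)


def weibull_cdf(t, shape, scale):
    """P(failure by time t); shape != 1 lets the failure rate change over time."""
    return 1 - exp(-((t / scale) ** shape))


# Negative binomial with r = 1 is the geometric distribution.
print(geometric_pmf(3, 0.2), negative_binomial_pmf(3, 1, 0.2))

# Weibull with shape = 1 and scale = 1 / rate is the exponential distribution.
print(exponential_cdf(2.0, 0.5), weibull_cdf(2.0, shape=1.0, scale=2.0))

# With shape > 1 the failure rate rises over time (wear-out behaviour).
print(weibull_cdf(2.0, shape=2.0, scale=2.0), weibull_cdf(4.0, shape=2.0, scale=2.0))
```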
10. Choosing Probability Distribution — 3
• Normal (Gaussian) distribution: the most important continuous
distribution. It is also called the bell-shaped distribution. The
distribution of a sum of many independent values drawn from other
distributions follows (approximately) the normal distribution
(central limit theorem).
• Log-normal distribution: the distribution of a value whose logarithm
is normally distributed, or equivalently the exponential of a
normally distributed value.
• t-distribution: used for reasoning about the mean of a normal
distribution, and approaches the normal distribution as its degrees
of freedom increase.
• chi-squared distribution: the distribution of the sum of squares of
standard normally distributed values. The chi-squared test statistic
is a sum of squared differences, which are assumed to be normally
distributed.
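A small simulation can make the central limit theorem and the chi-squared definition more concrete. This is a sketch under assumptions chosen for convenience: sums of 12 uniform values on [0, 1) (so the theoretical mean is 6 and the variance is 1), 3 degrees of freedom for the chi-squared draws, and a fixed seed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable


def sum_of_uniforms(n_terms):
    """Add up n_terms independent uniform values on [0, 1)."""
    return sum(random.random() for _ in range(n_terms))


# Central limit theorem: sums of uniform values look approximately normal.
sums = [sum_of_uniforms(12) for _ in range(20_000)]
mean_of_sums = statistics.fmean(sums)     # theory: 12 * 0.5 = 6
var_of_sums = statistics.pvariance(sums)  # theory: 12 * (1/12) = 1

# For a normal distribution about 68% of values lie within one standard
# deviation of the mean; the sums should behave similarly.
within_one_sd = sum(abs(s - 6) <= 1 for s in sums) / len(sums)


def chi_squared_draw(degrees_of_freedom):
    """Sum of squares of independent standard normal values."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(degrees_of_freedom))


chi_draws = [chi_squared_draw(3) for _ in range(20_000)]
chi_mean = statistics.fmean(chi_draws)    # theory: equals the degrees of freedom, 3

print(mean_of_sums, var_of_sums, within_one_sd, chi_mean)
```

Each uniform value on its own is flat rather than bell shaped, yet the sums cluster around 6 in the familiar pattern.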
11. Choosing Probability Distribution — 4
• Gamma distribution: a two-parameter family of continuous
distributions that generalises both the exponential and the
chi-squared distributions. It models continuous variables that
are always positive and have skewed distributions. It can be
used to model the waiting time until the next n events occur.
It is a conjugate prior for several distributions used in
machine learning.
• Beta distribution: a family of continuous probability
distributions defined on the interval [0, 1], used in a wide
variety of disciplines to model the behaviour of random
variables limited to intervals of finite length. It is also used
as a conjugate prior.
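To show what "conjugate prior" means in practice, here is a minimal sketch of the standard Beta update for a success/failure probability: a Beta(alpha, beta) prior combined with observed successes and failures gives another Beta distribution. The prior Beta(1, 1) and the reuse of the office counts (4 managers among 10 female staff) are assumptions made for illustration.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binary outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed prior: Beta(1, 1), i.e. every probability in [0, 1] equally plausible.
prior = (1, 1)

# Assumed data: 4 managers ("successes") and 6 workers ("failures") among female staff.
posterior = beta_update(*prior, successes=4, failures=6)

print(posterior)              # (5, 7)
print(beta_mean(*prior))      # 0.5
print(beta_mean(*posterior))  # 5/12, roughly 0.417
```

The posterior mean sits between the prior mean (0.5) and the observed proportion (0.4), and moves closer to the observed proportion as more data arrive.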
12. Conclusion
• Many people won’t do data analysis or machine learning themselves,
but they may use the analytic or machine learning results. They
may need to communicate with data scientists and analysts.
• In many cases, probability distributions are the basis for data
visualisation. To understand a chart or diagram, the basic
concepts should be understood.
• Many methods, such as maximum likelihood estimation
(MLE), are based on probability distribution principles.