Bayesian inference in Python: inferring a probability from sequence data, with prior, likelihood and posterior plots

Posted by marowa on Wed, 10 Nov 2021 11:09:21 +0100

Original link:

In this article, I focus on an example of inferring a probability from a short data sequence. I first introduce the theory of doing this inference the Bayesian way, then implement it in Python so that we can experiment with the ideas. To keep the article accessible, I consider only a small set of candidate probabilities. This restriction keeps the mathematics simple while still giving very good results, including plots of the prior, likelihood and posterior.

  Specifically, I will consider the following situation:

  • A computer program outputs a random string of 1s and 0s. For example, the output might be a ten-digit string D (the sequence used throughout this article contains seven 0s and three 1s).

  • The goal is to infer the probability of a 0 that the program uses to generate D. We use the symbol p0 for the probability of a 0. Of course, this also means that the probability of a 1 must be p1 = 1 − p0.
  • As mentioned above, we only consider a set of candidate probabilities: specifically, p0 = 0.2, 0.4, 0.6, 0.8 for the data sequence above. How do we choose sensibly among these probabilities, and how sure are we of the result?
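
To make the scenario concrete, here is a minimal sketch of such a program, assuming NumPy's Generator API (`np.random.default_rng`). The functions `generate_sequence` and `count_symbols` are hypothetical names for illustration, not part of the original article. Each symbol is drawn independently: 0 with probability p0, 1 otherwise. The counts it returns are all the likelihood below needs.

```python
import numpy as np


def generate_sequence(p0, n, seed=None):
    """Hypothetical program: emit n independent 0/1 symbols, each 0 with probability p0."""
    rng = np.random.default_rng(seed)
    # rng.random(n) is uniform on [0, 1); a value below p0 becomes a 0, otherwise a 1
    return [0 if u < p0 else 1 for u in rng.random(n)]


def count_symbols(data):
    """Return (n0, n1), the numbers of 0s and 1s in the sequence."""
    n0 = sum(1 for x in data if x == 0)
    n1 = sum(1 for x in data if x == 1)
    return n0, n1


D = generate_sequence(p0=0.6, n=10, seed=1)
print(D, count_symbols(D))
```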


My starting point is to write down the probability of the sequence as if I knew the probability of a 0 or a 1. Of course, I don't know these probabilities (finding them is our goal), but this is a useful place to start. Without specifying a value for p0, the probability of our sample sequence is a product of ten factors, one per symbol of D: a factor p0 for every 0 and a factor (1 − p0) for every 1.

Here I used p1 = 1 − p0 to write the probability in terms of p0 only. Because D contains seven 0s and three 1s, the same probability can be written more compactly:

$$P(D \mid p_0) = p_0^{7}\,(1 - p_0)^{3}$$

This form of probability is called a Bernoulli process. I can also write it in a fully general way, not tied to the data sequence D or to a particular value of p0:

$$P(D \mid p_0) = p_0^{n_0}\,(1 - p_0)^{n_1}$$

where n0 and n1 are the numbers of 0s and 1s in the data sequence.
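
The general formula translates directly into code. The sketch below is a minimal illustration (the function name `bernoulli_likelihood` is hypothetical): it counts n0 and n1 and evaluates p0**n0 * (1 - p0)**n1. Only the counts matter, so any reordering of the sequence gives the same likelihood.

```python
def bernoulli_likelihood(data, p0):
    """P(D | p0) = p0**n0 * (1 - p0)**n1 for a sequence of 0s and 1s."""
    n0 = data.count(0)  # number of 0s
    n1 = data.count(1)  # number of 1s
    if n0 + n1 != len(data):
        raise ValueError("data must contain only 0s and 1s")
    return p0**n0 * (1.0 - p0)**n1


# seven 0s and three 1s, as in the running example; the order is irrelevant
data = [0] * 7 + [1] * 3
print(bernoulli_likelihood(data, 0.6))  # about 1.7916e-03
```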

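By substituting the relevant counts and probabilities, I can connect the general form to the specific example. I first calculate the likelihood of the data sequence for each candidate probability given above:

$$P(D \mid p_0 = 0.2) = (0.2)^7\,(0.8)^3 = 6.5536 \times 10^{-6}$$
$$P(D \mid p_0 = 0.4) = (0.4)^7\,(0.6)^3 = 3.5389 \times 10^{-4}$$
$$P(D \mid p_0 = 0.6) = (0.6)^7\,(0.4)^3 = 1.7916 \times 10^{-3}$$
$$P(D \mid p_0 = 0.8) = (0.8)^7\,(0.2)^3 = 1.6777 \times 10^{-3}$$
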
As a result, I find that p0 = 0.6 is the most likely, slightly ahead of p0 = 0.8. A few points to note:

  • I have the maximum likelihood value (among the values considered). I could give the answer p0 = 0.6 and stop there.
  • The sum of the probabilities (likelihoods) is not 1. This means I do not have a properly normalized probability mass function (pmf) over p0, the parameter I am trying to infer. One goal of Bayesian inference is to provide a properly normalized pmf for p0, called the posterior.
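
The four likelihood values, the maximum-likelihood choice and the fact that they do not sum to 1 can all be checked with a few lines of Python. This is a sketch using the counts n0 = 7, n1 = 3 of the running example; the variable names are illustrative.

```python
n0, n1 = 7, 3  # counts of 0s and 1s in the example sequence
candidates = [0.2, 0.4, 0.6, 0.8]

# likelihood P(D | p0) for each candidate value of p0
likelihoods = {p0: p0**n0 * (1.0 - p0)**n1 for p0 in candidates}

for p0, value in likelihoods.items():
    print(f"P(D | p0={p0}) = {value:.4e}")

# the maximum-likelihood candidate
p_ml = max(likelihoods, key=likelihoods.get)
print("maximum likelihood:", p_ml)  # 0.6

# the likelihoods are not a pmf over p0: their sum is far from 1
print("sum of likelihoods:", sum(likelihoods.values()))
```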

Being able to do the above calculations puts me in a good position to apply Bayes' theorem and obtain the desired posterior pmf. Before moving on to Bayes' theorem, I want to emphasize the general form of the likelihood function again:

$$P(D \mid p_0) = p_0^{n_0}\,(1 - p_0)^{n_1}$$

It is also useful to write down the log-likelihood:

$$\ln P(D \mid p_0) = n_0 \ln p_0 + n_1 \ln(1 - p_0)$$

because this form improves numerical stability in the Python code below. To be clear, I use the natural (base e) logarithm, that is, log_e(x) = ln(x).
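
Here is a quick sketch of why the log form matters, assuming a hypothetical long sequence of 2,000 symbols (1,400 zeros and 600 ones, chosen only for illustration). The direct product underflows to 0.0 in double precision, while the log-likelihood stays an ordinary finite number.

```python
import math

n0, n1 = 1400, 600  # hypothetical counts for a long sequence
p0 = 0.7

# direct product: underflows to 0.0 in double precision
direct = p0**n0 * (1.0 - p0)**n1

# log-likelihood: n0 ln p0 + n1 ln(1 - p0), a perfectly ordinary number
log_lik = n0 * math.log(p0) + n1 * math.log(1.0 - p0)

print(direct)   # 0.0
print(log_lik)  # roughly -1221.7
```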


I have already decided on part of the prior by choosing p0 ∈ {0.2, 0.4, 0.6, 0.8} as the set of probabilities to consider. What remains is to assign a prior probability to each candidate p0 so that I start with a properly normalized prior pmf. Assigning equal prior probabilities is one reasonable choice:

$$P(p_0 = 0.2 \mid A_1) = P(p_0 = 0.4 \mid A_1) = P(p_0 = 0.6 \mid A_1) = P(p_0 = 0.8 \mid A_1) = 0.25$$

where A1 denotes my assumptions. This information makes up the prior pmf.

Bayes' theorem and the posterior

Next, I use the likelihood and the prior pmf defined above to infer the underlying value of p0. That is, I use Bayes' theorem to calculate the posterior pmf from the likelihood and the prior. The posterior has the form:

$$P(p_0 \mid D, A_1)$$

In words, this is the probability of p0 given the data sequence D and the assumptions A1. I can calculate the posterior using Bayes' theorem:

$$P(p_0 \mid D, A_1) = \frac{P(D \mid p_0)\, P(p_0 \mid A_1)}{\sum_{p_0'} P(D \mid p_0')\, P(p_0' \mid A_1)}$$

Here P(p0|A1) is the prior, P(D|p0) is the likelihood and P(p0|D,A1) is the posterior (in the plots below they are drawn in red, black and blue respectively).

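This updates my information about p0 from the assumptions alone (A1) to the assumptions plus the data (D, A1):

$$P(p_0 \mid A_1) \;\rightarrow\; P(p_0 \mid D, A_1)$$

I can simplify Bayes' theorem by defining the marginal likelihood:

$$P(D \mid A_1) = \sum_{p_0} P(D \mid p_0)\, P(p_0 \mid A_1)$$

Bayes' theorem can then be written in the following form:

$$P(p_0 \mid D, A_1) = \frac{P(D \mid p_0)\, P(p_0 \mid A_1)}{P(D \mid A_1)}$$

The posterior should be read as a set of equations, one for each candidate value of p0, just as for the likelihood and the prior.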

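Finally, to wrap up the theory, I calculate the posterior pmf of p0. Let's start with the denominator (I know all the likelihood and prior values from above):

$$P(D \mid A_1) = 0.25 \times \left(6.5536 \times 10^{-6} + 3.5389 \times 10^{-4} + 1.7916 \times 10^{-3} + 1.6777 \times 10^{-3}\right) = 9.5744 \times 10^{-4}$$

So the denominator in Bayes' theorem equals 9.57440e-04. Now I complete the posterior pmf calculation:

$$P(p_0 = 0.2 \mid D, A_1) = \frac{6.5536 \times 10^{-6} \times 0.25}{9.5744 \times 10^{-4}} = 1.7112 \times 10^{-3}$$
$$P(p_0 = 0.4 \mid D, A_1) = \frac{3.5389 \times 10^{-4} \times 0.25}{9.5744 \times 10^{-4}} = 9.2406 \times 10^{-2}$$
$$P(p_0 = 0.6 \mid D, A_1) = \frac{1.7916 \times 10^{-3} \times 0.25}{9.5744 \times 10^{-4}} = 4.6781 \times 10^{-1}$$
$$P(p_0 = 0.8 \mid D, A_1) = \frac{1.6777 \times 10^{-3} \times 0.25}{9.5744 \times 10^{-4}} = 4.3807 \times 10^{-1}$$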

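Before the Python code, let's briefly review the results. Using the data and Bayes' theorem, I have gone from the prior pmf

$$P(p_0 \mid A_1) = 0.25,\; 0.25,\; 0.25,\; 0.25$$

to the posterior pmf

$$P(p_0 \mid D, A_1) = 1.7112 \times 10^{-3},\; 9.2406 \times 10^{-2},\; 4.6781 \times 10^{-1},\; 4.3807 \times 10^{-1}$$

for p0 = 0.2, 0.4, 0.6, 0.8 respectively.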
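
The arithmetic of this worked example can be reproduced in a few lines. The sketch below (variable names are illustrative) computes the marginal likelihood P(D|A1) as the prior-weighted sum of likelihoods and then divides each numerator by it, which is Bayes' theorem applied to each candidate p0.

```python
n0, n1 = 7, 3
candidates = [0.2, 0.4, 0.6, 0.8]
prior = {p0: 0.25 for p0 in candidates}  # uniform prior A1

likelihood = {p0: p0**n0 * (1.0 - p0)**n1 for p0 in candidates}

# numerator of Bayes' theorem for each candidate: P(D | p0) P(p0 | A1)
numerators = {p0: likelihood[p0] * prior[p0] for p0 in candidates}

# denominator: marginal likelihood P(D | A1)
marginal = sum(numerators.values())

posterior = {p0: numerators[p0] / marginal for p0 in candidates}

print(f"P(D | A1) = {marginal:.5e}")  # 9.57440e-04
for p0 in candidates:
    print(f"P(p0={p0} | D, A1) = {posterior[p0]:.4e}")
```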

In the Bayesian setting, the posterior pmf is our answer for inferring p0: it reflects our knowledge of the parameter given the assumptions and the data. People usually want to report a single number, but this posterior reflects considerable uncertainty. Some options are:

  • report the maximum a posteriori (MAP) value of p0, which is 0.6 in this case;
  • report the posterior mean or the posterior median, computed from the posterior pmf;
  • include the posterior variance or a credible interval to describe the uncertainty in the estimate.
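
Each of these summaries is a short computation on the posterior pmf. Here is a minimal sketch using the worked example (the posterior is recomputed from the counts so the block is self-contained; names are illustrative). For the median it takes the smallest p0 whose cumulative posterior probability reaches 0.5, one common convention for a discrete pmf.

```python
import math

n0, n1 = 7, 3
candidates = [0.2, 0.4, 0.6, 0.8]

# posterior under the uniform prior: normalized likelihoods
weights = [p**n0 * (1.0 - p)**n1 for p in candidates]
total = sum(weights)
post = [w / total for w in weights]

# maximum a posteriori (MAP) value
p_map = candidates[post.index(max(post))]

# posterior mean and variance
mean = sum(p * q for p, q in zip(candidates, post))
var = sum((p - mean)**2 * q for p, q in zip(candidates, post))

# posterior median: smallest candidate with cumulative probability >= 0.5
cumulative = 0.0
for p, q in zip(candidates, post):
    cumulative += q
    if cumulative >= 0.5:
        median = p
        break

print(f"MAP = {p_map}, mean = {mean:.3f}, median = {median}, sd = {math.sqrt(var):.3f}")
```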

However the conclusions are reported, communicating the uncertainty is part of the job, and in practice plots like the ones below help a great deal. So let's leave the theory and implement these ideas in Python.

Writing the inference code in Python

First, the code imports numpy and matplotlib and switches to the ggplot plotting style.

import numpy as np
import matplotlib.pyplot as plt

# use the matplotlib ggplot style sheet
plt.style.use('ggplot')

Next, I create a class to handle the likelihood. The class receives the data sequence and provides an interface for calculating the likelihood of the data for a given probability p0. The log-likelihood equation appears in the _process_probabilities method; pay particular attention to the edge cases p0 = 0 and p0 = 1.

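class likelihood:
    def __init__(self, data):
        """Likelihood for binary data."""
        self.counts = {s: 0 for s in ['0', '1']}
        self._process_data(data)

    def _process_data(self, data):
        """Count the 0s and 1s in the data."""
        temp = [str(x) for x in data]
        for s in ['0', '1']:
            self.counts[s] = temp.count(s)

        if len(temp) != sum(self.counts.values()):
            raise Exception("Passed data is not all 0s and 1s!")

    def _process_probabilities(self, p0):
        """Return the probability and log-probability of the data for p0."""
        n0 = self.counts['0']
        n1 = self.counts['1']

        if p0 != 0 and p0 != 1:
            # typical case: 0 < p0 < 1
            logpr_data = n0*np.log(p0) + \
                         n1*np.log(1.-p0)
            pr_data = np.exp(logpr_data)
        elif p0 == 0 and n0 != 0:
            # p0 cannot be 0 if the data contain at least one 0
            logpr_data = -np.inf
            pr_data = np.exp(logpr_data)
        elif p0 == 0 and n0 == 0:
            # data are consistent with p0 = 0
            logpr_data = n1*np.log(1.-p0)
            pr_data = np.exp(logpr_data)
        elif p0 == 1 and n1 != 0:
            # p0 cannot be 1 if the data contain at least one 1
            logpr_data = -np.inf
            pr_data = np.exp(logpr_data)
        elif p0 == 1 and n1 == 0:
            # data are consistent with p0 = 1
            logpr_data = n0*np.log(p0)
            pr_data = np.exp(logpr_data)

        return pr_data, logpr_data

    def prob(self, p0):
        """Probability of the data given p0."""
        pr_data, _ = self._process_probabilities(p0)

        return pr_data

    def log_prob(self, p0):
        """Log-probability of the data given p0."""
        _, logpr_data = self._process_probabilities(p0)

        return logpr_data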
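
The four edge-case branches above all follow from one convention: a term n·ln(q) with n = 0 contributes nothing, even when q = 0 (the usual 0·ln 0 = 0 rule), while n·ln(0) with n > 0 is −∞. As an alternative sketch, not the article's code, the same log-likelihood can be written with a small helper; the names `term` and `log_likelihood` are hypothetical.

```python
import math


def term(n, q):
    """Return n * ln(q), treating 0 * ln(0) as 0 and n * ln(0) as -inf for n > 0."""
    if n == 0:
        return 0.0
    if q == 0:
        return -math.inf
    return n * math.log(q)


def log_likelihood(n0, n1, p0):
    """ln P(D | p0) = n0 ln p0 + n1 ln(1 - p0), including the edge cases p0 = 0 and p0 = 1."""
    return term(n0, p0) + term(n1, 1.0 - p0)


print(log_likelihood(7, 3, 0.0))  # -inf: a 0 was observed, so p0 cannot be 0
print(log_likelihood(0, 5, 0.0))  # 0.0: a sequence of only 1s is certain when p0 = 0
print(log_likelihood(7, 3, 0.6))
```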

Next, I create a class for the prior pmf. Given a list of candidate values for p0, it creates a uniform prior by default. If you want a different prior, you can pass prior probabilities to override this default; I give an example of this later.

class prior:
    def __init__(self, p_list, p_probs=None):
        """The prior.

        p_list: list of allowed p0 values
        p_probs: [optional] dict of prior probabilities (need not be normalized)
        """
        if p_probs:
            # make sure the prior is normalized
            norm = sum(p_probs.values())
            self.log_pdict = {p: np.log(p_probs[p]) - \
                                 np.log(norm) for p in p_list}
        else:
            # default: uniform prior over the candidates
            n = len(p_list)
            self.log_pdict = {p: -np.log(n) for p in p_list}

    def __iter__(self):
        return iter(sorted(self.log_pdict))

    def log_prob(self, p):
        """Get the log prior probability of the passed p0."""
        if p in self.log_pdict:
            return self.log_pdict[p]
        else:
            return -np.inf

    def prob(self, p):
        """Get the prior probability of the passed p0."""
        if p in self.log_pdict:
            return np.exp(self.log_pdict[p])
        else:
            return 0.0
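
The prior class normalizes any weights you pass by subtracting the log of their sum. The standalone sketch below (the function `log_prior` is a hypothetical name, not part of the article's code) shows that computation for the uniform default and for a set of illustrative, unnormalized weights.

```python
import math


def log_prior(p_list, weights=None):
    """Return {p0: log prior probability}, uniform unless weights (a dict) are given."""
    if weights:
        # normalize: log(w) - log(sum of all weights)
        log_norm = math.log(sum(weights.values()))
        return {p: math.log(weights[p]) - log_norm for p in p_list}
    # uniform default: each candidate gets probability 1/n
    n = len(p_list)
    return {p: -math.log(n) for p in p_list}


uniform = log_prior([0.2, 0.4, 0.6, 0.8])
weighted = log_prior([0.2, 0.4, 0.6, 0.8], {0.2: 1, 0.4: 1, 0.6: 1, 0.8: 5})  # illustrative weights

for name, lp in [("uniform", uniform), ("weighted", weighted)]:
    print(name, {p: round(math.exp(v), 3) for p, v in lp.items()})
```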

Finally, I construct a class for the posterior, which uses the data and an instance of the prior class to build the posterior pmf. The plot() method provides a handy visualization of the inference, with panels for the prior, the likelihood and the posterior.

Note that all posterior calculations are done with log probabilities. This is essential for numerical accuracy, because the probabilities can differ by many orders of magnitude and can be extremely small.

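class posterior:
    def __init__(self, data, prior):
        """The posterior.

        data: data sample as a list
        prior: an instance of the prior class
        """
        self.likelihood = likelihood(data)
        self.prior = prior

        self._process_posterior()

    def _process_posterior(self):
        """Process the posterior using the passed data and prior."""

        numerators = {}
        denominator = -np.inf
        for p in self.prior:
            # log numerator of Bayes' theorem: ln P(D|p0) + ln P(p0|A)
            numerators[p] = self.likelihood.log_prob(p) + \
                            self.prior.log_prob(p)

            if numerators[p] != -np.inf:
                # accumulate ln(sum of numerators); impossible values (-inf) add nothing
                denominator = np.logaddexp(denominator,
                                           numerators[p])

        # save the denominator in Bayes' theorem (log marginal likelihood)
        self.log_marg_likelihood = denominator

        # compute the log posterior
        self.log_pdict = {}
        for p in self.prior:
            self.log_pdict[p] = numerators[p] - \
                                self.log_marg_likelihood

    def log_prob(self, p):
        """Get the log posterior probability of the passed p0."""
        if p in self.log_pdict:
            return self.log_pdict[p]
        else:
            return -np.inf

    def prob(self, p):
        """Get the posterior probability of the passed p0."""
        if p in self.log_pdict:
            return np.exp(self.log_pdict[p])
        else:
            return 0.0

    def plot(self):
        """Plot the inference results."""

        f, ax = plt.subplots(3, 1, figsize=(8, 6), sharex=True)

        # get the candidate probabilities from the prior
        x = [p for p in self.prior]

        # plot the prior (red)
        y1 = np.array([self.prior.prob(p) for p in x])
        ax[0].stem(x, y1, linefmt='r-', markerfmt='ro', basefmt='w-')

        # plot the likelihood (black)
        y2 = np.array([self.likelihood.prob(p) for p in x])
        ax[1].stem(x, y2, linefmt='k-', markerfmt='ko', basefmt='w-')

        # plot the posterior (blue)
        y3 = np.array([self.prob(p) for p in x])
        ax[2].stem(x, y3, linefmt='b-', markerfmt='bo', basefmt='w-')

        # label the panels and the shared x-axis
        ax[0].set_ylabel('Prior')
        ax[1].set_ylabel('Likelihood')
        ax[2].set_ylabel('Posterior')
        ax[2].set_xlabel('$p_0$')

        plt.show()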
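
To see why the posterior class works with log probabilities and `np.logaddexp`, consider a hypothetical data set large enough that every likelihood underflows (the counts below are chosen only for illustration). The sketch compares naive normalization, which breaks down as 0/0, with the log-space version, which accumulates the denominator with `np.logaddexp` as the class above does.

```python
import numpy as np

n0, n1 = 1400, 600  # hypothetical counts for a long sequence
candidates = np.array([0.2, 0.4, 0.6, 0.8])
log_prior = np.full(len(candidates), -np.log(len(candidates)))  # uniform prior

# log numerators of Bayes' theorem: ln P(D | p0) + ln P(p0)
log_num = n0 * np.log(candidates) + n1 * np.log(1.0 - candidates) + log_prior

# naive approach: exponentiate first; every term underflows to 0.0
naive_num = np.exp(log_num)
with np.errstate(invalid="ignore"):
    naive_post = naive_num / naive_num.sum()  # 0/0 -> nan

# log-space approach: accumulate the log denominator with logaddexp
log_marg = -np.inf
for v in log_num:
    log_marg = np.logaddexp(log_marg, v)
post = np.exp(log_num - log_marg)

print(naive_post)  # [nan nan nan nan]
print(post)
```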


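Let's test the code. First, I reproduce the example from the theory section to make sure everything works. The data list below is a stand-in with the same counts as D (seven 0s and three 1s); the order of the digits does not affect the likelihood.

# data: stand-in for the example sequence D (only the counts matter)
data1 = [0]*7 + [1]*3

# prior
A1 = prior([0.2, 0.4, 0.6, 0.8])

# posterior
post1 = posterior(data1, A1)
post1.plot()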

Note how the posterior pmf shows that both p0 = 0.6 and p0 = 0.8 have high probability: there is uncertainty! This makes sense, because we have only one data sequence of length 10 and only four candidate probabilities. Also note:

  • The values of the prior and of the posterior each sum to 1, reflecting that these are proper pmfs.
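
One consequence worth checking: with a uniform prior, the prior term cancels in Bayes' theorem, so the posterior is just the normalized likelihood and the two panels have the same shape. A minimal sketch with the example counts (names are illustrative):

```python
n0, n1 = 7, 3
candidates = [0.2, 0.4, 0.6, 0.8]

likelihood = [p**n0 * (1.0 - p)**n1 for p in candidates]
prior = [1.0 / len(candidates)] * len(candidates)  # uniform prior

numerators = [l * q for l, q in zip(likelihood, prior)]
posterior = [x / sum(numerators) for x in numerators]

# posterior / likelihood is the same constant for every candidate
ratios = [post / l for post, l in zip(posterior, likelihood)]
print(ratios)
print(sum(prior), sum(posterior))  # both equal to 1 up to rounding
```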

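Next, let's set a strong prior, one that prefers a particular value of p0. With our Python code, it is easy to see the impact of this prior on the resulting posterior. The weights below are hypothetical (the values behind the original figure are not given in this text).

# prior: unnormalized weights, normalized by the prior class
# (hypothetical weights favoring p0 = 0.8)
p_probs = {0.2: 1, 0.4: 1, 0.6: 1, 0.8: 10}
A2 = prior([0.2, 0.4, 0.6, 0.8], p_probs)

# posterior
post2 = posterior(data1, A2)
post2.plot()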

Note the following:

  • The posterior and the likelihood no longer have the same shape.
  • The posterior probabilities of p0 = 0.2 and 0.4 both decrease relative to their prior probabilities, because their likelihoods for the data sequence are very low compared with the marginal likelihood. Similarly, the posterior probabilities of p0 = 0.6 and 0.8 increase somewhat relative to their prior probabilities.
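
The direction of each change follows from Bayes' theorem: the posterior-to-prior ratio of a candidate equals P(D|p0)/P(D|A2), so a candidate gains probability exactly when its likelihood exceeds the marginal likelihood. The sketch below checks this with hypothetical weights favoring p0 = 0.8 (1:1:1:10); these weights are an assumption, not necessarily the ones used for the original figure.

```python
n0, n1 = 7, 3
candidates = [0.2, 0.4, 0.6, 0.8]
weights = {0.2: 1, 0.4: 1, 0.6: 1, 0.8: 10}  # hypothetical strong prior favoring 0.8

total = sum(weights.values())
prior = {p: w / total for p, w in weights.items()}
likelihood = {p: p**n0 * (1.0 - p)**n1 for p in candidates}

# marginal likelihood under this prior
marginal = sum(likelihood[p] * prior[p] for p in candidates)
posterior = {p: likelihood[p] * prior[p] / marginal for p in candidates}

for p in candidates:
    # posterior / prior equals likelihood / marginal likelihood
    change = "up" if posterior[p] > prior[p] else "down"
    print(f"p0={p}: prior={prior[p]:.4f}, posterior={posterior[p]:.4f} ({change})")
```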

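Finally, let's look at an example with many more candidate probabilities (100 here) and a longer data sequence.

# set the probability of a 0
p0 = 0.2
# set the rng seed to 4
np.random.seed(4)
# generate data (hypothetical length: the original sequence length is not given in this text)
n_samples = 500
data2 = np.random.choice([0, 1], n_samples, p=[p0, 1.-p0])

# prior: 100 candidate values p0 = 0.00, 0.01, ..., 0.99
A3 = prior(np.arange(0.0, 1.0, 0.01))

# posterior
post3 = posterior(data2, A3)
post3.plot()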


A few things to note:

  • The posterior has a nice smooth shape: with 100 closely spaced candidates, p0 looks almost like a continuous parameter.
  • The likelihood values (y-axis) are extremely small for this much data.
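
Both observations can be checked without plotting. The sketch below uses a 100-point grid p0 = 0.00, 0.01, ..., 0.99 and hypothetical counts (100 zeros and 400 ones, i.e. a 500-symbol sequence whose fraction of zeros is exactly 0.2, chosen for illustration rather than drawn at random). It normalizes the posterior with a max-shifted log-sum-exp and reports the largest likelihood value.

```python
import numpy as np

n0, n1 = 100, 400  # hypothetical counts: 500 symbols, 20% zeros
grid = np.arange(0.0, 1.0, 0.01)  # 100 candidate values of p0

with np.errstate(divide="ignore"):
    # log(0) = -inf at p0 = 0 is intended: that candidate is impossible here
    log_lik = n0 * np.log(grid) + n1 * np.log(1.0 - grid)

log_prior = -np.log(len(grid))  # uniform prior
log_num = log_lik + log_prior

# log-sum-exp for the marginal likelihood, shifted by the maximum for stability
m = log_num.max()
log_marg = m + np.log(np.exp(log_num - m).sum())
post = np.exp(log_num - log_marg)

print(len(grid), grid[np.argmax(post)])  # 100 candidates, peak at p0 = 0.2
print(np.exp(log_lik).max())             # largest likelihood, on the order of 1e-109
```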
