BAYa class Assignment 2020

In this assignment, your task will be to implement and analyze algorithms for inference in probabilistic models described by factor graphs. In particular, you will implement Sum-Product algorithm (or Belief Propagation) and Max-Product (Max-Sum) algorithm to infer different probabilities, (marginal) distributions and values of variables for general factor graphs with categorical random variables as described in the slides on Graphical Models and Inference. More detailed information can be found in Chapter 8.4 in Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning

The preferred and easiest way of accomplishing this task is to complete this Jupyter Notebook, which already comes with a definition of the probabilistic model that you will work with. If you do not have any experience with Jupyter Notebook, the easiest way to start is to install Anaconda3, run Jupyter Notebook and open this notebook downloaded from [BAYa_Assignment2020.ipynb] (http://www.fit.vutbr.cz/study/courses/BAYa/public/notebooks/BAYa_Assignment2020.ipynb).

The following cell contains a code with the definition of the probabilistic model for which you will run the inference algorithms. You should not edit this part!

In the above cell, we first introduce names of our random variables and their values to make up some "story" around our probabilistic model. Our model "pretends" to model the behaviour of a random US citizen during the recent US presidental elections. It contains the following random variables and their categorical values:

Next came the definition of the full model as a list of factors. In our model, the factors are taken from a bayesian network (see below). Therefore, each factor represents a (normalized) probability distribution for one of the 6 variables. Such variable is always the first element of 'vars' (see 'vars' in class Factor). If 'vars' has more than one element, then the factor represents a conditional distribution that is conditioned on the remaining variables.

The following function 'evaluate_factor' can help us interpret the individual factors. Its arguments are a Factor instance and values for all the variables that the factor depents on (in the order given by 'vars'). The values are given only as integer IDs of the categories (i.e. second indices into 'val_names') The function evaluates the factor using the values. Finally, using the variable and value names, it prints out a string showing what (conditional) distribution is evaluated using what values and what is the resulting probability.

From the examples of evaluated factors above, we can see that

Using all the factors, the following code construct graph showing the corresponding Bayesian Network. Each node is one variable. For each factor, we create edges into the first variable in 'vars' from all the remaining variables.

As we can see, our model naively assumes that the probability of being COVID positive does not dependent on the age or the state the person is from.

Similarly, we can draw Factor Graph corresponding to our model:

As was already said, each factor corresponds to (conditional) distribution of one variable. Therefore, we choose the factors (rectangle) node names to be these variables names. This Factor Graph has tree structure so application of Belief Propagation should be straightforward.

The following cell implements the inefficient 'brute-force marginalization' approach, which might be useful to test correctness of your Belief Propagation implementation. When calling this function, we can specify values of some of the variables and the function marginalizes over all possible values of the remaining variables. It implements the operation $$ \DeclareMathOperator{\xx}{\mathbf{x}} \sum_{\xx_{marg}}\prod_s f_s(\xx_s) $$ where the product is over all factors $f_s$, $\xx_s$ is the subset of variables that $f_s$ depends on and $\xx_{marg}$ is the subset of variables for which the sum sums over all their possible values (i.e marginalizes). For our model where the factors are the (conditional) distributions of individual variables, the product corresponds to the joint distribution of all the variables $$ \prod_s f_s(\xx_s) = P(State) P(Age) P(COVID) P(Party|State,Age) P(Favorite|Party) P(Voting|Party,COVID)\\ = P(State,Age,COVID,Party,Favorite,Voting) $$

For example, if we supply the function with values for $Party$ and $Voting$ and marginalize over all other variables, we obtain the joint marginal probability

$$ P(Party,Voting)=\sum_{State} \sum_{Age}\sum_{COVID}\sum_{Favorite}P(State,Age,COVID,Party,Favorite,Voting) $$

When calling 'brute_force_marginalize' in the cell above, we set the values of all variables to 'None'. This instructs the function to sum over all the values of all the variables. The result is 1, which proves that our model represents a well normalized joint distribution. In the following calls, we clamp some of the variables to specific values, which avoids marginalization over those variables.

The above examples show that the marginal probabilities $P(Voting)$ and $P(Party)$ sum to 1 when summed over all their possible values. The marginal distribution $P(Voting)$ tells us what are the probabilities of different forms of voting (InPerson, Postal, NotVoting) and $P(Party=Democrats)$ says that $2/3$ of all the people are Democrats.

We also evaluate the joint marginal P(Party,Voting) to see that out of all the people, 17.16% are Republicans that came to vote InPerson.

Tasks and Questions

Your task is to implement the Sum-Product algorithm allowing for much more efficient inference that the brute-force approach. Based on the following questions, you will use the algorithm to infer individual probabilities such as $$P(Age=Young,Favorite=Biden)=?$$ $$P(Voting=Postal|Favorite=Trump,COVID=Positive)=?$$ or the whole marginal distributions such as $$P(Voting|Favorite=Trump,COVID=Positive)=?$$ $$P(Age,Favorite)=?$$ For the distributions, you need to report probabilities for all the possible values of the variables in question, e.g. $$P(Age=Young,Favorite=Biden)=?$$ $$P(Age=Young,Favorite=Trump)=?$$ $$P(Age=Old,Favorite=Biden)=?$$ $$P(Age=Old,Favorite=Trump)=?$$

Use your implementation to infer the following probabilities or distributions and to answer the following question:

  1. For all the individual variables, use the Sum-Product algorithm to (most efficiently) infer and print their marginal distributions $P(State)$, $P(Age)$, $P(COVID)$, $P(Party)$, $P(Favorite)$, $P(Voting)$.
  2. What is the computational complexity of the inference from the previous point compared to the brute-force marginalization? Extend the code for the inference algorithms (both brute-force and Sum-Product) to count the number of additions and multiplications carried out during the inference (consider only those from the equations describing the algorithm). Report these number. Note that, with the brute-force approach you need to call the brute_force_marginalize once for every value of every variable in order to evaluate all the marginal distributions. Keep reporting such computational complexity comparisons also for all the following inference problems.
  3. Use the Sum-Product algorithm to evaluate and print the following conditional marginal distributions $$P(Voting|Favorite=Trump)$$ $$P(Voting|Favorite=Trump,COVID=Positive)$$ $$P(Voting|Favorite=Trump,State=Texas,COVID=Positive)$$
  4. Use the Sum-Product algorithm to (most efficiently) evaluate and print the following joint marginal distribution $$P(Party, Favorite)$$ Hint: Note that there is one factor dependent on both these variables.
  5. Use the Sum-Product algorithm to evaluate and print the following joint marginal probability $$P(Party=Republicans,Voting=InPerson)$$
  6. How can we use the Sum-Product algorithm to evaluate the whole joint marginal distribution $P(Party,Voting)$? Note that in this case, there is no factor dependent on both these variables.
  7. Implement Max-Product (or Max-Sum) algorithm and use it to infer the most likely values of all the variables. In other words, find the values for all the variables that maximizes the joint probability $$P(State, Age, COVID, Party, Favorite, Voting)$$
  8. With observed values $Voting=Postal$ and $Favourite=Trump$, .use Max-Product (or Max-Sum) algorithm find the most likely values of all the remaining variables.
  9. Finally, what if we replace factor Factor([2], [0.3, 0.7]) in our model with factor Factor([2, 0], [[0.9, 0.8, 0.5], [0.1, 0.2, 0.5]])? What does it represent? How does the corresponding Bayesian Network change? How does it complicate the inference? It is enought if you just answer these questions. You can also try to implement inference for the marginals of the individual variables for this modified model. Succesfull implementation will be awarded by a bonus point.