10-418 – HOMEWORK 5
VARIATIONAL INFERENCE
https://piazza.com/cmu/fall2019/1041810618
TAs: Aakanksha, Austin, Karthika
START HERE: Instructions
Summary In this assignment, we will walk through the basics of Variational Inference, Mean-Field Approximation and the Coordinate Ascent Variational Inference (CAVI) algorithm for simple distributions such as multivariate Gaussians and Gaussian Mixture Models. Finally, we will wrap up with a brief comparison between variational methods and sampling-based methods such as MCMC.
• Academic Integrity Policy: See the academic integrity policies here: http://www.cs.cmu.edu/~mgormley/courses/10418/about.html#7-academic-integrity-policies
• Late Submission Policy: See the late submission policy here: http://www.cs.cmu.edu/~mgormley/courses/10418/about.html#6-general-policies
• Autolab: You will submit your code for the programming questions on this homework to Autolab (https://autolab.andrew.cmu.edu/). After uploading your code, we will grade it by hand; we will not use Autolab to autograde your code.
For multiple choice or select-all-that-apply questions, shade in the box or circle in the template document corresponding to the correct answer(s) for each of the questions. For LaTeX users, replace \choice with \CorrectChoice to obtain a shaded box/circle, and don't change anything else.
1 Written Questions [44 pts]
1.1 Mean-Field Approximation for Multivariate Gaussians
In this question, we’ll explore how accurate a Mean-Field approximation can be for an underlying multivariate Gaussian distribution.
Assume we have observed data X that was drawn from a 2-dimensional Gaussian distribution p(x;µ, Λ−1).
p(x; µ, Λ^{−1}) = (|Λ|^{1/2} / (2π)) exp( −(1/2)(x − µ)^T Λ (x − µ) )    (1.1)
Note here that we’re using the precision matrix Λ = Σ−1. An additional property of the precision matrix is that it is symmetric, so Λ12 = Λ21. This will make your lives easier for the math to come.
We will approximate this 2-dimensional Gaussian with a mean-field approximation, q(x) = q(x1)q(x2), the product of two 1-dimensional distributions q(x1) and q(x2). For now, we won't assume any particular form for these distributions.
1. (1 point) Short Answer: Write down the equation for logp(X). For now, you can leave all of the parameters in terms of vectors and matrices, not their subcomponents.

2. (2 points) Short Answer: Group together everything that involves X1 and remove anything involving X2. We claim that there exists some distribution q∗(X) = q∗(X1)q∗(X2) that minimizes the KL divergence, q∗ = argmin_q KL(q||p). Further, this distribution has a component q∗(X1) that is proportional to the quantity you find below.

It can be shown that this implies that q∗(X1) (and therefore q∗(X2)) is a Gaussian distribution:

q∗(x1) = N(x1; m1, Λ11^{−1})

where

m1 = µ1 − Λ11^{−1} Λ12 (E[x2] − µ2)

and, by symmetry, q∗(x2) = N(x2; m2, Λ22^{−1}) with m2 = µ2 − Λ22^{−1} Λ21 (E[x1] − µ1).
Using these facts, we’d like to explore how well our approximation can model the underlying distribution.
3. Suppose the parameters of the true distribution are µ and Λ.
(a) (1 point) Numerical Answer: What is the value of the mean of the Gaussian for q∗(X1)?
(b) (1 point) Numerical Answer: What is the value of the variance of the Gaussian for q∗(X1)?
(c) (1 point) Numerical Answer: What is the value of the mean of the Gaussian for q∗(X2)?
(d) (1 point) Numerical Answer: What is the value of the variance of the Gaussian for q∗(X2)?
(e) (2 points) Plot: Provide a computer-generated contour plot showing the result of our approximation q∗(X) and the true underlying Gaussian p(X; µ, Λ^{−1}) for the parameters given above (a plotting sketch is provided after question 4 below).

4. Now suppose the parameters of the true distribution are a different µ and Λ.
(a) (1 point) Numerical Answer: What is the value of the mean of the Gaussian for q∗(X1)?
(b) (1 point) Numerical Answer: What is the value of the variance of the Gaussian for q∗(X1)?
(c) (1 point) Numerical Answer: What is the value of the mean of the Gaussian for q∗(X2)?
(d) (1 point) Numerical Answer: What is the value of the variance of the Gaussian for q∗(X2)?
(e) (2 points) Plot: Provide a computer-generated contour plot showing the result of our approximation q∗(X) and the true underlying Gaussian p(X; µ, Λ^{−1}) for the parameters given above (a plotting sketch is provided below).
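For parts 3(e) and 4(e), a minimal plotting sketch along these lines may help. The values of mu_true and Lambda_true below are placeholders (substitute the values given in the question), and the factors of q∗ are assumed to follow the mean-field result stated above (factor means equal the true means at the fixed point; factor variances are the inverse diagonal precisions):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal, norm

# Placeholder parameters -- substitute the values given in the question.
mu_true = np.array([0.0, 0.0])
Lambda_true = np.array([[1.0, 0.5],
                        [0.5, 1.0]])
Sigma_true = np.linalg.inv(Lambda_true)

# Mean-field factors at the fixed point: q*(x_k) = N(mu_k, 1/Lambda_kk).
q_means = mu_true
q_vars = 1.0 / np.diag(Lambda_true)

# Evaluate both densities on a grid.
grid = np.linspace(-3, 3, 200)
X1, X2 = np.meshgrid(grid, grid)
pos = np.dstack((X1, X2))
p_true = multivariate_normal(mu_true, Sigma_true).pdf(pos)
q_approx = (norm(q_means[0], np.sqrt(q_vars[0])).pdf(X1)
            * norm(q_means[1], np.sqrt(q_vars[1])).pdf(X2))

plt.contour(X1, X2, p_true, colors="blue")   # true Gaussian p(x)
plt.contour(X1, X2, q_approx, colors="red")  # mean-field approximation q*(x)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()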

5. (1 point) Describe in words how the plots you generated provide insight into the behavior of minimizing KL(q||p), with regard to the low-probability and high-probability regions of the true vs. approximate distributions.

1.2 Variational Inference for Gaussian Mixture Models
Now that we have seen how the mean-field approximation works for a multivariate Gaussian, let's look at the case of Gaussian Mixture Models. Suppose we have a Bayesian mixture of unit-variance univariate Gaussian distributions. This mixture consists of 2 components, each corresponding to a Gaussian distribution, with means µ = {µ1, µ2}. The mean parameters are drawn independently from a Gaussian prior distribution N(0, σ2). The prior variance σ2 is a hyperparameter. Generating an observation xi from this model is done according to the following generative story:
1. Draw a cluster assignment ci for xi uniformly at random over the two components; ci is a one-hot indicator vector of length 2.
2. Generate xi from the corresponding Gaussian distribution N(ci^T µ, 1).
The complete hierarchical model is as follows:
µk ∼ N(0, σ2),                k ∈ {1, 2}
ci ∼ Categorical(1/2, 1/2),   i ∈ [1, n]
xi | ci, µ ∼ N(ci^T µ, 1),    i ∈ [1, n]
where n is the number of observations generated from the model.
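As a concrete illustration of this generative story (not required for the assignment), here is a minimal sampling sketch; the uniform prior over assignments and the variable names are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

n = 5          # number of observations
K = 2          # number of mixture components
sigma2 = 0.01  # prior variance hyperparameter (matches Section 1.3)

# Draw the component means from the prior N(0, sigma^2).
mu = rng.normal(0.0, np.sqrt(sigma2), size=K)

# Draw one-hot cluster assignments c_i and observations x_i ~ N(c_i^T mu, 1).
c = np.eye(K)[rng.integers(0, K, size=n)]   # shape (n, K), each row one-hot
x = rng.normal(c @ mu, 1.0)                 # unit-variance emissions

print(mu, c, x)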
1. (1 point) What are the observed and latent variables for this model?

2. (1 point) Write down the joint probability of the observed and latent variables under this model.

3. (3 points) Let’s calculate the ELBO (evidence lower-bound) for this model. Recall that the ELBO is given by the following equation:
ELBO(q) = Eq[logp(x,z)] − Eq[logq(z)]
To calculate q(z), we will now use the mean-field assumption. Under this assumption, each latent variable is governed by its own variational factor, resulting in the following probability distribution:

q(µ, c) = ∏_{k=1}^{2} q(µk; mk, vk^2) ∏_{i=1}^{n} q(ci; ai)

Here q(µk; mk, vk^2) is the Gaussian distribution for the k-th mixture component with mean mk and variance vk^2, and q(ci; ai) is the categorical distribution for the i-th observation with assignment probabilities ai (ai is a 2-dimensional vector). Given this assumption, write down the ELBO as a function of the variational parameters m, v^2, and a.
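For reference, one standard way to expand the ELBO for this model, grouping terms by the factorization of p and q (this grouping is a suggested starting point, not a required form):

\[
\mathrm{ELBO}(m, v^2, a)
= \sum_{k=1}^{2} \mathbb{E}_q[\log p(\mu_k)]
+ \sum_{i=1}^{n} \Big( \mathbb{E}_q[\log p(c_i)] + \mathbb{E}_q[\log p(x_i \mid c_i, \mu)] \Big)
- \sum_{i=1}^{n} \mathbb{E}_q[\log q(c_i; a_i)]
- \sum_{k=1}^{2} \mathbb{E}_q[\log q(\mu_k; m_k, v_k^2)]
\]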

4. Now that we have the ELBO formulation, let's try to compute coordinate updates for our latent variables. Remember that the optimal variational density of a latent variable zj is proportional to the exponentiated expected log of the complete conditional given all other latent variables in the model and the observed data. In other words:

qj(zj) ∝ exp( E_{−j}[ log p(zj | z_{−j}, x) ] )
(a) (4 points) Show that the variational update for the cluster assignment probabilities is a_{ik} ∝ exp( xi mk − (mk^2 + vk^2)/2 ).
(Hint: We can write the optimal variational density for the cluster assignment variables as

q(ci; ai) ∝ exp( log p(ci) + E_µ[ log p(xi | ci, µ); m, v^2 ] ).

Feel free to drop additive constants.)

(b) (6 points) Show that the variational updates for the k-th mixture component are

mk = ( Σ_i a_{ik} xi ) / ( 1/σ^2 + Σ_i a_{ik} )   and   vk^2 = 1 / ( 1/σ^2 + Σ_i a_{ik} ).

(Hint: We can write the optimal variational density for the k-th mixture component as

q(µk; mk, vk^2) ∝ exp( log p(µk) + Σ_{i=1}^{n} E_{−µk}[ log p(xi | ci, µ); ai, m_{−k}, v^2_{−k} ] ).

Feel free to drop additive constants.)
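A useful intermediate fact for both parts (it follows from the unit-variance Gaussian emission model and the Gaussian form of q(µk), and matches the hint given in Section 1.3):

\[
\mathbb{E}_{\mu}\big[\log p(x_i \mid c_i = k, \mu)\big]
= -\tfrac{1}{2}x_i^2 + x_i\,\mathbb{E}[\mu_k] - \tfrac{1}{2}\mathbb{E}[\mu_k^2] + \text{const},
\qquad
\mathbb{E}[\mu_k] = m_k,\quad \mathbb{E}[\mu_k^2] = m_k^2 + v_k^2 .
\]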

1.3 Running CAVI: Toy Example
Let’s now see this in action!
Recall that the CAVI update algorithm for a Gaussian Mixture Model repeats the following two steps until the ELBO converges (one pass over both steps is one epoch): first, for each observation i, update the assignment probabilities ai using the current m and v2; then, for each component k, update mk and vk2 using the current assignments. These are exactly the updates derived in Section 1.2.
Note that our notation differs slightly, with ϕ corresponding to a and s2 corresponding to v2. We also have K = 2. Assume initial parameters m = [0.5, 0.5], v2 = [1, 1], and ai = [0.3, 0.7] for all i ∈ [1, n], and a sample x = [0.1, −0.3, 1.2, 0.8, −0.5]. Also assume prior variance σ2 = 0.01.
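A minimal sketch of the update loop under these settings, assuming the standard CAVI updates derived in Section 1.2 (variable names here are illustrative, and the order of the two update steps within an epoch is an assumption):

import numpy as np

x = np.array([0.1, -0.3, 1.2, 0.8, -0.5])
sigma2 = 0.01                          # prior variance
m = np.array([0.5, 0.5])               # variational means
v2 = np.array([1.0, 1.0])              # variational variances
a = np.tile([0.3, 0.7], (len(x), 1))   # assignment probabilities, shape (n, K)

for epoch in range(5):
    # Update assignment probabilities: a_ik proportional to exp(x_i m_k - (m_k^2 + v_k^2)/2).
    logits = np.outer(x, m) - 0.5 * (m**2 + v2)
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)

    # Update component means and variances.
    denom = 1.0 / sigma2 + a.sum(axis=0)
    m = (a * x[:, None]).sum(axis=0) / denom
    v2 = 1.0 / denom

print(m, v2, a)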
Write a Python script implementing the above procedure and run it for 5 epochs. You should submit your code to Autolab as a .tar file named cavi.tar containing a single file cavi.py. You can create that file by running:
tar -cvf cavi.tar cavi.py
from the directory containing your code.
After the fifth epoch, report
1. (2 points) The variational parameters m.
2. (2 points) The variational parameters v2.
3. (2 points) The variational parameters a.

Hint:
1. Note that the expectation update for a does not depend on µ. (Why?)
2. The expectation of the square of a Gaussian random variable is E[X^2] = Var[X] + (E[X])^2.
1.4 Variational Inference vs. Monte Carlo Methods
Let’s end with a brief comparison between variational methods and MCMC methods. We have seen that both classes of methods can be used for learning in scenarios involving latent variables, but both have their own sets of advantages and disadvantages. For each of the following statements, specify whether they apply more suitably to VI or MCMC methods:
1. (1 point) Transforms inference into an optimization problem.
Variational Inference
MCMC
2. (1 point) Is easier to integrate with back-propagation.
Variational Inference
MCMC
3. (1 point) Involves more stochasticity.
Variational Inference
MCMC
4. (1 point) Non-parametric.
Variational Inference
MCMC
5. (1 point) Is higher variance under limited computational resources.
Variational Inference
MCMC
1.5 Wrap-up Questions
1. (1 point) Multiple Choice: Did you correctly submit your code to Autolab?
Yes
No
2. (1 point) Numerical answer: How many hours did you spend on this assignment?

Collaboration Questions
1. Did you receive any help whatsoever from anyone in solving this assignment? If so, include full details including names of people who helped you and the exact nature of the help you received.

2. Did you give any help whatsoever to anyone in solving this assignment? If so, include full details including names of people you helped and the exact nature of help you offered.

3. Did you find or come across code that implements any part of this assignment? If so, include full details including the source of the code and how you used it in the assignment.
