ML1 Homework 2 Solved
1 Probability distributions, likelihoods, and estimators

For these questions you will be working with different probability density functions, listed in the table below. The purpose of these questions is to practice working with a variety of PDFs and to make computing likelihoods, MLEs, etc. more natural. Note the indicator notation [x = 0] (and [x = 1]) below: the square brackets evaluate to 1 if the argument is true, and 0 otherwise. E.g., if x is 1, then [x = 0] = 0 and [x = 1] = 1 (here [x = 0] is lazy notation; in Python you would write x == 0, for example). We will use this notation a lot, both below and when we learn about classification.
Distribution   p(x|θ)                                    Range of x          Range of θ
Bernoulli      θ^[x=1] (1−θ)^[x=0]                       x ∈ {0, 1}          0 ≤ θ ≤ 1
Beta           x^(θ1−1) (1−x)^(θ0−1) / B(θ1, θ0)         0 ≤ x ≤ 1           θ1 > 0, θ0 > 0
Poisson        θ^x e^(−θ) / x!                           x ∈ {0, 1, 2, …}    θ > 0
Gamma          θ0^θ1 x^(θ1−1) e^(−θ0 x) / Γ(θ1)          x ≥ 0               θ1 > 0, θ0 > 0
Gaussian       exp(−(x − θ0)² / (2θ1)) / √(2πθ1)         −∞ < x < ∞          −∞ < θ0 < ∞, θ1 > 0
Question 1.1
For each of the probability distributions above, write down their normalizing constants. Remember that ∫ p(x|θ) dx = 1 for continuous x and Σ_x p(x|θ) = 1 for discrete x.
Question 1.2
1. What is the likelihood for a single observation? For the entire set of observations?
2. Write the log-likelihood for the entire set of observations.
3. Solve for the MLE of ρ. Do it in general (with symbols for the counts n0, n1 of days without and with rain) and for this specific case (plug in the numbers).
4. Assume a Beta prior for ρ with parameters a and b. What is the MAP estimate of ρ?
5. Write the form of the posterior distribution for ρ. You do not need to solve it analytically.
6. (Optional) Solve for the posterior distribution analytically. Hint: it is a Beta distribution.
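The closed-form answers for the Bernoulli case can be sanity-checked numerically. A minimal sketch, assuming hypothetical counts n0, n1 and hypothetical Beta hyperparameters a, b (the actual rain data is not reproduced in this excerpt):

```python
import math

# Hypothetical counts: 20 days without rain, 10 days with rain
n0, n1 = 20, 10
a, b = 2.0, 2.0      # hypothetical Beta prior parameters

# Standard closed forms:
rho_mle = n1 / (n0 + n1)                          # MLE
rho_map = (n1 + a - 1) / (n0 + n1 + a + b - 2)    # MAP under a Beta(a, b) prior
print(rho_mle, rho_map)

# Numerical check that the MLE maximizes the Bernoulli log-likelihood
def loglik(rho):
    return n1 * math.log(rho) + n0 * math.log(1 - rho)

eps = 1e-4
assert loglik(rho_mle) >= loglik(rho_mle + eps)
assert loglik(rho_mle) >= loglik(rho_mle - eps)
```

With a = b = 1 (a uniform prior) the MAP collapses to the MLE, which is a useful consistency check on your derivation.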
Question 1.3
You work in the staffing department of a maternity hospital, and part of your job is to determine the staffing requirements during the night shift at your hospital. This might mean the number of doctors and nurses at the hospital and the number of doctors on call (if there are more than the average number of deliveries). Your goal is to determine the distribution over the number of deliveries during the night shift, d_t ∈ {0, 1, 2, …} (d for delivery count, t for time, the index of the night). With this you can compute the mean, the probability of more than 5 deliveries, etc. You collect data for two weeks, i.e. d1, …, d14 = 4, 7, 3, 0, 2, 2, 1, 5, 4, 4, 3, 3, 2, 3. You assume the observations are explained by a Poisson distribution with parameter λ over the discrete delivery counts. With this information, answer the following questions:
1. What is the likelihood for a single observation? For the entire set of observations?
2. Write the log-likelihood for the entire set of observations.
3. Solve for the MLE of λ. Do it in general and for this specific case (plug in the numbers).
4. Assume a Gamma prior for λ with parameters a and b. What is the MAP estimate of λ?
5. Write the form of the posterior distribution for λ. (You do not need to solve it analytically.)
6. (Optional) Solve for the posterior distribution analytically. Hint: it is a Gamma distribution.
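As a check on the derivation, here is a short sketch using the delivery counts from the question. It relies on the standard result that the Poisson MLE is the sample mean; the Gamma prior values a, b are hypothetical:

```python
import math

# Delivery counts from the question
d = [4, 7, 3, 0, 2, 2, 1, 5, 4, 4, 3, 3, 2, 3]
n = len(d)

# Standard result: the Poisson MLE is the sample mean.
lam_mle = sum(d) / n
print(lam_mle)  # 43/14

def loglik(lam):
    """Poisson log-likelihood of the whole data set at rate lam."""
    return sum(x * math.log(lam) - lam - math.log(math.factorial(x)) for x in d)

# Numerical check that the sample mean maximizes the log-likelihood
assert loglik(lam_mle) >= loglik(lam_mle + 1e-4)
assert loglik(lam_mle) >= loglik(lam_mle - 1e-4)

# MAP under a Gamma(a, b) prior (shape a, rate b); these prior values are hypothetical.
a, b = 3.0, 1.0
lam_map = (sum(d) + a - 1) / (n + b)
print(lam_map)
```

The MAP formula follows from the Gamma posterior of part 6: shape a + Σ d_t, rate b + n, whose mode is (a + Σ d_t − 1)/(b + n).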
Question 1.4
You have developed a blood test aimed at detecting a disease d ∈ {0, 1} (disease is absent (d = 0) or present (d = 1)). The test measures the level l of a specific indicator of the disease; that is, it returns a real-valued number relative to some baseline (so the levels can be both negative and positive, anywhere along the real line). Two models of the population are built: one for the patients with the disease, and another for the general population. Measurements tend to have a Gaussian shape, and we therefore model the entire population as a mixture of two Gaussians. That is, p(l) = p(d = 0)p(l|d = 0) + p(d = 1)p(l|d = 1), where p(d) is the prior distribution of patients with and without the disease in the general population and p(l|d) are conditional Gaussian distributions, one for the patients with the disease, and one for those without. Note: in this question and the previous two, we are simply applying rules of probability (with some algebra) to get the form of the posterior distribution; however, in this problem we are also classifying (since our target is the discrete label d).
Assume we know p(d = 0) = π0 = 0.999 and p(d = 1) = π1 = 0.001 from previous experience. We do not know the parameters (μ0, σ0²) (the mean and variance of the disease-free population) nor (μ1, σ1²) (for the disease population). We measure levels for N people, and we know that n ∈ D0 are the indices for the disease-free patients and n ∈ D1 are the indices for the patients with the disease (i.e. D0 and D1 are non-intersecting sets of indices from 1 to N). With this information, answer the following questions:
1. Write down the likelihood of the observations as a product over the N level recordings. Hint: use indicator notation (like in the Bernoulli distribution) to distinguish between dn = 0 and dn = 1 in the likelihood.
2. Write down the likelihood as a product over the likelihoods for {D0} and {D1}.
3. Compute the log-likelihood.
4. Find the MLE for μ0 and σ0². Assume we can do the same for μ1 and σ1².
5. We now have our models. To make a prediction, solve for p(d = 1|l⋆), where l⋆ is a level recorded for a new patient. Hint: use Bayes' theorem.
6. Reduce your solution to the form of a sigmoid, i.e. p(d = 1|l⋆) = 1/(1 + e^(−a)) for some a depending on l⋆.
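Once the class-conditional Gaussians are estimated, the prediction step can be checked numerically. A sketch with hypothetical means and (equal, hypothetical) variances standing in for the MLEs, comparing the direct Bayes computation against the sigmoid form; π0, π1 are the priors from the question:

```python
import math

pi0, pi1 = 0.999, 0.001   # priors from the question
mu0, mu1 = 0.0, 3.0       # hypothetical class means (stand-ins for the MLEs)
var = 1.0                 # hypothetical shared variance

def gauss(l, mu, v):
    return math.exp(-(l - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def posterior_bayes(l):
    """p(d=1|l) by direct application of Bayes' theorem."""
    num = pi1 * gauss(l, mu1, var)
    return num / (num + pi0 * gauss(l, mu0, var))

def posterior_sigmoid(l):
    """Same posterior reduced to sigmoid form 1/(1 + e^(-a)); a is linear in l
    because the two variances are equal, so the quadratic terms cancel."""
    a = math.log(pi1 / pi0) + ((mu1 - mu0) * l + (mu0 ** 2 - mu1 ** 2) / 2) / var
    return 1 / (1 + math.exp(-a))

for l in (-1.0, 1.5, 5.0):
    print(l, posterior_bayes(l), posterior_sigmoid(l))  # the two columns agree
```

Note how the tiny prior π1 = 0.001 drags the posterior down: even a level several standard deviations above the healthy mean does not immediately imply the disease is present.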
2 Matrix inversion lemma
In computing the posteriors and evidences for Gaussian models, we often encounter complicated forms of the inverse covariance matrices that actually have very simple forms if we apply some matrix manipulation. The matrix inversion lemmas in the Matrix Cookbook list a few; in particular, the Woodbury identity:

(A + CBC^T)^(−1) = A^(−1) − A^(−1) C (B^(−1) + C^T A^(−1) C)^(−1) C^T A^(−1)
Question 2.1
When you are working with vector and matrix notation, it can be difficult to know if your solution is correct. A good test is to convert your solution to the scalar case, which is often much simpler to do, and see if the equivalent matrix/vector form matches the scalar form. We will do this for Woodbury by proving the lemma for the scalar case.
1. In the right-hand side of the Woodbury identity, replace the matrices with scalars: A = a, B = b, C = c, where a, b, and c are scalars.
2. Prove that the RHS is equal to the LHS for the scalar case.
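Beyond the scalar proof, the full matrix identity is easy to spot-check numerically; the matrices below are arbitrary (chosen so every inverse exists):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
A = 2.0 * np.eye(n)                 # symmetric positive definite, so invertible
B = 3.0 * np.eye(k)
C = rng.standard_normal((n, k))

lhs = np.linalg.inv(A + C @ B @ C.T)
Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)
rhs = Ainv - Ainv @ C @ np.linalg.inv(Binv + C.T @ Ainv @ C) @ C.T @ Ainv
print(np.allclose(lhs, rhs))  # True

# The scalar case of question 2.1: a, b, c scalars
a, b, c = 2.0, 3.0, 0.5
lhs_s = 1 / (a + c * b * c)
rhs_s = 1 / a - (1 / a) * c * (1 / (1 / b + c * (1 / a) * c)) * c * (1 / a)
print(lhs_s, rhs_s)  # equal
```

The payoff of the identity: the left side inverts an n-by-n matrix, the right side only a k-by-k one, which is much cheaper when k ≪ n.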
3 Posterior predictive distributions

Question 3.1
Assume we have 1) observed a single training pair {t1, φ1}, 2) a Gaussian likelihood (below), and 3) a Gaussian prior over weights w:

p(t1|φ1, w, β) = N(t1|φ1^T w, 1/β),    p(w|α) = N(w|0, I/α)
Answer the following:
1. Write the posterior distribution p(w|t1, φ1, α, β) as a function of the prior, likelihood, and evidence.
2. Derive the posterior distribution p(w|t1, φ1, α, β).
3. Derive the model evidence p(t1|φ1, α, β). Hint: much of the work solving for the posterior can be used to solve for the evidence.
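The derivations can be verified numerically through the identity likelihood × prior = posterior × evidence, which holds pointwise in w. A sketch assuming the standard Bayesian linear-regression results (posterior S1 = (αI + βφ1φ1^T)^(−1), m1 = βS1φ1t1; evidence N(t1|0, 1/β + φ1^Tφ1/α)), with hypothetical values for α, β, φ1, and t1:

```python
import numpy as np

alpha, beta = 0.5, 2.0
phi1 = np.array([1.0, -0.5])   # hypothetical feature vector
t1 = 0.7                        # hypothetical target
D = phi1.size

def gauss_pdf(x, mean, cov):
    """Multivariate Gaussian density (also handles the 1-D case)."""
    x, mean, cov = np.atleast_1d(x), np.atleast_1d(mean), np.atleast_2d(cov)
    d = x - mean
    return (np.exp(-0.5 * d @ np.linalg.solve(cov, d))
            / np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(cov)))

# Posterior for a single observation
S1 = np.linalg.inv(alpha * np.eye(D) + beta * np.outer(phi1, phi1))
m1 = beta * S1 @ phi1 * t1

# Evidence: t1 ~ N(0, 1/beta + phi1^T phi1 / alpha)
ev = gauss_pdf(t1, 0.0, 1 / beta + phi1 @ phi1 / alpha)

# Bayes' rule check at an arbitrary w: likelihood * prior == posterior * evidence
w = np.array([0.3, -1.2])
lik = gauss_pdf(t1, phi1 @ w, 1 / beta)
prior = gauss_pdf(w, np.zeros(D), np.eye(D) / alpha)
post = gauss_pdf(w, m1, S1)
print(lik * prior, post * ev)  # equal
```

Because the model is conjugate, the identity holds exactly at every w, which makes it a strong test of both the posterior and the evidence at once.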
Question 3.2
This question continues from the previous question. Now assume we have observed N − 1 new data vectors (so that now we have an N-dimensional vector of targets t and an N-by-D dimensional matrix of transformed data Φ), and we have computed the posterior of w:

p(w|m_N, S_N) = N(w|m_N, S_N)
m_N = β(αI + βΦ^T Φ)^(−1) Φ^T t
S_N = (αI + βΦ^T Φ)^(−1)

Now we have a test vector φ⋆. We want the posterior predictive distribution for t⋆, i.e.

p(t⋆|φ⋆, t, α, β) = ∫ p(t⋆|φ⋆, w, β) p(w|m_N, S_N) dw    (1)
To do this, answer the questions and/or perform the steps below. Note that this is a relatively rare instance in machine learning where we can analytically integrate over the model parameters and get an exact form of the posterior (predictive) distribution. Hint: if you get stuck, you can try solving the question using a scalar φ(x) (and scalar w); the steps to the solution will be the same, but will involve scalar operations (division instead of matrix inversion, etc.) that are easy. Note: the questions/steps below divide the work required to derive the analytic form of the posterior predictive distribution into manageable chunks; the main question is to solve the integral above for p(t⋆|φ⋆, t, α, β). Below, by "Gaussian form" we mean the exponential part of the Gaussian should look like exp(−(x − m)²/(2σ²)); this is not in "Gaussian form": exp(−(1/(2σ²))(x² − mx − xm + m²)) (here shown for scalar x and m).
1. Compute the joint p(t⋆|φ⋆, w, β) p(w|m_N, S_N). Use correctly labeled constants C to represent the Gaussian normalizing constants.
2. Rewrite the joint as a product of constants and two exponential functions. In one exponential, collect all terms with w; put the remaining terms in the other exponential.
3. Complete the square for w. This will require adding a term to the w exponential and subtracting it from the other (so the two cancel each other).
4. Rewrite the w exponential so it is in Gaussian form. Marginalize for t⋆ by integrating over w, ensuring the resulting marginal distribution is normalized (i.e. keep track of all the normalizing constants).
5. What is left is the posterior predictive distribution, but it is not yet in Gaussian form; we'll do that now. Collect the squared and linear terms of t⋆.
6. The squared term is of the form β⋆ t⋆²; solve for β⋆. Hint: use the Woodbury identity. Show that the correct solution is β⋆ = (1/β + φ⋆^T S_N φ⋆)^(−1).
7. The linear term is of the form β⋆ t⋆ y⋆; solve for y⋆.
8. Write the Gaussian form for the posterior predictive distribution using mean and variance functions.
9. (Bonus) Let σ_N²(φ⋆) = 1/β⋆(φ⋆). In the limit N → ∞, what is σ_N²(φ⋆)? How do you interpret this? How is this related to the irreducible loss (similarities and differences)?
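The end result of these steps can be checked numerically: the standard predictive distribution has mean m_N^T φ⋆ and variance 1/β + φ⋆^T S_N φ⋆, and the Woodbury identity connects this variance to the precision produced by completing the square. A sketch with hypothetical data (random Φ, t, and φ⋆):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.1, 4.0
N, D = 20, 3
Phi = rng.standard_normal((N, D))   # hypothetical design matrix
t = rng.standard_normal(N)          # hypothetical targets

# Posterior of w (standard Bayesian linear regression)
SN_inv = alpha * np.eye(D) + beta * Phi.T @ Phi
SN = np.linalg.inv(SN_inv)
mN = beta * SN @ Phi.T @ t

phi_star = rng.standard_normal(D)   # hypothetical test vector

# Standard predictive result: mean mN^T phi*, variance 1/beta + phi*^T SN phi*
pred_mean = mN @ phi_star
pred_var = 1 / beta + phi_star @ SN @ phi_star

# Completed-square precision, simplified by the Woodbury identity:
# beta - beta^2 phi*^T (SN^{-1} + beta phi* phi*^T)^{-1} phi*  ==  1 / pred_var
beta_star = beta - beta ** 2 * phi_star @ np.linalg.inv(
    SN_inv + beta * np.outer(phi_star, phi_star)) @ phi_star
print(beta_star, 1 / pred_var)  # equal
```

This check mirrors step 6: the clumsy expression that falls out of completing the square and the clean form 1/β + φ⋆^T S_N φ⋆ are the same number.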
