Bayesian Statistics and Data Analysis
Assignment 4
General information
The recommended tool in this course is R (with the IDE RStudio). You can download R here and RStudio here. There are many tutorials, videos and introductions to R and RStudio online. You can find some initial hints from RStudio Education pages.
When working with R, we recommend writing the report using R markdown and the provided R markdown template. The template includes the formatting instructions and how to include code and figures.
Instead of R markdown, you can use other software to make the PDF report, but you should use the same instructions for formatting. These instructions are also available in the PDF produced from the R markdown template.
We supply a Google Colab notebook that you can also use for the assignments. We have included the installation of all necessary R packages; hence, this can be an alternative to using your own local computer. You can find the notebook here. You can also open the notebook in Colab here.
Report all results in a single, anonymous PDF. Note that no other formats are allowed.
The course has its own R package bsda with data and functionality to simplify coding. To install the package, just run the following (upgrade = "never" skips the question about updating other packages):
install.packages("remotes")
remotes::install_github("MansMeg/BSDA", subdir = "rpackage", upgrade = "never")
Many of the exercises can be checked automatically using the R package markmyassignment. You can find information on how to install and use the package here. There is no need to include markmyassignment results in the report.
You can find common questions and answers regarding the installation and technical problems in Frequently Asked Questions (FAQ).
You can find deadlines and information on how to turn in the assignments in Studium.
If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository here.
It is mandatory to include the following parts in all assignments (these are included already in the template):
1. Time used for reading: How long the reading assignment took (in hours)
2. Time used for the assignment: How long the basic assignment took (in hours)
3. Good with assignment: Write one or two sentences about what you liked in the assignment and what we should keep for next year.
4. Things to improve in the assignment: Write one or two sentences about what you think can be improved in the assignment. Can something be clarified further? Did you get stuck on stuff unrelated to the content of the assignment, etc.?
To pass (G) the assignment, you need 70% of the total points. To pass with distinction (VG), you need 90% of the total points. See the grading information on the point allocations for each assignment.
Information on this assignment
This assignment is related to Chapters 3 and 10.
Reading instructions: Chapters 3 and 10 in BDA3, see reading instructions.
Reporting accuracy: For posterior statistics of interest, only report digits for which the Monte Carlo standard error (MCSE) is zero. Example: If you estimate E(µ) = 1.234 with MCSE(E(µ)) = 0.01, you should report E(µ) = 1.2.
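One way to read the reporting rule above is sketched below: keep as many decimals as the MCSE has leading zeros after the decimal point. The helper name decimals_from_mcse is hypothetical and is not part of bsda or the template; treat this as an illustration of the example in the text, not as the official grading rule.

```r
# Hypothetical helper (not from bsda): number of decimals for which the
# MCSE is still zero, i.e. the count of leading zeros after the decimal point.
decimals_from_mcse <- function(mcse) {
  stopifnot(mcse > 0)
  max(0, -floor(log10(mcse)) - 1)
}

estimate <- 1.234   # E(mu) from the example in the text
mcse <- 0.01        # MCSE(E(mu)) from the example in the text
digits <- decimals_from_mcse(mcse)
round(estimate, digits)
```

With the numbers from the text this keeps one decimal, which agrees with the reported E(µ) = 1.2.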
To use markmyassignment for this assignment, run the following code in R:
library(markmyassignment)
assignment_path <- paste("https://github.com/MansMeg/BSDA/",
                         "blob/main/assignments/tests/assignment4.yml", sep = "")
set_assignment(assignment_path)
# To check your code/functions, just run
mark_my_assignment()
Don’t include markmyassignment results in the report.
Bioassay model
In this exercise, you will use the dose-response model from Section 3.7 of the course book. The likelihood is the same, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters α and β.
a) In the prior distribution for (α,β), the marginal distributions are α ∼ N(0, 2²) and β ∼ N(10, 10²), and the correlation between them is corr(α,β) = 0.6. Report the mean (vector of two values) and covariance (two by two matrix) of the bivariate normal distribution.
Hint! The mean and covariance of the bivariate normal distribution are a length 2 vector and a 2 × 2 matrix. The elements of the covariance matrix can be computed using the relation of correlation and covariance.
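The relation mentioned in the hint is cov(x, y) = corr(x, y) · sd(x) · sd(y). The sketch below applies it to made-up values (standard deviations 1.5 and 4, correlation 0.3), which are assumptions chosen for illustration and are not the prior of this assignment.

```r
# Toy values for illustration only; these are NOT the assignment's prior.
mu <- c(0, 2)       # length-2 mean vector
sd_x <- 1.5
sd_y <- 4
rho <- 0.3

# Covariance from correlation: cov = rho * sd_x * sd_y
cov_xy <- rho * sd_x * sd_y

# 2 x 2 covariance matrix: variances on the diagonal, covariance off it
Sigma <- matrix(c(sd_x^2, cov_xy,
                  cov_xy, sd_y^2), nrow = 2, byrow = TRUE)

# Sanity check: converting back should recover the correlation
cov2cor(Sigma)[1, 2]
```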
Note! The answer is graded as correct only if the number of digits reported is correct! The number of significant digits can be different for the mean and quantile estimates. In some other cases, the number of digits reported can be less than MCSE allows for practical reasons.
Hint! Quantiles can be computed with the quantile function. With S draws, the MCSE for E[θ] is √(Var[θ]/S). MCSE for the quantile estimates can be computed with the mcse_quantile function from the bsda package.
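As a minimal sketch of the hint, the code below computes a mean, its MCSE and two quantiles in base R. The draws are simulated stand-ins (an assumption made so the example runs on its own), and mcse_quantile from bsda is deliberately not called here.

```r
set.seed(1)
S <- 4000
# Stand-in draws for illustration; replace with the actual posterior draws
theta <- rnorm(S, mean = 1, sd = 2)

mean_est <- mean(theta)
mcse_mean <- sqrt(var(theta) / S)   # MCSE of E[theta] for independent draws

# 5% and 95% quantiles of the draws
q <- quantile(theta, probs = c(0.05, 0.95))
```

This formula assumes independent draws; the MCSE of each quantile needs a separate computation, which is what the hint points to mcse_quantile for.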
Importance sampling
Now we discard our posterior draws and switch to importance sampling.
c) Implement a function for computing the log importance ratios (log importance weights) when the importance sampling target distribution is the posterior distribution, and the proposal distribution is the prior distribution from a). Below is a test example; the function can also be tested with markmyassignment. Explain in words why it’s better to compute log ratios instead of ratios.
Note! The values below are only a test case. In part c), you only need to report the source code of your function, as it will be needed in later parts.
Hints! Use the function rmvnorm from the bsda package for sampling. Non-log importance ratios are given by equation (10.3) in the course book. The fact that our proposal distribution is the same as the prior distribution makes this task easier. The logarithm of the likelihood can be computed with the bioassaylp function from the bsda package. The data required for the likelihood can be loaded with data("bioassay").
alpha <- c(1.896, -3.6, 0.374, 0.964, -3.123, -1.581)
beta <- c(24.76, 20.04, 6.15, 18.65, 8.16, 17.4)
round(log_importance_weights(alpha, beta),2)
## [1] -8.95 -23.47 -6.02 -8.13 -16.61 -14.57
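The general pattern behind a log importance ratio is log target density minus log proposal density. The sketch below uses a one-dimensional toy problem (target N(3, 1), proposal N(0, 2²)); both densities and all function names are assumptions for illustration, not the bioassay model.

```r
# Toy 1-D setup for illustration only (not the bioassay model)
log_target <- function(theta) dnorm(theta, mean = 3, sd = 1, log = TRUE)
log_proposal <- function(theta) dnorm(theta, mean = 0, sd = 2, log = TRUE)

# Log importance ratio: subtract on the log scale instead of dividing
log_ratio <- function(theta) log_target(theta) - log_proposal(theta)

theta_far <- 100
# On the natural scale both densities underflow to 0 and the ratio is 0/0
naive_ratio <- dnorm(theta_far, 3, 1) / dnorm(theta_far, 0, 2)
# On the log scale the same ratio is a finite (very negative) number
stable_log_ratio <- log_ratio(theta_far)
```

Far out in the tails the natural-scale ratio is undefined, while the log-scale version stays finite.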
d) Implement a function for computing normalized importance ratios from the unnormalized log ratios in c). In other words, exponentiate the log ratios and scale them such that they sum to one. Explain in words what the effect of exponentiating and scaling to sum to one is. Below is a test example; the function can also be tested with markmyassignment.
Note! The values below are only a test case. In part d), you only need to report the source code of your function, as it will be needed in later parts.
alpha
## [1] 1.896 -3.600 0.374 0.964 -3.123 -1.581
beta
## [1] 24.76 20.04 6.15 18.65 8.16 17.40
round(normalized_importance_weights(alpha = alpha, beta = beta),3)
## [1] 0.045 0.000 0.852 0.103 0.000 0.000
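A common way to exponentiate and normalize log ratios without numerical trouble is to subtract the maximum log ratio first. The sketch below does this for a plain vector of log ratios; the helper name normalize_log_weights is hypothetical and takes log ratios directly, unlike the normalized_importance_weights(alpha, beta) interface in the test case. The input is the rounded output printed in the test case for c).

```r
# Hypothetical helper: normalize a vector of log ratios so they sum to one
normalize_log_weights <- function(log_w) {
  shifted <- log_w - max(log_w)  # largest entry becomes 0, so exp() gives 1 there
  w <- exp(shifted)
  w / sum(w)                     # the constant shift cancels in the division
}

# Rounded log ratios printed in the test case for c)
log_w <- c(-8.95, -23.47, -6.02, -8.13, -16.61, -14.57)
w <- normalize_log_weights(log_w)
round(w, 2)

# The shift matters for extreme values: exp(-1000) is 0 in double precision
w_extreme <- normalize_log_weights(c(-1000, -1001, -1003))
```

Because the input log ratios are rounded to two decimals, the result agrees with the printed test output only to about two decimals.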
e) Sample 4000 draws of α and β from the prior distribution from a). Compute and plot a histogram of the 4000 normalized importance ratios. Use the functions you implemented in c) and d).
f) Using the importance ratios, compute the importance sampling effective sample size S_eff and report it.
Note! The values below are only a test case, you need to use 4000 draws for alpha and beta in the final report.
alpha
## [1] 1.896 -3.600 0.374 0.964 -3.123 -1.581
beta
## [1] 24.76 20.04 6.15 18.65 8.16 17.40
round(S_eff(alpha = alpha, beta = beta), 3)
## [1] 1.354
Hint! Equation (10.4) in the course book.
Note! The 1st (2013) and 2nd (2014) printings of BDA3 have an error in w̃(θ^s) used in the effective sample size equation (10.4). The normalized weights equation should not have the multiplier S (the normalized weights should sum to one). Errata for the book can be found here: http://www.stat.columbia.edu/~gelman/book/errata_bda3.txt. The later printings and slides have the correct equation.
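With normalized weights that sum to one, the effective sample size is one over the sum of squared weights. The sketch below applies this to a plain weight vector; the helper name effective_sample_size is hypothetical and differs from the S_eff(alpha, beta) interface in the test case. The first input is the rounded weight vector printed in the test case for d).

```r
# Hypothetical helper: effective sample size from normalized weights
effective_sample_size <- function(w) {
  stopifnot(isTRUE(all.equal(sum(w), 1)))  # weights must already be normalized
  1 / sum(w^2)
}

# Rounded normalized weights printed in the test case for d)
w_test <- c(0.045, 0.000, 0.852, 0.103, 0.000, 0.000)
ess_test <- effective_sample_size(w_test)
round(ess_test, 3)

# Equal weights give back the full number of draws
ess_equal <- effective_sample_size(rep(1 / 4000, 4000))
```

For the printed weights this reproduces the test value 1.354, and for equal weights it returns the number of draws.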
g) Explain in your own words what the importance sampling effective sample size represents. Also explain how the effective sample size is seen in the histogram of the weights that you plotted in e).
h) Implement a function for computing the posterior mean using importance sampling, and compute the mean using your 4000 draws. Explain in your own words the computation for importance sampling. Below is an example of how the function would work with the example values for alpha and beta above. Report the means for alpha and beta, and also the Monte Carlo standard errors (MCSEs) for the mean estimates. Report the number of digits for the means based on the MCSEs.
Note! The values below are only a test case, you need to use 4000 draws for alpha and beta in the final report.
Hint! Use the same equation for the MCSE of E[θ] as earlier (√(Var[θ]/S)), but now replace S with S_eff. To compute Var[θ] with importance sampling, use the identity Var[θ] = E[θ²] − E[θ]².
alpha
## [1] 1.896 -3.600 0.374 0.964 -3.123 -1.581
beta
## [1] 24.76 20.04 6.15 18.65 8.16 17.40
round(posterior_mean(alpha = alpha, beta = beta),3)
## [1] 0.503 8.275
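The importance sampling estimate of an expectation is a weighted average of the proposal draws, using the normalized weights. The sketch below runs the full chain (log ratios, normalized weights, weighted mean, variance identity, S_eff, MCSE) on a one-dimensional toy problem with target N(1, 1) and proposal N(0, 2²); the toy densities and variable names are assumptions for illustration, not the bioassay model.

```r
set.seed(2)
S <- 4000
theta <- rnorm(S, mean = 0, sd = 2)          # draws from the toy proposal

# Log importance ratios: toy target N(1, 1) over toy proposal N(0, 2^2)
log_w <- dnorm(theta, 1, 1, log = TRUE) - dnorm(theta, 0, 2, log = TRUE)

# Normalized weights (subtract the max before exponentiating)
w <- exp(log_w - max(log_w))
w <- w / sum(w)

# Weighted estimates of E[theta] and E[theta^2]
is_mean <- sum(w * theta)
is_var <- sum(w * theta^2) - is_mean^2       # Var = E[theta^2] - E[theta]^2

# Effective sample size and MCSE with S replaced by S_eff
s_eff <- 1 / sum(w^2)
mcse <- sqrt(is_var / s_eff)
```

In this toy setup the estimates should land near the target's mean of 1 and variance of 1, and S_eff should be well below the 4000 draws because the proposal is wider than the target.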



