Bayesian Statistics and Data Analysis
Assignment 3
General information
The recommended tool in this course is R (with the IDE R-Studio). You can download R here and R-Studio here. There are many tutorials, videos and introductions to R and R-Studio online. You can find some initial hints on the RStudio Education pages.
When working with R, we recommend writing the report using R markdown and the provided R markdown template. The template includes the formatting instructions and shows how to include code and figures.
Instead of R markdown, you can use other software to make the PDF report, but you should use the same instructions for formatting. These instructions are also available in the PDF produced from the R markdown template.
We supply a Google Colab notebook that you can also use for the assignments. We have included the installation of all necessary R packages; hence, this can be an alternative to using your own local computer. You can find the notebook here. You can also open the notebook in Colab here.
Report all results in a single, anonymous PDF. Note that no other formats are allowed.
The course has its own R package bsda with data and functionality to simplify coding. To install the package, just run the following (upgrade = "never" skips the question about updating other packages):
install.packages("remotes")
remotes::install_github("MansMeg/BSDA", subdir = "rpackage", upgrade = "never")
Many of the exercises can be checked automatically using the R package markmyassignment. You can find information on how to install and use the package here. There is no need to include markmyassignment results in the report.
You can find common questions and answers regarding installation and technical problems in Frequently Asked Questions (FAQ).
You can find deadlines and information on how to turn in the assignments in Studium.
If you have any suggestions or improvements to the course material, please post in the course chat feedback channel, create an issue, or submit a pull request to the public repository here.
It is mandatory to include the following parts in all assignments (these are included already in the template):
1. Time used for reading: How long did the reading assignment take (in hours)?
2. Time used for the assignment: How long did the basic assignment take (in hours)?
3. Good with assignment: Write one or two sentences about what you liked about the assignment and what we should keep for next year.
4. Things to improve in the assignment: Write one or two sentences about what you think can be improved in the assignment. Can something be clarified further? Did you get stuck on stuff unrelated to the content of the assignment?
To pass (G) the assignment, you need 70% of the total points. To pass with distinction (VG), you need 90% of the total points. See the grading information on the point allocations for each assignment.
Information on this assignment
This assignment is related to Chapters 2 and 3.
Reading instructions: Chapters 2 and 3 in BDA3, see reading instructions. Use Frank Harrell’s recommendations on how to state results in Bayesian two group comparisons (and note that there is no point null hypothesis testing in this assignment).
To use markmyassignment for this assignment, run the following code in R:
library(markmyassignment)
assignment_path <- paste("https://github.com/MansMeg/BSDA/",
                         "blob/main/assignments/tests/assignment3.yml", sep = "")
set_assignment(assignment_path)
# To check your code/functions, just run
mark_my_assignment()
Don’t include markmyassignment results in the report.
1. Inference for normal mean and deviation
A factory has a production line for manufacturing car windshields. A sample of windshields has been taken for testing hardness. The observed hardness values y1 can be found in the windshieldy1 dataset. The data can be accessed from the bsda R package as follows:
library(bsda)
data("windshieldy1")
head(windshieldy1)
## [1] 13.357 14.928 14.896 15.297 14.820 12.067
Below are test examples that can be used. The functions below can also be tested with markmyassignment. Note! This is only a test case. You need to switch to the full data windshieldy1 above when reporting your results.
windshieldy_test <- c(13.357, 14.928, 14.896, 14.820)
In the report, formulate (1) model likelihood, (2) the prior, and (3) the resulting posterior.
a) What can you say about the unknown µ? Summarize your results using a Bayesian point estimate (i.e. E(µ|y)), a posterior interval (95%), and plot the density. A test example can be found below for an uninformative prior. Note! Posterior intervals are also called credible intervals and are different from confidence intervals.
mu_point_est(data = windshieldy_test)
## [1] 14.5
mu_interval(data = windshieldy_test, prob = 0.95)
## [1] 13.3 15.7
b) What can you say about the hardness of the next windshield coming from the production line before actually measuring the hardness? Summarize your results using a Bayesian point estimate, a predictive interval (95%), and plot the density. A test example can be found below.
mu_pred_point_est(data = windshieldy_test)
## [1] 14.5
mu_pred_interval(data = windshieldy_test, prob = 0.95)
## [1] 11.8 17.2
Note! Predictive intervals are different from posterior intervals.
Hint With a conjugate prior, the closed-form posterior has Student’s t form (see the equations in the book). R users can use the dt function after normalising the input, that is, after shifting and scaling it. We have added an R function dtnew() to the bsda R package that does this. For generating samples, you can use the corresponding rtnew function.
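The hint about shifting and scaling can be illustrated with a minimal base R sketch. It assumes the noninformative prior p(µ, σ²) ∝ 1/σ², for which the marginal posterior of µ is Student's t with n − 1 degrees of freedom, location ȳ and scale s/√n, and the posterior predictive distribution has the same degrees of freedom and location but scale s·√(1 + 1/n). The helpers dt_scaled and qt_scaled are hypothetical names introduced here for illustration; they are not functions from bsda and may differ from dtnew() in their interface.

```r
# Test data from the assignment text (not the full windshieldy1 data).
y <- c(13.357, 14.928, 14.896, 14.820)
n <- length(y)
ybar <- mean(y)
s <- sd(y)

# Hypothetical helpers (not part of bsda): a location-scale Student's t
# built on base R's dt() and qt().
dt_scaled <- function(x, df, mean = 0, scale = 1) {
  dt((x - mean) / scale, df) / scale
}
qt_scaled <- function(p, df, mean = 0, scale = 1) {
  mean + scale * qt(p, df)
}

# Marginal posterior of mu (assumed noninformative prior):
# t with n - 1 df, location ybar, scale s / sqrt(n).
post_scale <- s / sqrt(n)
mu_interval_95 <- qt_scaled(c(0.025, 0.975), df = n - 1,
                            mean = ybar, scale = post_scale)

# Posterior predictive for one new observation: same df and location,
# wider scale s * sqrt(1 + 1 / n).
pred_scale <- s * sqrt(1 + 1 / n)
pred_interval_95 <- qt_scaled(c(0.025, 0.975), df = n - 1,
                              mean = ybar, scale = pred_scale)

print(round(ybar, 1))              # 14.5
print(round(mu_interval_95, 1))    # 13.3 15.7
print(round(pred_interval_95, 1))  # 11.8 17.2

# Posterior density of mu on a grid, ready for plotting.
grid <- seq(ybar - 4 * pred_scale, ybar + 4 * pred_scale, length.out = 200)
post_density <- dt_scaled(grid, df = n - 1, mean = ybar, scale = post_scale)
```

Calling plot(grid, post_density, type = "l") draws the density. Dividing by the scale inside dt_scaled keeps the density normalised after the change of variable, which is easy to forget when calling dt directly.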
2. Inference for the difference between proportions
In the report, formulate (1) model likelihood, (2) the prior, and (3) the resulting posterior.
a) Summarize the posterior distribution for the odds ratio, (p1/(1 − p1))/(p0/(1 − p0)). Compute the point estimate, a posterior interval (95%), and plot the histogram. Use Frank Harrell’s recommendations on how to state results in Bayesian two group comparisons. Below is a test case showing how the odds ratio should be computed. Note! This is only a test case. You need to switch to the real posteriors when reporting your results.
set.seed(4711)
p0 <- rbeta(100000, 5, 95)
p1 <- rbeta(100000, 10, 90)
posterior_odds_ratio_point_est(p0 = p0, p1 = p1)
## [1] 2.676
posterior_odds_ratio_interval(p0 = p0, p1 = p1, prob = 0.9)
## [1] 0.875 6.059
b) Discuss the sensitivity of your inference to your choice of prior density with a couple of sentences.
Hint With a conjugate prior, a closed-form posterior is the Beta form for each group separately (see equations in the book). You can use rbeta() to sample from the posterior distributions of p0 and p1, and use these samples and odds ratio equation to get samples from the distribution of the odds ratio.
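The sampling approach in the hint can be sketched in a few lines of base R. With a Beta(α, β) prior and y events out of n, the posterior for each group is Beta(α + y, β + n − y); the sketch below does not use real data and instead reuses the Beta parameters from the test case above as stand-ins. Using the posterior mean as the point estimate and a central interval from quantile() are assumptions of this sketch, not necessarily what the graded functions expect.

```r
set.seed(4711)  # seed taken from the assignment's test case
n_draws <- 100000

# Stand-in "posteriors" from the test case; replace the Beta parameters
# with those of your own posteriors for the real analysis.
p0 <- rbeta(n_draws, 5, 95)
p1 <- rbeta(n_draws, 10, 90)

# Apply the odds ratio formula draw by draw to get posterior samples.
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))

# Assumed summaries: posterior mean and a central 90% interval.
or_mean <- mean(odds_ratio)
or_interval_90 <- quantile(odds_ratio, probs = c(0.05, 0.95))

# A direct posterior probability statement about the comparison.
prob_or_above_1 <- mean(odds_ratio > 1)

print(or_mean)
print(or_interval_90)
print(prob_or_above_1)
```

Calling hist(odds_ratio) gives the histogram. A statement such as "the posterior probability that the odds ratio exceeds 1 is ..." is one way to report the comparison without any point null hypothesis test.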
3. Inference for the difference between normal means
Consider a case where the same factory has two production lines for manufacturing car windshields. Independent samples from the two production lines were tested for hardness. The hardness measurements for the two samples y1 and y2 are given in the files windshieldy1.txt and windshieldy2.txt. These can be accessed directly with
data("windshieldy1")
data("windshieldy2")
We assume that the samples have unknown standard deviations σ1 and σ2.
In the report, formulate (1) model likelihood, (2) the prior, and (3) the resulting posterior.
Use uninformative or weakly informative priors and answer the following questions:
a) What can you say about µd = µ1 − µ2? Summarize your results using a Bayesian point estimate, a posterior interval (95%), and plot the histogram. Use Frank Harrell’s recommendations on how to state results in Bayesian two group comparisons.
b) Given the model used, what is the probability that the means are exactly the same (µ1 = µ2)? Explain your reasoning.
Hint With a conjugate prior, the closed-form posterior has Student’s t form for each group separately (see the equations in the book). You can use the rt() function to sample from the posterior distributions of µ1 and µ2, and use these samples to get samples from the distribution of the difference µd = µ1 − µ2. Be careful to scale and shift the samples according to their mean and variance values in R, as described above.
Hint Posterior distributions of µ1 and µ2 are continuous, and thus the posterior distribution of the difference µd = µ1 − µ2 is also continuous. What is the probability that µd = 0?
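The scale-and-shift step for rt() can be sketched as follows, assuming the noninformative prior p(µ, σ²) ∝ 1/σ² for each group separately, so that each µ has a Student's t posterior with n − 1 degrees of freedom, location ȳ and scale s/√n. The helper sample_mu_posterior is a hypothetical name introduced for this sketch, and the second data vector is made up purely for illustration; neither comes from bsda.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
n_draws <- 100000

# Hypothetical helper: draws from the marginal posterior of a normal mean
# under the assumed noninformative prior, i.e. t_{n-1}(ybar, s^2 / n).
sample_mu_posterior <- function(y, n_draws) {
  n <- length(y)
  # rt() gives standard t draws; scale by s / sqrt(n), then shift by ybar.
  mean(y) + sd(y) / sqrt(n) * rt(n_draws, df = n - 1)
}

y1 <- c(13.357, 14.928, 14.896, 14.820)  # test vector from the text
y2 <- c(15.2, 14.1, 16.3, 15.8, 16.0)    # made-up values, illustration only

# Independent groups: subtract the draws pairwise to sample mu_d.
mu_d <- sample_mu_posterior(y1, n_draws) - sample_mu_posterior(y2, n_draws)

mu_d_mean <- mean(mu_d)
mu_d_interval_95 <- quantile(mu_d, probs = c(0.025, 0.975))
prob_mu_d_below_0 <- mean(mu_d < 0)

print(mu_d_mean)
print(mu_d_interval_95)
print(prob_mu_d_below_0)
```

For the report, replace y1 and y2 with windshieldy1 and windshieldy2, and use hist(mu_d) for the histogram.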



