General information
• The recommended tool in this course is R (with the IDE RStudio). You can download R here and RStudio here. There are many tutorials, videos and introductions to R and RStudio online. You can find some initial hints here.
• You can write the report with your preferred software, but the outline of the report should follow the instructions in the R Markdown template that can be found here.
• Report all results in a single, anonymous *.pdf file and return it to peergrade.io.
• The course has its own R package with data and functionality to simplify coding. To install the package, run the following:
install.packages("remotes")
remotes::install_github("avehtari/BDA_course_Aalto", subdir = "rpackage")
• Many of the exercises can be checked automatically using the R package markmyassignment. Information on how to install and use the package can be found here.
• Additional self-study exercises and solutions for each chapter in BDA3 can be found here.
• We collect common questions regarding installation and technical problems in a course Frequently Asked Questions (FAQ). This can be found here.
• If you have any suggestions or improvements to the course material, please feel free to create an issue or submit a pull request to the public repository.
Information on this assignment
The exercises of this assignment are not necessarily related to chapter 1, but rather test whether you have sufficient knowledge to participate in the course. There are three pen and paper exercises (scanned handwritten answers are ok for these) and one computer task. The maximum number of points from this assignment is 2.
Reading instructions: Chapter 1 in BDA3, see reading instructions here
Grading instructions: The grading will be done in peergrade. All grading questions and evaluations for exercise 1 can be found here
To use markmyassignment for this assignment, run the following code in R:
> library(markmyassignment)
> exercise_path <- "https://github.com/avehtari/BDA_course_Aalto/blob/master/exercises/tests/ex1.yml"
> set_assignment(exercise_path)
> # To check your code/functions, just run
> mark_my_assignment()
• probability
• probability mass
• probability density
• probability mass function (pmf)
• probability density function (pdf)
• probability distribution
• discrete probability distribution
• continuous probability distribution
• cumulative distribution function (cdf)
• likelihood
2. (Basic computer skills) This task deals with elementary plotting and computing skills needed during the rest of the course. You can use either R or Python, although R is the recommended language and we will only guarantee support in R. For documentation in R, just type ?{function name here}.
a) Plot the density function of the Beta distribution with mean µ = 0.2 and variance σ² = 0.01. The parameters α and β of the Beta distribution are related to the mean and variance according to the following equations:
α = µ (µ(1 − µ)/σ² − 1), β = α(1 − µ)/µ.
Hint! Useful R functions: seq(), plot() and dbeta(). Later on we will also use the more flexible ggplot2 for plotting.
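As a minimal sketch of part a), the snippet below converts the mean and variance into α and β using the standard moment relations of the Beta distribution, then plots the density on a grid. The variable names (mu, sigma2, a, b, x) and the grid size are illustrative choices, not part of the assignment.

```r
# Target mean and variance from the exercise
mu <- 0.2
sigma2 <- 0.01

# Moment matching for the Beta distribution:
# alpha + beta = mu * (1 - mu) / sigma2 - 1
a <- mu * (mu * (1 - mu) / sigma2 - 1)  # alpha
b <- a * (1 - mu) / mu                  # beta

# Evaluate the density on a grid over [0, 1] and plot it as a line
x <- seq(0, 1, length.out = 500)
plot(x, dbeta(x, a, b), type = "l",
     xlab = "x", ylab = "density",
     main = "Beta density with mean 0.2 and variance 0.01")
```

A quick sanity check is that a / (a + b) reproduces the requested mean.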
b) Take a sample of 1000 random numbers from the above distribution and plot a histogram of the results. Compare visually to the density function.
Hint! Useful R functions: rbeta() and hist()
c) Compute the sample mean and variance from the drawn sample. Verify that they roughly match the true mean and variance of the distribution.
Hint! Useful R functions: mean() and var()
d) Estimate the central 95% probability interval of the distribution from the drawn samples.
Hint! Useful R functions: quantile()
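Parts b) to d) can be sketched together as follows. This is one possible approach, not the reference solution: the seed, the number of histogram bins and the variable names are arbitrary assumptions, and the Beta parameters are recomputed from the mean and variance so that the snippet runs on its own.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Beta parameters from the mean and variance given in the exercise
mu <- 0.2
sigma2 <- 0.01
a <- mu * (mu * (1 - mu) / sigma2 - 1)
b <- a * (1 - mu) / mu

# b) Draw 1000 samples and compare the histogram with the density
draws <- rbeta(1000, a, b)
hist(draws, breaks = 30, freq = FALSE,
     xlab = "x", main = "Histogram of 1000 Beta draws")
x <- seq(0, 1, length.out = 500)
lines(x, dbeta(x, a, b))  # overlay the true density

# c) Sample mean and variance, to be compared with mu and sigma2
sample_mean <- mean(draws)
sample_var <- var(draws)

# d) Central 95% interval estimated from the draws
interval <- quantile(draws, probs = c(0.025, 0.975))

print(c(mean = sample_mean, var = sample_var))
print(interval)
```

Setting freq = FALSE puts the histogram on the density scale, which is what makes the visual comparison with dbeta() meaningful.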
3. (Bayes’ theorem) A group of researchers has designed a new inexpensive and painless test for detecting lung cancer. The test is intended to be an initial screening test for the general population. A positive result (presence of lung cancer) from the test would be followed up immediately with medication, surgery or a more extensive and expensive test. The researchers know from their studies the following facts:
• The test gives a positive result 98% of the time when the test subject has lung cancer.
• The test gives a negative result 96% of the time when the test subject does not have lung cancer.
• In the general population, approximately one person in 1000 has lung cancer.
The researchers are happy with these preliminary results (about 97% success rate), and wish to get the test to market as soon as possible. How would you advise them? Base your answer on Bayes’ rule computations.
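The Bayes' rule computation that the question asks for can be set up along these lines. This is a hedged sketch using only the three figures listed above. The variable names are made up for illustration, and the interpretation of the result is left to the written answer.

```r
# Facts given in the exercise
p_cancer <- 1 / 1000   # prevalence in the general population
sensitivity <- 0.98    # P(positive | cancer)
specificity <- 0.96    # P(negative | no cancer)

# Total probability of a positive test result
p_positive <- sensitivity * p_cancer +
  (1 - specificity) * (1 - p_cancer)

# Bayes' rule: probability of cancer given a positive result
p_cancer_given_positive <- sensitivity * p_cancer / p_positive

# Bayes' rule: probability of no cancer given a negative result
p_healthy_given_negative <- specificity * (1 - p_cancer) / (1 - p_positive)

print(p_cancer_given_positive)
print(p_healthy_given_negative)
```

Comparing p_cancer_given_positive with the quoted 97% figure is a natural starting point for the advice to the researchers.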
4. (Bayes’ theorem) We have three boxes, A, B, and C. There are
• 2 red balls and 5 white balls in the box A,
• 4 red balls and 1 white ball in the box B, and
• 1 red ball and 3 white balls in the box C.
Consider a random experiment in which one of the boxes is randomly selected and from that box, one ball is randomly picked up. After observing the color of the ball it is replaced in the box it came from. Suppose also that on average box A is selected 40% of the time and box B 10% of the time (i.e. P(A) = 0.4).
a) What is the probability of picking a red ball?
b) If a red ball was picked, which box did it most probably come from?
Implement two functions in R that compute the probabilities. Below is an example of how the functions should be named and work if you want to check them with markmyassignment.
Note! This is a test case; you need to change the numbers in the matrix to the numbers in the exercise.
> boxes <- matrix(c(2,2,1,5,5,1), ncol = 2, dimnames = list(c("A", "B", "C"), c("red", "white")))
> boxes
  red white
A   2     5
B   2     5
C   1     1
> p_red(boxes = boxes)
[1] 0.3928571
> p_box(boxes = boxes)
[1] 0.29090909 0.07272727 0.63636364
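One way the two functions might be written is sketched below, checked against the test-case matrix rather than the exercise numbers. It assumes the box probabilities are fixed at P(A) = 0.4 and P(B) = 0.1, with P(C) = 0.5 inferred as the remainder. The helper box_prior is a hypothetical name introduced here, not something the assignment defines.

```r
# Assumed selection probabilities; P(C) is inferred as 1 - 0.4 - 0.1
box_prior <- c(A = 0.4, B = 0.1, C = 0.5)

# P(red): law of total probability over the three boxes
p_red <- function(boxes) {
  p_red_given_box <- boxes[, "red"] / rowSums(boxes)
  sum(box_prior * p_red_given_box[names(box_prior)])
}

# P(box | red) for boxes A, B, C: Bayes' rule
p_box <- function(boxes) {
  p_red_given_box <- boxes[, "red"] / rowSums(boxes)
  joint <- box_prior * p_red_given_box[names(box_prior)]
  unname(joint / sum(joint))
}

# Test-case matrix from the example (not the exercise numbers)
boxes <- matrix(c(2, 2, 1, 5, 5, 1), ncol = 2,
                dimnames = list(c("A", "B", "C"), c("red", "white")))
print(p_red(boxes = boxes))
print(p_box(boxes = boxes))
```

Indexing by names(box_prior) keeps the prior and the conditional probabilities aligned even if the rows of the matrix are given in a different order.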
5. (Bayes’ theorem) Assume that on average fraternal twins (two fertilized eggs, so they can be of different sex) occur once in 150 births and identical twins (a single egg divides into two separate embryos, so both have the same sex) once in 400 births (Note! These are not the true values, see Exercise 1.6, page 28, in BDA3). American male singer-actor Elvis Presley (1935 – 1977) had a twin brother who died at birth. What is the probability that Elvis was an identical twin? Assume that an equal number of boys and girls are born on average.
Implement this as a function in R that computes the probability. Below is an example of how the function should be named and work if you want to check your result with markmyassignment.
> p_identical_twin(fraternal_prob = 1/125, identical_prob = 1/300)
[1] 0.4545455
> p_identical_twin(fraternal_prob = 1/100, identical_prob = 1/500)
[1] 0.2857143
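A possible shape for this function is sketched below. It rests on the stated assumption of equally likely sexes: identical twins are two boys with probability 1/2, and fraternal twins are two boys with probability 1/4. The internal variable names are illustrative only.

```r
p_identical_twin <- function(fraternal_prob, identical_prob) {
  # Probability that both twins are boys under each twin type,
  # assuming boys and girls are equally likely
  p_boys_given_identical <- 1 / 2
  p_boys_given_fraternal <- 1 / 4

  # Bayes' rule: P(identical | twin brothers)
  numerator <- identical_prob * p_boys_given_identical
  numerator / (numerator + fraternal_prob * p_boys_given_fraternal)
}

# The two test cases from the example
print(p_identical_twin(fraternal_prob = 1/125, identical_prob = 1/300))
print(p_identical_twin(fraternal_prob = 1/100, identical_prob = 1/500))
```

The probability of a birth being a singleton cancels out of the ratio, which is why only the two twin rates appear as arguments.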