GR5206 – Lab 2 Solved
Hanao Li UNI: hl3202
Instructions
Part (A): Simple Linear Regression Model
1. Import the diamonds_small.csv dataset into R and store it in a data frame called diamonds. Use the lm() command to regress price (response) on carat (predictor) and save the result as lm0. What are the coefficients of lm0? (Some of this problem is solved for you below.)
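The starter code for this problem is not reproduced in this text. A minimal sketch of the step might look like the following. Since diamonds_small.csv is not available here, the block builds a small synthetic stand-in with the same two column names (price and carat); the synthetic values and the commented-out read.csv() call are assumptions, not the lab's data.

```r
# With the real file in the working directory, one would presumably use:
# diamonds <- read.csv("diamonds_small.csv", header = TRUE)

# Assumption: synthetic stand-in so the sketch runs without the file.
set.seed(1)
n_demo <- 200
diamonds <- data.frame(carat = runif(n_demo, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(n_demo, sd = 1000)

# Regress price (response) on carat (predictor).
lm0 <- lm(price ~ carat, data = diamonds)
coefficients(lm0)  # named vector with entries (Intercept) and carat
```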

Recall from lecture that the estimates β^0 and β^1 that you just calculated with lm() are functions of the data values and are therefore themselves random (they inherit variability from the data). If we were to re-collect the diamonds data over and over again, the estimates would be different each time.
In this lab we’ll use bootstrapping to answer the following questions:
Part (B): How Does β^1 Vary?
Strategy: we’ll re-sample (price, carat) pairs in order to provide an estimate for how β^1 varies across samples.
1. How many rows are in the diamonds dataset? Call this value n.

2. We’ll next use the sample() function to re-sample n rows of the diamonds dataset with replacement. The following code provides a single re-sample of the values 1,2,…,n, or a single re-sample of the rows of the dataset.
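The code referred to here did not survive in this text. One plausible form of it, assuming the same synthetic stand-in for the diamonds data frame (not the lab's real data), is sketched below; it also computes n from question 1.

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)

n <- nrow(diamonds)                           # number of rows (question 1)
resample1 <- sample(1:n, n, replace = TRUE)   # row indices drawn with replacement
head(resample1)
```

Because the draw is with replacement, some row indices typically appear more than once and others not at all.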

Now write a loop to calculate B <- 1000 such re-samples and store them as rows of the matrix resampled_values which will have B rows and n columns.
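A minimal sketch of such a loop follows. The value n = 200 is an assumption standing in for nrow(diamonds); with the real data it would be taken from the data frame.

```r
set.seed(1)
n <- 200    # assumption: stand-in for nrow(diamonds)
B <- 1000   # number of bootstrap re-samples

# Pre-allocate a B x n matrix, then fill one row per re-sample.
resampled_values <- matrix(NA_integer_, nrow = B, ncol = n)
for (b in 1:B) {
  resampled_values[b, ] <- sample(1:n, n, replace = TRUE)
}
dim(resampled_values)
```

Pre-allocating the matrix avoids growing it inside the loop, which is the usual idiom in R.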

3. Now we’ll use each re-sampled dataset to provide a new estimate of β^1. Write a line of code that uses resample1 above to produce a re-sampled dataset of (price, carat) pairs. Using the re-sampled dataset, use lm() to produce new estimates of β^0 and β^1. These values should be stored in a vector called resample1_ests.

Hint: (a) Note that the following code produces the re-sampled dataset from the re-sampled values:

Hint: (b) You’ll probably want to use the coefficients() function.
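The two hints can be combined roughly as follows. This is a sketch on a synthetic stand-in for the diamonds data (an assumption), and the name resampled_data is hypothetical; only resample1 and resample1_ests come from the lab text.

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)
n <- nrow(diamonds)
resample1 <- sample(1:n, n, replace = TRUE)

# Hint (a): index the rows by the re-sampled values.
resampled_data <- diamonds[resample1, ]   # hypothetical name

# Hint (b): refit the model and extract both coefficients.
resample1_ests <- coefficients(lm(price ~ carat, data = resampled_data))
resample1_ests
```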
4. Repeat the above call for each re-sampled dataset produced from the resampled_values matrix. We’ll store the new coefficient estimates in a matrix resampled_ests with B rows and 2 columns. Again you’ll want to write a loop, this time one that iterates over the rows of resampled_values. (Note that if you are very clever this could be done using apply().) Make sure to print head(resampled_ests) at the end.
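One way this loop might be written is sketched below, again on a synthetic stand-in for the data (an assumption). The column name Slope_Est is the one used later in the lab; Intercept_Est is a guess at its companion.

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)
n <- nrow(diamonds)
B <- 1000

# All index draws are independent, so filling the matrix in one call
# gives B re-samples of size n, one per row.
resampled_values <- matrix(sample(1:n, B * n, replace = TRUE), nrow = B, ncol = n)

resampled_ests <- matrix(NA_real_, nrow = B, ncol = 2,
                         dimnames = list(NULL, c("Intercept_Est", "Slope_Est")))
for (b in 1:B) {
  resampled_data <- diamonds[resampled_values[b, ], ]
  resampled_ests[b, ] <- coefficients(lm(price ~ carat, data = resampled_data))
}
head(resampled_ests)
```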

5. Recall from lecture that the differences β^1(b) − β^1, for b = 1, …, B, approximate the sampling distribution of β^1 − β1, where β1 is the population parameter, β^1 is the estimate from our original dataset, and β^1(b), b = 1, …, B, are the B bootstrap estimates.
Make a vector diff_estimates that holds the differences between the bootstrap estimates and the original estimate of β^1 from lm0 (bootstrap estimate minus original estimate). It should have length B.
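A sketch of this step, under the same assumptions as before: the data are a synthetic stand-in, and slope_ests is a hypothetical compact replacement for the Slope_Est column of resampled_ests.

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)
n <- nrow(diamonds)
B <- 1000
lm0 <- lm(price ~ carat, data = diamonds)

# Hypothetical stand-in for resampled_ests[, "Slope_Est"].
slope_ests <- replicate(B, {
  idx <- sample(1:n, n, replace = TRUE)
  coefficients(lm(price ~ carat, data = diamonds[idx, ]))[["carat"]]
})

# Bootstrap slope minus original slope, one value per re-sample.
diff_estimates <- slope_ests - coefficients(lm0)[["carat"]]
length(diff_estimates)
```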

6. Plot a histogram of the bootstrap estimates of β^1 (they’re in the 'Slope_Est' column). Label the x-axis appropriately.

7. Calculate the standard deviation of the bootstrap estimates.
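Questions 6 and 7 could be approached along these lines. The data are a synthetic stand-in, and slope_ests and boot_sd are hypothetical names; with the lab's own objects, slope_ests would be resampled_ests[, "Slope_Est"].

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)
n <- nrow(diamonds)
B <- 1000

# Hypothetical stand-in for resampled_ests[, "Slope_Est"].
slope_ests <- replicate(B, {
  idx <- sample(1:n, n, replace = TRUE)
  coefficients(lm(price ~ carat, data = diamonds[idx, ]))[["carat"]]
})

# Question 6: histogram with a labelled x-axis.
hist(slope_ests, breaks = 30,
     main = "Bootstrap estimates of the slope",
     xlab = "Bootstrap estimate of the slope (price per carat)")

# Question 7: standard deviation of the bootstrap estimates.
boot_sd <- sd(slope_ests)
boot_sd
```

The standard deviation of the bootstrap estimates serves as a bootstrap estimate of the standard error of β^1.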

Part (C): Bootstrap Confidence Intervals
Note: This section is optional. If you get the chance to do it during lab, great, but it’s not necessary that this part is completed when you turn in the lab.
Finally we’d like to approximate confidence intervals for the regression coefficients. Recall that a confidence interval is a random interval which contains the truth with high probability (the confidence level). If the confidence interval for β1 is C, and the confidence level is 1 − α, then we want
Pr(β1 ∈ C) = 1 − α
no matter what the true value of β1.
We estimate the confidence interval from the bootstrap estimates by finding a range of the differences β^1(b) − β^1, b = 1, …, B, which holds a fraction 1 − α of the values. In our case, let α = 0.05, so we estimate a confidence interval with level 0.95.
1. Let Cu and Cl be the upper and lower limits of the confidence interval. Use the quantile() function to find the 0.025 and 0.975 quantiles of the vector diff_estimates calculated in B(5). Then Cu is the sum of the original estimate of β^1 from lm0 with the upper quantile and Cl is the sum of the original estimate of β^1 from lm0 with the lower quantile.
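The construction described in this question can be sketched as follows, on a synthetic stand-in for the data (an assumption). The names slope_ests and q are hypothetical; Cl and Cu follow the text.

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)
n <- nrow(diamonds)
B <- 1000
lm0 <- lm(price ~ carat, data = diamonds)
slope_hat <- coefficients(lm0)[["carat"]]

# Hypothetical stand-in for resampled_ests[, "Slope_Est"].
slope_ests <- replicate(B, {
  idx <- sample(1:n, n, replace = TRUE)
  coefficients(lm(price ~ carat, data = diamonds[idx, ]))[["carat"]]
})
diff_estimates <- slope_ests - slope_hat

# 0.025 and 0.975 quantiles of the differences.
q <- quantile(diff_estimates, probs = c(0.025, 0.975))

# Limits as described in the question: original estimate plus each quantile.
Cl <- slope_hat + q[[1]]
Cu <- slope_hat + q[[2]]
c(Cl = Cl, Cu = Cu)
```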

2. Instead of traditional bootstrap intervals, construct percentile-based bootstrap intervals. Use the quantile() function to find the 0.025 and 0.975 quantiles of the vector resampled_ests[, "Slope_Est"] calculated in B(4).
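A sketch of the percentile interval, under the same assumptions (synthetic stand-in data; slope_ests is a hypothetical replacement for resampled_ests[, "Slope_Est"]). For comparison it also recomputes the limits obtained by adding the quantiles of the differences back to the original estimate. Because quantiles shift along with a constant, the two sets of limits agree up to floating-point error.

```r
# Assumption: synthetic stand-in for the diamonds data frame.
set.seed(1)
diamonds <- data.frame(carat = runif(200, min = 0.2, max = 2.5))
diamonds$price <- -2000 + 7500 * diamonds$carat + rnorm(200, sd = 1000)
n <- nrow(diamonds)
B <- 1000
slope_hat <- coefficients(lm(price ~ carat, data = diamonds))[["carat"]]

# Hypothetical stand-in for resampled_ests[, "Slope_Est"].
slope_ests <- replicate(B, {
  idx <- sample(1:n, n, replace = TRUE)
  coefficients(lm(price ~ carat, data = diamonds[idx, ]))[["carat"]]
})

# Percentile interval: quantiles of the bootstrap estimates themselves.
percentile_ci <- quantile(slope_ests, probs = c(0.025, 0.975))
percentile_ci

# Limits built from the shifted values, for comparison.
shifted_ci <- slope_hat + quantile(slope_ests - slope_hat, probs = c(0.025, 0.975))
all.equal(unname(percentile_ci), unname(shifted_ci))
```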
