PAR – Assignment #4: Cross Validation Solved

Description

Problem 1
This question should be answered using the Default data set. In Chapter 4 on classification, we used logistic regression to predict the probability of default using income and balance. Now we will estimate the test error of this logistic regression model using the validation set approach. Do not forget to set a random seed before beginning your analysis.
(a) Fit a logistic regression model that predicts default using income and balance.
(b) Using the validation set approach, estimate the test error of this model. You need to perform the following steps:
i. Split the sample set into a training set and a validation set.
ii. Fit a logistic regression model using only the training data set.
iii. Obtain a prediction of default status for each individual in the validation set using a threshold of 0.5.
iv. Compute the validation set error, which is the fraction of the observations in the validation set that are misclassified.
(c) Repeat the process in (b) three times, using three different splits of the observations into a training set and a validation set. Comment on the results obtained.
(d) Consider another logistic regression model that predicts default using income, balance, and student (a qualitative variable). Estimate the test error for this model using the validation set approach. Does including the qualitative variable student reduce the test error rate?
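The steps in (b) through (d) can be sketched roughly as follows. This is a minimal illustration, not an official solution: the helper name validation_error, the 50/50 split, and the seeds 1 to 3 are assumptions made for the example, and the usage part assumes the Default data set is available from the ISLR package.

```r
# Hypothetical helper (not part of the assignment): estimates the test error
# of a logistic regression from one random training/validation split.
validation_error <- function(formula, data, seed,
                             train_frac = 0.5, threshold = 0.5) {
  set.seed(seed)                                   # makes the split reproducible
  n <- nrow(data)
  train_idx <- sample(n, size = floor(train_frac * n))

  # Step ii: fit on the training observations only.
  fit <- glm(formula, data = data[train_idx, ], family = binomial)

  # Step iii: predicted probabilities for the validation observations.
  valid <- data[-train_idx, ]
  prob <- predict(fit, newdata = valid, type = "response")

  # glm models the probability of the second factor level
  # (for example "Yes" when the levels are No/Yes).
  response <- valid[[all.vars(formula)[1]]]
  stopifnot(is.factor(response), nlevels(response) == 2)
  pred <- ifelse(prob > threshold, levels(response)[2], levels(response)[1])

  # Step iv: fraction of misclassified validation observations.
  mean(pred != response)
}

# Usage, assuming the ISLR package is installed.
if (requireNamespace("ISLR", quietly = TRUE)) {
  Default <- ISLR::Default
  # Part (c): three different splits of the same model.
  errs <- sapply(1:3, function(s)
    validation_error(default ~ income + balance, Default, seed = s))
  print(errs)
  # Part (d): add the qualitative predictor and compare on the same split.
  print(validation_error(default ~ income + balance + student, Default, seed = 1))
}
```

Reusing the same seed for both models in part (d) keeps the split identical, so any difference in error comes from the added predictor rather than from a different partition.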

Problem 2
This question requires performing cross validation on a simulated data set.
(a) Generate a simulated data set as follows:
set.seed(1)
x=rnorm(200)
y=x-2*x^2+rnorm(200)
In this data set, what is 𝑛 and what is 𝑝? Write out the model used to generate the data in equation form (i.e., the true model of the data).
(b) Create a scatter plot of 𝑌 vs 𝑋. Comment on what you find.
(c) Consider the following four models for the data set:
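i. $Y = \beta_0 + \beta_1 X + \epsilon$
ii. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$
iii. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$
iv. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \beta_4 X^4 + \epsilon$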
Compute the LOOCV errors that result from fitting these models.
(d) Repeat (c) using another random seed, and report your results. Are your results the same as what you got in (c)? Why?
(e) Which of the models in (c) had the smallest LOOCV error? Is this what you expected? Explain your answer.
(f) Now we use 5-fold CV for the model selection. Compute the CV errors that result from fitting the four models. Which model has the smallest CV error? Are the results consistent with LOOCV?
(g) Repeat (f) using 10-fold CV. Are the results the same as 5-fold CV?
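One way to approach parts (c) through (g) is sketched below in base R. It is a hedged illustration rather than the expected solution: the helper cv_mse and the seed used for the fold assignment are assumptions, and the folds are built by hand instead of with a packaged routine (the boot package's cv.glm function offers similar functionality).

```r
# Simulated data from part (a).
set.seed(1)
x <- rnorm(200)
y <- x - 2 * x^2 + rnorm(200)
sim <- data.frame(x = x, y = y)

# Hypothetical helper: K-fold CV estimate of the test MSE for a polynomial
# regression of y on x of the given degree. K = nrow(data) gives LOOCV.
cv_mse <- function(data, degree, K = nrow(data)) {
  n <- nrow(data)
  # Randomly assign each observation to one of K folds of near-equal size.
  folds <- sample(rep(seq_len(K), length.out = n))
  sq_err <- numeric(n)
  for (k in seq_len(K)) {
    held_out <- folds == k
    # Fit on every fold except k, then predict the held-out fold.
    fit <- lm(y ~ poly(x, degree, raw = TRUE), data = data[!held_out, ])
    pred <- predict(fit, newdata = data[held_out, , drop = FALSE])
    sq_err[held_out] <- (data$y[held_out] - pred)^2
  }
  mean(sq_err)
}

# LOOCV for models i to iv: every observation is its own fold, so the
# random fold assignment does not change the result.
loocv <- sapply(1:4, function(d) cv_mse(sim, d))

# 5-fold and 10-fold CV depend on the random fold assignment.
set.seed(2)  # assumed seed, for reproducible folds only
cv5  <- sapply(1:4, function(d) cv_mse(sim, d, K = 5))
cv10 <- sapply(1:4, function(d) cv_mse(sim, d, K = 10))

print(rbind(loocv, cv5, cv10))
```

Because LOOCV leaves out each observation exactly once, changing the seed before calling cv_mse with the default K changes nothing, whereas the 5-fold and 10-fold estimates vary slightly from one fold assignment to the next.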

Submit through the link: eCampus -> Homework -> Assignment 4 Submission
