Geometry of Least Squares
1. Suppose we have a dataset represented by the design matrix X and response vector Y. We use linear regression to solve for the optimal weights θ̂. Draw the geometric interpretation of the column space of the design matrix, span(X); the response vector Y; the residuals Y − Xθ̂; and the predictions Xθ̂ (using the optimal parameters) and Xα (using an arbitrary vector α).
(a) What is always true about the residuals in least squares regression? Select all that apply.
□ A. They are orthogonal to the column space of the design matrix.
□ B. They represent the errors of the predictions.
□ C. Their sum is equal to the mean squared error.
□ D. Their sum is equal to zero.
□ E. None of the above.
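These residual properties can be checked numerically. Below is a sketch using NumPy with synthetic data (the array sizes and the random data are arbitrary, chosen only for illustration; the design matrix includes an intercept column of 1s):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: design matrix with an intercept column of 1s.
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

# OLS via the normal equations: theta_hat = (X^T X)^{-1} X^T Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residuals = Y - X @ theta_hat

# The residuals are orthogonal to every column of X: X^T e = 0.
print(np.allclose(X.T @ residuals, 0))   # True
# With an intercept column, the residuals also sum to zero.
print(np.isclose(residuals.sum(), 0))    # True
```

Note that the sum-to-zero property relies on the column of 1s being in the design matrix; without an intercept, only the orthogonality to span(X) is guaranteed.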
(b) Which are true about the predictions made by OLS? Select all that apply.
□ A. They are projections of the observations onto the column space of the design matrix.
□ B. They are linear combinations of the features.
□ C. They are orthogonal to the residuals.
□ D. They are orthogonal to the column space of the features.
□ E. None of the above.
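The projection view of OLS predictions can be verified with the hat matrix H = X(XᵀX)⁻¹Xᵀ, which projects Y onto span(X). A short sketch, assuming NumPy and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Hat matrix H = X (X^T X)^{-1} X^T projects Y onto span(X).
H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y

# Projections are idempotent: applying H twice changes nothing.
print(np.allclose(H @ Y_hat, Y_hat))       # True
# The predictions are orthogonal to the residuals.
print(np.isclose(Y_hat @ (Y - Y_hat), 0))  # True
```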
(c) We fit a simple linear regression to our data (xᵢ, yᵢ), i = 1, 2, 3, where xᵢ is the independent variable and yᵢ is the dependent variable. Our regression line is of the form ŷ = θ̂₀ + θ̂₁x. Suppose we plot the residuals of the model against the ŷs and find that there is a curve. What does this tell us about our model?
□ A. The relationship between our dependent and independent variables is well represented by a line.
□ B. The accuracy of the regression line varies with the size of the dependent variable.
□ C. The variables need to be transformed, or additional independent variables are needed.
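To see how a curved residual plot arises, consider synthetic quadratic data fit with a straight line (a sketch assuming NumPy; the data are invented for illustration):

```python
import numpy as np

# Synthetic data whose true relationship is curved, fit with a line.
x = np.linspace(-3, 3, 100)
y = 1.0 + 2.0 * x + x**2

X = np.column_stack([np.ones_like(x), x])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ theta_hat
residuals = y - y_hat

# The residuals trace the missed curvature: positive at the ends,
# negative in the middle -- a curve, not a patternless band.
print(residuals[[0, 50, -1]].round(2))
```

Plotting these residuals against ŷ would show the systematic pattern; a transformation (e.g. adding an x² feature) would remove it.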
Understanding Dimensions
2. In this exercise, we will examine many of the terms that we have been working with in regression (e.g. θ̂) and connect them to their dimensions and to the concepts that they represent.
First, we define some notation. The n × p design matrix X has n observations on p features. (In lecture, we stated that we sometimes say X corresponds to p + 1 features, where the additional feature is the intercept column of 1s.)
X:,j — the jth column vector of X, j = 1, …, p
Xi,: — the ith row vector of X, i = 1, …, n
Give the dimension of each quantity below and match it to the term it represents:

(a) X
(b) θ̂
(c) X:,j
(d) X1,: · θ̂
(e) X:,1 · θ̂
(f) (XᵀX)⁻¹XᵀY
(g) Xθ̂
(h) (I − X(XᵀX)⁻¹Xᵀ)Y

Terms:
1. the residuals
2. 0
3. 1st response, y₁
4. 1st predicted value, ŷ₁
5. 1st residual, e₁
6. the estimated coefficients
7. the predicted values
8. the features for a single observation
9. the value of a specific feature for all observations
10. the design matrix
As an example, for 2a, you would write: “2a. Dimension: n × p, Term: 10”.
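The dimensions above can be verified with a short NumPy sketch (the sizes n = 5, p = 3 are arbitrary, chosen only to make the shapes concrete):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
X = rng.normal(size=(n, p))          # (a) the design matrix: n x p
Y = rng.normal(size=n)

# (b): the estimated coefficients, computed as (X^T X)^{-1} X^T Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta_hat.shape)               # (3,): p x 1
print(X[:, 0].shape)                 # (c): one feature, all observations: n x 1
print((X[0, :] @ theta_hat).shape)   # (d): the 1st predicted value, a scalar
print((X @ theta_hat).shape)         # the predicted values: n x 1

# (h): (I - X (X^T X)^{-1} X^T) Y gives the residuals.
residuals = (np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T) @ Y
print(residuals.shape)               # n x 1
```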



