Math4630 – Solved – Assignment Hub

Description

5/5 – (1 vote)

4630 Assignment 3
Ravish Kamath: 213893664
Functions Created
Here are some functions I have created to save time when completing questions.
xbar = function(matrix){ n = dim(matrix)[1] onevec = rep(1,n)
xbar = 1/n*t(matrix)%*%onevec return(xbar)
}
Spool = function(dat){
#Getting the n lengths and parameters
nbar = rep(0,length(dat)) for (i in 1: length(dat)){ nbar[i] = dim(dat[[i]])[1]
} n = sum(nbar)
params = dim(dat[[1]])[2] g = length(dat)
#Getting the variance matrices S = list() for (i in 1:g){
S = append(S,list(var(dat[[i]])))
}
#Calculating the Within Matrix
W = matrix(0,params,params) Wnew = matrix(0,params,params) for (i in 1:g){
Wnew = as.matrix(lapply(S[i], * , (nbar[i] -1)))
Wnew = matrix(unlist(Wnew), ncol = params, byrow = T) W = W + Wnew
}
#s_pooled
result = W/(n – g) return(result)
}
spectral = function(matrix){ eig = eigen(matrix) D = diag(eig$values) P = eig$vectors
matsqrt = P%*%sqrt(D)%*%t(P)

output = list(D, P, matsqrt) names(output) = c( D , P , SqrtMat )
return(output)
}

Question 1
Consider the data given in the EXCEL file tab “q1”.
(a) State the multivariable linear regression model with all the necessary assumptions.
(b) Find the predicted model.
(c) Test the significance of the model.
(d) Regarless of your result in part (c), test if X1 is significant? How about X2?
(e) Find a 95% woking Hotelling confidence region for the mean response when X1 = 192 and X2 = 152. (f) Find a 95% woking Hotelling prediction region for a new response when X1 = 192 and X2 = 152.
Solution
Part A
Y = X— + ‘
˜ ˜ ˜
where ˜Y = [Y(1),Y(2)], X = [˜1,X(1),X(2)], ˜— = [—(0),—(1),—(2)] and ˜‘ = [‘(1),‘(2)]
Assumptions: E(˜‘(i)) = ˜0 V (˜‘(i)) = ‡iiI cov(˜‘(i),˜‘(j)) = ‡ijI
Part B
For this section we will show both ways of getting the predicted model. One will be through the actual LS method, and the other will be using the lm function built in R.
Least Squares Method
Y = as.matrix(q1df[,1:2]) X = as.matrix(q1df[,3:4]) onevec = rep(1, dim(X)[1]) Xnew = cbind(onevec,X)
beta_coef = solve(crossprod(Xnew))%*%crossprod(Xnew, Y) beta_coef
## y1 y2 ## onevec 28.0020858 35.8023808 ## x1 0.3876325 0.2446617 ## x2 0.5766253 0.4713147

Using the built in function in R
fit = lm(cbind(y1, y2) ~ x1 + x2, data = q1df) summary(fit)
## Response y1 :
##
## Call:
## lm(formula = y1 ~ x1 + x2, data = q1df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.361 -3.794 0.001 4.041 17.925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 28.0021 29.8446 0.938 0.358
## x1 0.3876 0.2454 1.579 0.129
## x2 0.5766 0.3673 1.570 0.131
##
## Residual standard error: 6.564 on 22 degrees of freedom
## Multiple R-squared: 0.5837, Adjusted R-squared: 0.5459
## F-statistic: 15.43 on 2 and 22 DF, p-value: 6.502e-05
## ##
## Response y2 :
##
## Call:
## lm(formula = y2 ~ x1 + x2, data = q1df)
##
## Residuals:
## Min 1Q Median 3Q Max ## -7.6192 -3.2907 -0.5662 2.5827 10.7487
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 35.8024 23.8778 1.499 0.148
## x1 0.2447 0.1964 1.246 0.226
## x2 0.4713 0.2938 1.604 0.123
##
## Residual standard error: 5.252 on 22 degrees of freedom
## Multiple R-squared: 0.5349, Adjusted R-squared: 0.4926
## F-statistic: 12.65 on 2 and 22 DF, p-value: 0.0002205 Part C
We have our null hypothesis to be H0 : X1 = X2 = 0. Let eˆ = Y ≠ Yˆ and ˆ = n≠1eˆTeˆ. Below is the calculation for or sample variance.
Yhat = Xnew%*%beta_coef ehat = Y – Yhat n = dim(X)[1]
Sigmahat = 1/n*t(ehat)%*%ehat
Sigmahat
## y1 y2 ## y1 37.9206 10.70300 ## y2 10.7030 24.27345
Now to calculate our test statistic (which will be using the Wilks Lambda), we need too solve for SSE and SST. We have shown the calculation below. Recall that SSE = nˆ and SST = (Y ≠ Yˆ)T(Y ≠ Yˆ). Below are the given outputs for SSE and SST.
SSE = n*Sigmahat
SSE
## y1 y2 ## y1 948.015 267.5750
## y2 267.575 606.8363
ybar = colMeans(Y) n = nrow(Y) m = ncol(Y) r = ncol(X)
q = 1
Ybar = matrix(ybar, n ,m, byrow = T)
SST = crossprod(Y – Ybar)
SST
## y1 y2 ## y1 2277.44 1230.04 ## y2 1230.04 1304.64
We now finally calculate the Wilks Lambda and complete the Bartlett method. Let it be shown that Wilks Lambda ( ) is = |SSESST|. Furthermore the formula for the Bartlett test statistic is as follows:
| |

The output is shown below.
WilksLam = det(SSE)/det(SST)
Bartlett = -(n – r – 1 – 1/2*(m – r + q + 1))*log(WilksLam)
df = m*(r-q) 1 – pchisq(Bartlett, df)
## [1] 1.420822e-05
Thus we get the Barlett observed value to be 22.32338 with the p-value being, 1.420822e-05. Based of the very small p-value, we will reject the null hypothesis. This implies that there is evidence that the model would be signficant.
Part D
Let H0 : X2 = 0
Same procedure as part (c), we have our statistic to be 0.8552134, which when used through Bartlett’s method, we get our test statistic to be 3.284489, with a p-value of 0.1935451. Based of the large p-value, we cannot reject H0. This implies that X2 is not significant. Below is the output code.
#Testing if Beta(2) is significant
#H_0: Beta(2) or X2 = 0
X1 = Xnew[,1:2]
Betahat1 = solve(crossprod(X1))%*%crossprod(X1,Y)
Sigmahat1 = 1/n*crossprod(Y – X1%*%Betahat1)
E = n*Sigmahat
H = n*(Sigmahat1 – Sigmahat) n = nrow(Y) m = ncol(Y) r = ncol(X1)
q = 1
WilksLam = det(E)/det(E + H)
Bartlett = -(n – r – 1 – 1/2*(m – r + q + 1))*log(WilksLam) df = m*(r-q)
1 – pchisq(Bartlett, df)
## [1] 0.1935451
Let H0 : X1 = 0
Same procedure as part (c), we have our statistic to be 0.8787331, which when used through Bartlett’s method, we get our test statistic to be 3.284489, with a p-value of 0.2573347. Based of the large p-value, we cannot reject H0. This implies that X1 is not significant. Below is the output code.
#Testing if Beta(1) is significant
#H_0: Beta(1) or X1 = 0
X2 = cbind(Xnew[,1], Xnew[,3])
Betahat2 = solve(crossprod(X2))%*%crossprod(X2,Y)
Sigmahat2 = 1/n*crossprod(Y – X2%*%Betahat2)
E = n*Sigmahat
H = n*(Sigmahat2 – Sigmahat) n = nrow(Y) m = ncol(Y) r = ncol(X2) q = 1
WilksLam = det(E)/det(E + H)
Bartlett = -(n – r – 1 – 1/2*(m – r + q + 1))*log(WilksLam) df = m*(r-q)
1 – pchisq(Bartlett, df)
## [1] 0.2573347
Part E
The formula for a 100(1 ≠–)% Confidence Region is as follow:
˜x0T—ˆ(i) ±Ú#(mn(n≠≠rr≠≠m1))$Fm,n≠r≠m,–Ò˜x0(XTX)≠1˜x0ˆii
Let ˜x0 = [192,152]T. Let us now compute ˜x0 into the formula, given above.
newobs = data.frame(x1 = 192, x2 = 152) pred = predict(fit, newobs) n = nrow(Y) m = ncol(Y) r = ncol(X)
table = sqrt( ((m*(n-r-1))/(n-r-m))*qf(0.95, df1 = m, df2 = n-r-m) ) x0 = c(1,192, 152)
sd1 = sqrt(t(x0)%*%solve(crossprod(Xnew))%*%x0*Sigmahat[1,1]) sd2 = sqrt(t(x0)%*%solve(crossprod(Xnew))%*%x0*Sigmahat[2,2]) sd = c(sd1, sd2) CR_L = pred – table*sd CR_U = pred + table*sd CR = cbind(t(CR_L), t(CR_U)) colnames(CR) = c(“Lower”, “Upper”) CR
## Lower Upper
## y1 185.4438 194.7054 ## y2 150.7123 158.1222
Hence our Confidence Region will be:
5 140154..07464173 6± 2.6951395 11..718209374688 6
Part F
Very similar to our C.R. formula from part (e), our Hotelling Prediction Region is as follows:
˜x0T—ˆ(i) ±Ú#(mn(n≠≠rr≠≠m1))$Fm,n≠r≠m,–Ò(1 + ˜x0(XTX)≠1˜x0)ˆii
Let us now use R,to calculate the P.R.
sd1 = sqrt( (1 + t(x0)%*%solve(crossprod(Xnew))%*%x0)*Sigmahat[1,1] ) sd2 = sqrt( (1 + t(x0)%*%solve(crossprod(Xnew))%*%x0)*Sigmahat[2,2] )
sd = c(sd1, sd2)
PR_L = pred – table*sd PR_U = pred + table*sd
PR = cbind(t(PR_L), t(PR_U)) colnames(PR) = c(“Lower”, “Upper”) PR
## Lower Upper
## y1 172.8440 207.3051 ## y2 140.6316 168.2029 Hence our Prediction Region will be:
5 140154..0746 6± 2.6951395 6.39319875.115 6 4173

Question 2
Timm (1975) reported the results of an experiment in which subjects’ respond time to “probe words” at five position (Y1 is at the beginning of the sentence, Y2 is in the first quartile of the sentence, Y3 is in the middle of the sentence, Y4 is in the third quartile of the sentence, and Y5 is at end of the sentence). The data are recorded in the EXCEL file tab “q2”.
(a) Use the sample variance and obtain all the principle components.
(b) Timm specifically required the reduction in dimension should cover at least 90% of the total variance. How many principle components are needed? Why?
(c) Repeat parts (a) and (b) using the sample correlation matrix.
Solution
Part A
In this part, we will try to hard code the principal components, using the sample variance. Note that we could also use the prcomp function that is usually used to calculate P.C. First we need to calculate the sample variance. The output of the sample variance is calculated below.
X = data.matrix(q2df) n = dim(X)[1] params = dim(X)[2] onevec = rep(1, n) #Mean Vector
xbar = 1/n*t(X)%*%onevec
#Variance Matrix
S = 1/(n – 1)*(t(X)%*%X – n*xbar%*%t(xbar)) S
## x1 x2 x3 x4 x5 ## x1 65.09091 33.64545 47.59091 36.77273 25.42727 ## x2 33.64545 46.07273 28.94545 40.33636 28.36364 ## x3 47.59091 28.94545 60.69091 37.37273 41.12727 ## x4 36.77273 40.33636 37.37273 62.81818 31.68182 ## x5 25.42727 28.36364 41.12727 31.68182 58.21818
Now we get the eigen value and eigen vectors. The eigen vectors will represent the Principal Components
#Getting the eigenvalues and vectors
eig = eigen(S) eig$vectors
## [,1] [,2] [,3] [,4] [,5] ## [1,] -0.4727831 0.57631826 0.41685181 0.2285789 0.4672469 ## [2,] -0.3918187 0.10826396 -0.45239805 0.6558114 -0.4472185 ## [3,] -0.4875471 -0.09600807 0.47945493 -0.3689607 -0.6221505 ## [4,] -0.4677199 0.12056819 -0.61953744 -0.5803451 0.2146494 ## [5,] -0.4080320 -0.79522446 0.08869553 0.2114962 0.3853964 Hence our Principal Components will be as follow:
Y
Y
Y
Y
Y
Solution

Part B
result = prcomp(X, scale = F)
var_exp = result$sdevˆ2/sum(result$sdevˆ2)
plot(var_exp, type = b , col = green , yaxt = n , ylab = par(new = T)
plot(cumsum(var_exp), xlab = “Principal Component”, ylab = “Proportion of Variance Explained”, main = “Scree Plot Using Covariance Matrix”, ylim = c(0, 1), type = “b”, col = blue )
legend(4.2, 0.4, legend=c(“Proportion”, “Cumulative”),
col=c(“green”, “blue”), lty=1:1, cex=0.8) , xlab = )
Scree Plot Using Covariance Matrix

Principal Component
I would say that using the first 3 P.C. would be needed to cover 90% of the total variance. When comparing P.C. 3 and P.C. 4 we oly get an extra 5% of explained variance.
Part C
Same procedure for part (a), we first need to find the correlation matrix.
sd = sqrt(diag(diag(S),5)) invsd = solve(sd) cor = invsd%*%S%*%t(invsd) cor
## [,1] [,2] [,3] [,4] [,5] ## [1,] 1.0000000 0.6143902 0.7571850 0.5750730 0.4130573 ## [2,] 0.6143902 1.0000000 0.5473897 0.7497770 0.5476595 ## [3,] 0.7571850 0.5473897 1.0000000 0.6052716 0.6918927 ## [4,] 0.5750730 0.7497770 0.6052716 1.0000000 0.5238876 ## [5,] 0.4130573 0.5476595 0.6918927 0.5238876 1.0000000
Now let us compute the P.C for the correlation matrix. But before we do that, when using the correlation matrix for PCA, we need to standardize our independent variables. Let Zi (X ‡ for i = 1,…,5.
result = prcomp(scale(X), scale = F ) print(result)
## Standard deviations (1, .., p=5):
## [1] 1.8483758 0.7838567 0.7564879 0.5207797 0.3543867 ##
## Rotation (n x k) = (5 x 5):
## PC1 PC2 PC3 PC4 PC5 ## x1 -0.4418394 0.2006104 -0.6786078 0.2125365 -0.5087760 ## x2 -0.4535595 0.4280646 0.3491277 0.6055405 0.3499642 ## x3 -0.4727808 -0.3678765 -0.3754368 -0.2581448 0.6584479 ## x4 -0.4536224 0.3934629 0.3345386 -0.7010073 -0.1899641
## x5 -0.4120276 -0.6974023 0.4058723 0.1734903 -0.3860467 Hence our principal components are as follows:
Y
Y
Y
Y
Y

Question 3
Use the data in the EXCEL file tab “q2”.
(a) Find the canonical correlation between (x1,x2,x3) and (x4,x5).
(b) Test the significance canonical correlations.
(c) Regardless of your answer in part (b), is each canonical correlation individually significant?
Solution
X = data.matrix(q3df) col_order = c( x4 , x5 , X = X[,col_order] x1 , x2 , x3 )
Part A
We first need to get the correlation matrix in order to calculate the canonical correlations.
cor(X)
## x4 x5 x1 x2 x3 ## x4 1.0000000 0.5238876 0.5750730 0.7497770 0.6052716 ## x5 0.5238876 1.0000000 0.4130573 0.5476595 0.6918927 ## x1 0.5750730 0.4130573 1.0000000 0.6143902 0.7571850 ## x2 0.7497770 0.5476595 0.6143902 1.0000000 0.5473897
## x3 0.6052716 0.6918927 0.7571850 0.5473897 1.0000000
Now let X(1) = (X4,X˜ 5) and X(2) = (X˜ 1,X˜ 2,X˜ 3)
˜
Furthermore, let the following matrices be subsections of our correlation matrix.

Let A = ≠1112

11 = 5 0.52391 0.52391 6 12 = 5 00..57514131 00..74985477 00..69196053 6
1 0.6144 0.7572
21 = 12 12 =0.6144 1 0.5474 TV
0.7572 0.5474 1
≠221 21 ≠1121. Let us use R to calculate ≠1112 and ≠221.
cor11 = matrix(c(1.0000000, 0.5238876,
0.5238876, 1.0000000),
ncol = 2, byrow = T)
cor11_sqrt = spectral(cor11)
cor11_invsqrt = solve(cor11_sqrt$SqrtMat) cor12 = matrix(c(0.5750730, 0.7497770, 0.6052716,
0.4130573, 0.5476595, 0.6918927), ncol = 3, byrow = T)
cor21 = t(cor12)
cor22 = matrix(c(1.0000000, 0.6143902, 0.7571850,
0.6143902, 1.0000000, 0.5473897,
0.7571850, 0.5473897, 1.0000000), ncol = 3, byrow = T)
cor22_inv = solve(cor22)
Now we can finally calculate our A matrix. This is shown below.
A = cor11_invsqrt%*%cor12%*%cor22_inv%*%cor21%*%cor11_invsqrt
A
## [,1] [,2] ## [1,] 0.4705581 0.2831067 ## [2,] 0.2831067 0.4371090
To find the canonical correlation, we now need to compute the eigen values of the A matrix. Using R once again we come with the following output.
canon_cor = sqrt((eigen(A)$values)) canon_cor
## [1] 0.8587396 0.4125933
By taking the square root of the eigen values of the matrix A, we get the following canonical correlations:
pú1 = 0.8587396 pú2 = 0.4125933
Part B
Let our null hypothesis be: H0 : pú1 = pú2 = 12 = 0. We will be using what Bartlett suggests: Ÿ
Let p Æ q which will be the number of variables in each group. Hence, p = 2 and q = 3. Using the test statistic formula from above we get the p-value to be 0.09923, which is quite high. Hence we cannot reject our null hypothesis. This implies that our canonical correlations are not significant. Below is the R code to provide the evidence for our statement.
n = dim(X)[1] p = min(3,2) q = max(3,2)
Bartlett = -(n – 1 – 1/2*(p + q + 1))*log(prod((1-canon_corˆ2))) Bartlett
## [1] 10.66704
df = p*q 1-pchisq(Bartlett, df)
## [1] 0.09922844
Part C
Let H(1) : pú1 = 0” , pú2 = 0. Hence our alternative hypothesis will be Ha(1) : pú2 = 0” . The Bartlett formula to find the test statistic is as follows:

#Testing p_2 not equal to 0
Bartlett = -(n – 1 – 1/2*(p + q + 1))*log((1-canon_cor[2]ˆ2))
Bartlett
## [1] 1.306275
df = (p – 1)*(q-1) 1-pchisq(Bartlett, df)
## [1] 0.5204105
Let H(2) : pú2 = 0” , pú1 = 0. Hence our alternative hypothesis will bepú2, with p1ú. Ha(2) : p1ú = 0” . Essentially we are using the same Bartlett formula except we replace
#Testing p_1 not equal to 0
Bartlett = -(n – 1 – 1/2*(p + q + 1))*log((1-canon_cor[1]ˆ2))
Bartlett
## [1] 9.360764
df = (p – 1)*(q – 1) 1-pchisq(Bartlett, df)
## [1] 0.009275471

Question 4
(a) Show that

(b) Let
f1(x) = (1 ≠ |x|) for |x| Æ 1 f2(x) = (1 ≠ |x ≠ 0.5|) for ≠ 0.5 Æ x Æ 1
Solution
Part A
Handwritten notes.
Part B.1
Probability Density Function Graph

Values of x
Part B.2
Handwritten notes.
Part B.3
Handwritten notes.

4630 Assignment 3 Ravish Kamath 213893664
Question 5
Use the data in the EXCEL file tab “q5”. Assume the data is a sample from two multivariate normal distributions.
(a) Identify the classification rule for the case p1 = p2 and c(1|2) = c(2|1).
(b) Identify the classification rule for the case p1 = 0.25 and c(1|2) is half of c(2|1). (c) Identify the Bayesian rule using p1 = 0.6.
(d) Based on the rules given in part (a), which population will a new data point (50,48,47,49) be classified into? How about using the rules in part (b)? Using the rule in part (c)?
Solution
A = as.matrix(q5df[q5df$Group == “A”,1:4])
B =as.matrix(q5df[q5df$Group ==”B”,1:4]) dat = list(A, B)
Part A
All the formulation of the classification rule will be in the handwritten notes, however here is the R code for the calculation.
xbar = function(matrix){ n = dim(matrix)[1] onevec = rep(1,n)
xbar = 1/n*t(matrix)%*%onevec return(xbar)
}
# Mean Vectors xbar1 = xbar(A) xbar2 = xbar(B) spooled = Spool(dat)
ahat = t(xbar1 – xbar2)%*%solve(spooled)
ybar1 = ahat%*%xbar1 ybar2 = ahat%*%xbar2
midpoint = 1/2*(ybar1 + ybar2) midpoint
## [,1]
## [1,] -7.062536
# Hence the classification is -7.062536
Part B
Handwritten Notes
Part C
Handwritten Notes.
17

Solution 4630 Assignment 3 Ravish Kamath 213893664
Part D.1
When we have equal cost and prior discriminants:
newobs = c(50, 48, 47, 49) ahat%*%newobs
## [,1]
## [1,] -5.481962
Since the observation is bigger than the cuto population A.
Part D.2
When p1 = 0.25, and c(1|2) is half of c(2|1) point, which was -7.062536, we will classify this obs. to
ahat%*%newobs – midpoint
## [,1] ## [1,] 1.580574
Since the cuto value for this classification is 0.4055, we will also classify this obs. to population A.
Part D.3
Most of this will be shown in the handwritten, however to calculation certain probabilities for the Bayesian rule, we need to use R. This is shown below
#When we use Bayesian Rule
probA_newob = (2*pi)ˆ(-2)*(det(spooled))ˆ(-1/2)*
exp( (-1/2)*t((newobs – xbar1))%*% solve(spooled)%*%(newobs xbar1)) probA_newob
## [,1]
## [1,] 1.475133e-09
probB_newob = (2*pi)ˆ(-2)*(det(spooled))ˆ(-1/2)*
exp( (-1/2)*t((newobs – xbar2))%*% solve(spooled)%*%(newobs –
probB_newob
## [,1]
## [1,] 3.036662e-10
A_newob = (0.6*probA_newob)/(0.6*probA_newob + 0.4*probB_newob)
A_newob
## [,1]
## [1,] 0.8793235
B_newob = (0.4*probB_newob)/(0.6*probA_newob + 0.4*probB_newob)
B_newob
## [,1]
## [1,] 0.1206765
18

Reviews

There are no reviews yet.

Be the first to review “Math4630 – Solved”

Math4630 – Solved

Description

Reviews

Related products

Math4630 – Solved