Saket Choudhary (USCID: 2170058637) (skchoudh@usc.edu)
09/28/2015
Chapter 11: 10
library(ggplot2)
data <- read.csv('data_ch9_16.csv', header = TRUE)
# Logit of the proportion of pollen removed and log of the visit duration
data$PollenRemovedlogit <- log(data$PollenRemoved / (1 - data$PollenRemoved))
data$DurationOfVisitlog <- log(data$DurationOfVisit)
ggplot(data, aes(x = DurationOfVisitlog, y = PollenRemovedlogit, color = BeeType)) +
  geom_point(shape = 1) +
  scale_colour_hue(l = 50) +
  geom_smooth(method = lm, se = FALSE)
$$\mu\{\text{PollenRemovedlogit} \mid \text{DurationOfVisitlog}, \text{BeeType}\} = \beta_0 + \beta_1\,\text{DurationOfVisitlog} + \beta_2\,\text{BeeType} + \beta_3\,\text{BeeType} \times \text{DurationOfVisitlog}$$
lmfit <- lm(PollenRemovedlogit ~ BeeType + DurationOfVisitlog
+ BeeType*DurationOfVisitlog, data=data)
summary(lmfit)
##
## Call:
## lm(formula = PollenRemovedlogit ~ BeeType + DurationOfVisitlog +
## BeeType * DurationOfVisitlog, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.3803 -0.3699  0.0307  0.4552  1.1611
##
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                       -3.0390     0.5115  -5.941 4.45e-07 ***
## BeeTypeWorker                      1.3770     0.8722   1.579    0.122
## DurationOfVisitlog                 1.0121     0.1902   5.321 3.52e-06 ***
## BeeTypeWorker:DurationOfVisitlog  -0.2709     0.2817  -0.962    0.342
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6525 on 43 degrees of freedom
## Multiple R-squared:  0.6151, Adjusted R-squared:  0.5882
## F-statistic:  22.9 on 3 and 43 DF,  p-value: 5.151e-09
r <- residuals(lmfit)
yh <- predict(lmfit)
p1 <- ggplot(lmfit, aes(.fitted, .resid)) + geom_point()
# The geom_text() call is cut off in the source; the label text and "" fallback are a reconstruction.
p1 <- p1 + geom_hline(yintercept = 0) + geom_smooth() +
  geom_text(aes(label = ifelse(.resid > 4 * IQR(.resid) | .fitted > 4 * IQR(.fitted),
                               paste(.fitted, ",", .resid), "")))
p1
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to ch
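The labelling rule in the plotting code above flags a value when it exceeds 4 times the interquartile range of its own sample. A minimal sketch of that rule on its own, assuming made-up residuals (`resid_demo` and `flag_outlier` are hypothetical names, not part of the original analysis):

```r
# Hypothetical residual-like values, NOT taken from the bee data:
# 30 evenly spaced values in [-1, 1] plus one planted extreme value.
resid_demo <- c(seq(-1, 1, length.out = 30), 6)

# Flag a value when it is greater than k * IQR of the sample
# (k = 4 mirrors the threshold used in the plot labels).
flag_outlier <- function(x, k = 4) x > k * IQR(x)

# Indices of the flagged values
flagged <- which(flag_outlier(resid_demo))
flagged
```

As written, the rule is one-sided: it only catches large positive values. Applying it to `abs(x)` instead would be one way to catch large negative residuals as well.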
From the residual plot, there appear to be no outliers (see the labelling rule in the last code chunk, where a point is flagged as an outlier if its value is greater than 4*IQR(x)).
Also, the p-value of the interaction term BeeTypeWorker:DurationOfVisitlog is 0.342, so the interaction is not statistically significant at the 0.05 level and can be dropped from the model.
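One way to check the decision about the interaction term is to fit the model with and without it and compare the two fits with an extra-sum-of-squares F-test. The sketch below uses simulated stand-in data, because `data_ch9_16.csv` is not reproduced here; the column names mirror the analysis, but the values, the seed, and the object names (`sim`, `full`, `reduced`) are assumptions for illustration only.

```r
# Simulated stand-in data; values are made up, only the column names match the analysis.
set.seed(42)
n <- 47
sim <- data.frame(
  BeeType = factor(rep(c("Queen", "Worker"), length.out = n)),
  DurationOfVisitlog = runif(n, 0.5, 4)
)
# Response generated from a parallel-lines model (no true interaction).
sim$PollenRemovedlogit <- -3 + 1.0 * sim$DurationOfVisitlog +
  0.5 * (sim$BeeType == "Worker") + rnorm(n, sd = 0.6)

# Full model (separate slopes) and reduced model (common slope).
full    <- lm(PollenRemovedlogit ~ BeeType * DurationOfVisitlog, data = sim)
reduced <- lm(PollenRemovedlogit ~ BeeType + DurationOfVisitlog, data = sim)

# Extra-sum-of-squares F-test: does the interaction improve the fit?
cmp <- anova(reduced, full)
cmp

# With a single interaction coefficient, the F statistic equals the squared t statistic.
t_int <- summary(full)$coefficients["BeeTypeWorker:DurationOfVisitlog", "t value"]
c(F = cmp$F[2], t_squared = t_int^2)
```

Because only one coefficient is dropped, this F-test gives the same p-value as the t-test in the `summary()` table, so on the real data it would be expected to reproduce the 0.342 reported there.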
Chapter 11: 21
$$SS(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^{N} w_i \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \cdots - \beta_p X_{pi}\right)^2$$
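$$\frac{\partial SS}{\partial \beta_0} = 0 \;\Longrightarrow\; \sum_{i=1}^{N} w_i \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \cdots - \beta_p X_{pi}\right) \times (-1) = 0$$
$$\beta_0 \sum_{i=1}^{N} w_i + \beta_1 \sum_{i=1}^{N} w_i X_{1i} + \beta_2 \sum_{i=1}^{N} w_i X_{2i} + \cdots + \beta_p \sum_{i=1}^{N} w_i X_{pi} = \sum_{i=1}^{N} w_i Y_i$$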
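$$\frac{\partial SS}{\partial \beta_1} = 0 \;\Longrightarrow\; \sum_{i=1}^{N} w_i \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \cdots - \beta_p X_{pi}\right) \times (-X_{1i}) = 0$$
$$\beta_0 \sum_{i=1}^{N} w_i X_{1i} + \beta_1 \sum_{i=1}^{N} w_i X_{1i}^2 + \beta_2 \sum_{i=1}^{N} w_i X_{2i} X_{1i} + \cdots + \beta_p \sum_{i=1}^{N} w_i X_{pi} X_{1i} = \sum_{i=1}^{N} w_i X_{1i} Y_i$$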
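Similarly,
$$\frac{\partial SS}{\partial \beta_p} = 0 \;\Longrightarrow\; \sum_{i=1}^{N} w_i \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \cdots - \beta_p X_{pi}\right) \times (-X_{pi}) = 0$$
$$\beta_0 \sum_{i=1}^{N} w_i X_{pi} + \beta_1 \sum_{i=1}^{N} w_i X_{1i} X_{pi} + \beta_2 \sum_{i=1}^{N} w_i X_{2i} X_{pi} + \cdots + \beta_p \sum_{i=1}^{N} w_i X_{pi}^2 = \sum_{i=1}^{N} w_i X_{pi} Y_i$$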
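The equations above form a linear system in the coefficients, which in matrix form reads (XᵀWX)β = XᵀWY with W the diagonal matrix of weights. As a rough numerical check, the sketch below builds that system for simulated data and compares the solution with `lm(..., weights = )`. The data, seed, and variable names (`X1`, `X2`, `w`, `beta_hat`) are assumptions for illustration, not part of the exercise.

```r
# Simulated data with two explanatory variables and positive weights (all values made up).
set.seed(7)
N  <- 50
X1 <- rnorm(N)
X2 <- rnorm(N)
w  <- runif(N, 0.5, 2)
Y  <- 1 + 2 * X1 - 0.5 * X2 + rnorm(N)

# Design matrix with a leading column of ones for the intercept.
X <- cbind(1, X1, X2)

# Left- and right-hand sides of the weighted normal equations.
# w * X multiplies row i of X by w[i], so the entries are sum_i w_i X_ji X_ki.
XtWX <- t(X) %*% (w * X)
XtWY <- t(X) %*% (w * Y)

# Solve the linear system for the coefficients.
beta_hat <- solve(XtWX, XtWY)

# Reference fit using R's built-in weighted least squares.
fit <- lm(Y ~ X1 + X2, weights = w)

cbind(normal_equations = as.numeric(beta_hat), lm = unname(coef(fit)))
```

The two columns should agree up to floating-point error, since `lm` with `weights` minimises the same weighted sum of squares.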
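To prove that this stationary point is indeed a minimum, we need to show that SS is convex, i.e. that its matrix of second derivatives (the Hessian) is positive semidefinite:
$$\frac{\partial^2 SS}{\partial \beta_1^2} = 2 \sum_i w_i X_{1i}^2 \ge 0$$
Similarly, for any $1 \le j \le p$:
$$\frac{\partial^2 SS}{\partial \beta_j^2} = 2 \sum_i w_i X_{ji}^2 \ge 0$$
And for $k \ne j$:
$$\frac{\partial^2 SS}{\partial \beta_j \, \partial \beta_k} = 2 \sum_i w_i X_{ji} X_{ki}$$
These off-diagonal terms need not be non-negative, so the signs of individual entries are not enough; the whole Hessian has to be considered:
$$H = 2 \begin{pmatrix}
\sum_i w_i X_{1i}^2 & \sum_i w_i X_{1i} X_{2i} & \cdots & \sum_i w_i X_{1i} X_{pi} \\
\sum_i w_i X_{2i} X_{1i} & \sum_i w_i X_{2i}^2 & \cdots & \sum_i w_i X_{2i} X_{pi} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_i w_i X_{pi} X_{1i} & \sum_i w_i X_{pi} X_{2i} & \cdots & \sum_i w_i X_{pi}^2
\end{pmatrix}$$
(the row and column for $\beta_0$ follow the same pattern with $X_{0i} = 1$).
Assuming $w_i \ge 0$, the weights can be absorbed into the vectors: with $\tilde{x}_j = (\sqrt{w_1} X_{j1}, \ldots, \sqrt{w_N} X_{jN})^T$, each entry can be written as $H_{jk} = 2\,\tilde{x}_j^T \tilde{x}_k$, so $H$ is twice a Gram matrix and hence positive semidefinite: for any vector $v$, $v^T H v = 2 \sum_i w_i \left(\sum_j v_j X_{ji}\right)^2 \ge 0$. SS is therefore convex and the solution of the equations above is a global minimum. The minimum is unique ($H$ positive definite) when all $w_i > 0$ and the explanatory variables, together with the intercept column, are linearly independent.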
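The Gram-matrix argument can be checked numerically. The following sketch, which assumes simulated data and non-negative weights (none of it from the exercise), builds the Hessian 2·XᵀWX in two ways and inspects its eigenvalues; the names `H`, `G`, and `Xt` are illustrative.

```r
# Simulated design matrix (intercept plus two variables) and positive weights; values made up.
set.seed(11)
N <- 40
X <- cbind(1, rnorm(N), rnorm(N))
w <- runif(N, 0.1, 3)

# Hessian of the weighted sum of squares: entries 2 * sum_i w_i X_ji X_ki.
H <- 2 * t(X) %*% (w * X)

# The same matrix as a Gram matrix of the columns scaled by sqrt(w).
Xt <- sqrt(w) * X
G  <- 2 * crossprod(Xt)

# Eigenvalues of a positive (semi)definite matrix are all non-negative.
eig <- eigen(H, symmetric = TRUE)$values
eig

# Quadratic form v' H v compared with 2 * sum_i w_i (x_i' v)^2 for an arbitrary v.
v  <- c(1, -2, 0.5)
qf <- as.numeric(t(v) %*% H %*% v)
c(quadratic_form = qf, weighted_sum = 2 * sum(w * (X %*% v)^2))
```

With strictly positive weights and linearly independent columns, all eigenvalues come out strictly positive, which corresponds to a unique minimum.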



