Description
• You should submit all your code as well as any graphs that you might plot. Do not submit answers to theoretical questions.
• Do not submit the datasets.
• Include a single write-up (pdf) file which includes a brief description for each question explaining what you did. Include any observations and/or plots required by the question in this single write-up file.
• You should use Python/MATLAB for all your programming solutions.
• Your code should have appropriate documentation for readability.
• You will be graded based on what you have submitted as well as your ability to explain your code.
• Refer to the course website for assignment submission instructions.
• This assignment is supposed to be done individually. You should carry out all the implementation by yourself.
• Many of the problems below have been adapted from the Machine Learning course offered by Andrew Ng at Stanford.
1. (20 points) Linear Regression
In this problem, we will implement least squares linear regression to predict the density of wine based on its acidity. Recall that the error metric for least squares is given by:

J(θ) = (1/2) Σ_{i=1}^{m} ( hθ(x(i)) − y(i) )²

where hθ(x) = θᵀx and all the symbols are as discussed in class. The files “linearX.csv” and “linearY.csv” contain the acidity of the wine (x(i)’s, x(i) ∈ R) and its density (y(i)’s, y(i) ∈ R), respectively, with one training example per row. We will implement least squares linear regression to learn the relationship between the x(i)’s and the y(i)’s.
(a) (8 points) Implement batch gradient descent method for optimizing J(θ). Choose an appropriate learning rate and the stopping criteria (as a function of the change in the value of J(θ)). You can initialize the parameters as θ = ~0 (the vector of all zeros). Do not forget to include the intercept term. Report your learning rate, stopping criteria and the final set of parameters obtained by your algorithm.
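A minimal sketch of batch gradient descent for this objective might look as follows. The learning rate, tolerance, and the synthetic demo data are illustrative assumptions; the assignment itself loads linearX.csv and linearY.csv, and your own learning rate and stopping criterion may differ.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.5, tol=1e-9, max_iters=100000):
    """Minimize J(theta) = (1/2m) * sum((X @ theta - y)**2) by batch GD."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])              # theta initialized to all zeros
    prev_J = np.inf
    for _ in range(max_iters):
        residual = X @ theta - y
        J = (residual @ residual) / (2 * m)
        if abs(prev_J - J) < tol:             # stop when J barely changes
            break
        theta -= (lr / m) * (X.T @ residual)  # batch gradient step
        prev_J = J
    return theta

# Synthetic demo (the real assignment loads linearX.csv / linearY.csv):
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])     # intercept column of ones
y = 2.0 + 3.0 * x
theta = batch_gradient_descent(X, y)
```

On noiseless synthetic data the recovered parameters approach the true intercept and slope; on the real dataset the learning rate and tolerance would need tuning.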
(b) (3 points) Plot the data on a two-dimensional graph and plot the hypothesis function learned by your algorithm in the previous part.
(c) (3 points) Draw a 3-dimensional mesh showing the error function (J(θ)) on z-axis and the parameters in the x − y plane. Display the error value using the current set of parameters at each iteration of the gradient descent. Include a time gap of 0.2 seconds in your display for each iteration so that the change in the function value can be observed by the human eye.
(d) (3 points) Repeat the part above for drawing the contours of the error function at each iteration of the gradient descent. Once again, choose a time gap of 0.2 seconds so that the change can be perceived by the human eye.
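One possible way to animate the contours (or, analogously, the 3-D mesh) with a 0.2-second gap per iteration is matplotlib's plt.pause. The grid ranges, learning rate, iteration count, and synthetic data below are all assumptions for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # headless backend for this sketch; remove for a live window
import matplotlib.pyplot as plt

def error_surface(X, y, t0_range, t1_range, steps=50):
    """Evaluate J(theta) on a (theta0, theta1) grid for contour/mesh plots."""
    T0, T1 = np.meshgrid(np.linspace(*t0_range, steps),
                         np.linspace(*t1_range, steps))
    m = len(y)
    J = np.empty_like(T0)
    for i in range(steps):
        for j in range(steps):
            r = X @ np.array([T0[i, j], T1[i, j]]) - y
            J[i, j] = (r @ r) / (2 * m)
    return T0, T1, J

x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
T0, T1, J = error_surface(X, y, (-1, 3), (-1, 5))

fig, ax = plt.subplots()
ax.contour(T0, T1, J, levels=20)
theta, lr = np.zeros(2), 0.5
for _ in range(5):                        # a few gradient-descent iterations
    ax.plot(theta[0], theta[1], "rx")     # mark the current parameters
    plt.pause(0.2)                        # 0.2 s gap so the eye can follow
    theta -= (lr / len(y)) * (X.T @ (X @ theta - y))
```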
(e) (3 points) Repeat the part above (i.e. draw the contours at each learning iteration) for the step size values of η = {0.1,0.5,0.9,1.3,1.7,2.1,2.5}. What do you observe? Comment.
2. (15 points) Locally Weighted Linear Regression
In this problem, we will generalize the ideas of linear regression to implement locally weighted linear regression, where we want to “weigh” different training examples differently. Specifically, suppose we want to minimize

J(θ) = (1/2) Σ_{i=1}^{m} w(i) ( θᵀx(i) − y(i) )².
In matrix notation, the error function above can alternately be written as:

J(θ) = (Xθ − Y)ᵀ W (Xθ − Y)

for an appropriate diagonal matrix W, and where X and Y are as defined in class. In class, we worked out what happens for the case where all the weights (the w(i)’s) are the same. We derived an analytical solution for the parameters θ using what is called the normal equation:

XᵀXθ = XᵀY.

The value of θ that minimizes J(θ) is given by (XᵀX)⁻¹XᵀY. By computing the derivative ∇θJ(θ) and setting it to zero, we can generalize the normal equation to the weighted setting above. Find the new value of θ that minimizes J(θ) in closed form as a function of X, W, and Y.
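As a sanity check for whatever closed form you derive, a weighted least-squares solver can be sketched as below. It uses np.linalg.solve on the weighted normal equation rather than an explicit matrix inverse (a common numerical-stability choice); the tiny dataset is purely illustrative.

```python
import numpy as np

def weighted_least_squares(X, W, y):
    """Solve the weighted normal equation (X^T W X) theta = X^T W y."""
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Sanity check: with W = I this must reduce to ordinary least squares.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])     # intercept term included
y = 1.0 + 2.0 * x
theta = weighted_least_squares(X, np.eye(len(y)), y)
```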
The files “weightedX.csv” and “weightedY.csv” contain the inputs (x(i)’s) and outputs (y(i)’s), respectively, for a regression problem, with one training example per row.
(a) (4 points) Implement (unweighted) linear regression (y = θTx) on this dataset (using the normal equations), and plot on the same figure the data and the straight line resulting from your fit. (Remember to include the intercept term.)
(b) (8 points) Implement locally weighted linear regression on this dataset (using the weighted normal equations you derived above), and plot on the same figure the data and the curve resulting from your fit. When evaluating hθ(·) at a query point x, use weights

w(i) = exp( − (x − x(i))² / (2τ²) )

with a bandwidth parameter τ = 0.8. (Again, remember to include the intercept term.)
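A possible implementation of the locally weighted prediction at a single query point, using the Gaussian weights w(i) = exp(−(x − x(i))²/(2τ²)), is sketched below; the sine-curve data is a synthetic stand-in for weightedX.csv/weightedY.csv.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.8):
    """Fit a weighted linear model centered at x_query and predict there.

    X has an intercept column at index 0; the weights are driven by the
    distance between x_query and each example in the feature column.
    """
    d2 = (x_query[1] - X[:, 1]) ** 2
    w = np.exp(-d2 / (2 * tau ** 2))      # w(i) = exp(-(x - x(i))^2 / (2 tau^2))
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

x = np.linspace(-3, 3, 100)
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x)                             # a clearly nonlinear target
pred = lwr_predict(X, y, np.array([1.0, 0.5]), tau=0.3)
```

Sweeping τ on data like this shows the trade-off the question asks about: a very small τ chases individual points, while a very large τ collapses toward the single global straight line.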
(c) (3 points) Repeat (b) four times, with τ = 0.1, 0.3, 2, and 10. Which value of τ do you think works best? Comment on what happens to the fit when τ is too small or too large.
3. (15 points) Logistic Regression
Consider the log-likelihood function for logistic regression:

ℓ(θ) = Σ_{i=1}^{m} [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ]

where hθ(x) = g(θᵀx) = 1/(1 + e^{−θᵀx}). For the following, you will need to calculate the value of the Hessian H of the above function.
(a) (10 points) The files “logisticX.csv” and “logisticY.csv” contain the inputs (x(i) ∈ R2) and outputs (y(i) ∈ {0,1}), respectively, for a binary classification problem, with one training example per row. Implement Newton’s method for optimizing ℓ(θ), and apply it to fit a logistic regression model to the data. Initialize Newton’s method with θ = 0 (the vector of all zeros). What are the coefficients θ resulting from your fit? (Remember to include the intercept term.)
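A minimal sketch of Newton’s method for maximizing ℓ(θ) follows, using the standard gradient Xᵀ(y − h) and Hessian −XᵀSX with S = diag(h(1 − h)). The iteration count and the synthetic data (standing in for logisticX.csv/logisticY.csv) are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iters=10):
    """Maximize the logistic log-likelihood with Newton updates."""
    theta = np.zeros(X.shape[1])               # theta = 0 initialization
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                   # gradient of l(theta)
        H = X.T @ np.diag(h * (1 - h)) @ X     # = -(Hessian of l(theta))
        theta += np.linalg.solve(H, grad)      # Newton step
    return theta

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
X = np.column_stack([np.ones(200), x])         # intercept term
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)
theta = newton_logistic(X, y)
```

Newton’s method typically converges in well under ten iterations here, which is why it is preferred over gradient ascent for this problem size.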
(b) (5 points) Plot the training data (your axes should be x1 and x2, corresponding to the two coordinates of the inputs, and you should use a different symbol for each point plotted to indicate whether that example had label 1 or 0). Also plot on the same figure the decision boundary fit by logistic regression; this should be a straight line showing the boundary separating the region where hθ(x) > 0.5 from the region where hθ(x) ≤ 0.5.
4. (25 points) Gaussian Discriminant Analysis
In this problem, we will implement GDA for separating salmon from Alaska and Canada. Each salmon is represented by two attributes, x1 and x2, depicting growth ring diameters in (1) fresh water and (2) marine water, respectively. The file “q4x.dat” stores the two attribute values, with one entry per row. The file “q4y.dat” contains the target values (y(i)’s ∈ {Alaska, Canada}) on the respective rows.
(a) (6 points) Implement Gaussian Discriminant Analysis using the closed-form equations described in class. Assume that both the classes have the same covariance matrix, i.e., Σ0 = Σ1 = Σ. Report the values of the means, µ0 and µ1, and the covariance matrix Σ.
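The closed-form shared-covariance estimates can be sketched as follows. The labels are assumed to be pre-encoded as 0/1 (e.g. Alaska = 0, Canada = 1), and the two synthetic Gaussian clusters are placeholders for q4x.dat/q4y.dat:

```python
import numpy as np

def gda_shared_cov(X, y):
    """Closed-form GDA estimates with a single pooled covariance matrix."""
    m = len(y)
    phi = y.mean()                                       # P(y = 1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    centered = X - np.where(y[:, None] == 0, mu0, mu1)   # subtract each example's class mean
    sigma = centered.T @ centered / m                    # pooled covariance
    return phi, mu0, mu1, sigma

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, size=(500, 2)),   # class 0 cluster
               rng.normal([3, 3], 1.0, size=(500, 2))])  # class 1 cluster
y = np.array([0] * 500 + [1] * 500)
phi, mu0, mu1, sigma = gda_shared_cov(X, y)
```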
(b) (2 points) Plot the training data corresponding to the two coordinates of the input features, using a different symbol for each point plotted to indicate whether that example had label Canada or Alaska.
(c) (3 points) Describe the equation of the boundary separating the two regions in terms of the parameters µ0, µ1, and Σ. Recall that GDA results in a linear separator when the two classes have identical covariance matrices. Along with the data points plotted in the part above, plot (on the same figure) the decision boundary fit by GDA.
(d) (6 points) In general, GDA allows each of the target classes to have its own covariance matrix. This (in general) results in a quadratic boundary separating the two class regions. In this case, the maximum-likelihood estimate of the covariance matrix Σ0 can be derived using the equation:

Σ0 = ( Σ_{i=1}^{m} 1{y(i) = 0} (x(i) − µ0)(x(i) − µ0)ᵀ ) / ( Σ_{i=1}^{m} 1{y(i) = 0} )    (1)
And similarly for Σ1. The expressions for the means remain the same as before. Implement GDA for the above problem in this more general setting. Report the values of the parameter estimates, i.e., µ0, µ1, Σ0, Σ1.
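In the general setting, each class's covariance is estimated from that class's examples alone, averaging outer products of the centered examples as in equation (1). A hedged sketch, again with synthetic clusters standing in for the salmon data:

```python
import numpy as np

def gda_class_cov(X, y, k):
    """MLE covariance for class k: mean outer product of centered examples."""
    Xk = X[y == k]
    centered = Xk - Xk.mean(axis=0)
    return centered.T @ centered / len(Xk)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], [1.0, 2.0], size=(2000, 2)),   # class 0: unequal axis variances
               rng.normal([3, 3], 0.5, size=(2000, 2))])         # class 1: tighter cluster
y = np.array([0] * 2000 + [1] * 2000)
S0 = gda_class_cov(X, y, 0)
S1 = gda_class_cov(X, y, 1)
```

Because the two estimated matrices differ, the resulting class-conditional Gaussians have different shapes, which is exactly what produces the quadratic boundary in part (e).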
(e) (5 points) Describe the equation for the quadratic boundary separating the two regions in terms of the parameters µ0, µ1 and Σ0, Σ1. On the graph plotted earlier displaying the data points and the linear separating boundary, also plot the quadratic boundary obtained in the previous step.
(f) (3 points) Carefully analyze the linear as well as the quadratic boundaries obtained. Comment on your observations.