ML – Labs Solved – Assignment Hub

Description

5/5 – (1 vote)

Machine Learning Course
EPFL
Martin Jaggi & Ru¨diger Urbanke mlo.epfl.ch/page-136795.html epfmlcourse@gmail.com

(Efficient Python/NumPy Programming)
Introduction
For computational efficiency of typical operations in machine learning applications, it is very beneficial to use NumPy arrays together with vectorized commands, instead of explicit for loops. The vectorized commands are better optimized, and bring the performance of Python code (and similarly e.g. for Matlab) closer to lower level languages like C. In this exercise, you are asked to write efficient implementations for three small problems that are typical for the field of machine learning.
Getting Started
Follow the Python setup tutorial provided on our github repository here:
github.com/epfml/ML course/tree/master/labs/ex01/python setup tutorial.md
After you are set up, we recommend downloading the folder of ex01, and start by filling in the template notebooks for each of the 3 tasks below.
To get more familiar with vector and matrix operations using NumPy arrays, it is also recommended to go through the npprimer.ipynb notebook.
Useful Commands
We give a short overview over some commands that prove useful for writing vectorized code. You can read the full documentation and examples by issuing help(func).
At the beginning: import numpy as np
• a * b, a / b: element-wise multiplication and division of matrices (arrays) a and b
• a.dot(b): matrix-multiplication of two matrices a and b
• a.max(0): find the maximum element for each column of matrix a (note that NumPy uses zero-based indices, while Matlab uses one-based)
• a.max(1): find the maximum element for each row of matrix a
• np.mean(a), np.std(a): compute the mean and standard deviation of all entries of a
• a.shape: return the array dimensions of a
• a.shape[k]: return the size of array a along dimension k
• np.sum(a, axis=k): sum the elements of matrix a along dimension k
• linalg.inv(a): returns the inverse of a square matrix a
A broader tutorial can be found here: http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf
For users who were more familiar with Matlab, a nice comparison of the analogous functions can be found here:
https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html
Figure 1: Two sets of points in the plane. The circles are a subset of the dots and have been perturbed randomly.
Task A: Matrix Standardization
The different dimensions or features of a data sample often show different variances. For some subsequent operations, it is a beneficial preprocessing step to standardize the data, i.e. subtract the mean and divide by the standard deviation for each dimension. After this processing, each dimension has zero mean and unit variance. Note that this is not equivalent to data whitening, which additionally de-correlates the dimensions (by means of a coordinate rotation).
Write a function that accepts data matrix x ∈ Rn×d as input and outputs the same data after normalization. n is the number of samples, and d the number of dimensions, i.e. rows contain samples and columns features.
Task B: Pairwise Distances in the Plane
One application of machine learning to computer vision is interest point tracking. The location of corners in an image is tracked along subsequent frames of a video signal (see Figure 1 for a synthetic example). In this context, one is often interested in the pairwise distance of all points in the first frame to all points in the second frame. Matching points according to minimal distance is a simple heuristic that works well if many interest points are found in both frames and perturbations are small.
Write a function that accepts two matrices P ∈ Rp×2,Q ∈ Rq×2 as input, where each row contains the (x,y) coordinates of an interest point. Note that the number of points (p and q) do not have to be equal. As output, compute the pairwise distances of all points in P to all points in Q and collect them in matrix D. Element Di,j is the Euclidean distance of the i-th point in P to the j-th point in Q.
Task C: Likelihood of a Data Sample
A subtask of many machine learning algorithms is to compute the likelihood p(x|θ) of a sample x for a given density model with parameters θ. Given k models, we now want to assign xi to the model for which the likelihood is maximal: ai = argmaxm p(xi|θm), where m = 1,…,k. θm are the parameters of the m-th density model.
We ask you to implement the assignment step for the two model case, i.e. k = 2. As input, your function receives a data matrix x ∈ Rn×d (where xi is the i-th row of x) and the parameters θ1 = (µ1,Σ1) and θ2 = (µ2,Σ2) of two multivariate Gaussian distributions:
.
2

Reviews

There are no reviews yet.

Be the first to review “ML – Labs Solved”

ML – Labs Solved

Description

Reviews

Related products

ML – Exercises for the course Solved

ML – Exercises for the course Solved

ML – Exercises for the course Solved

ML – Exercises for the course Solved

ML – Homework 5: Conditional Probability Models Solved