Homework 5: Unsupervised learning
Getting Started: Please install the sklearn package for Python 3.6. Unzipping this folder will create the directory structure shown below. You will submit your code under the Submission/Code directory.
HW05
|-- HW05.pdf
|-- Data
|-- Submission
    |-- Code
    |-- Figures
In this assignment, you will implement several unsupervised learning algorithms and you will use them to compress images.
Deliverables: This assignment has two types of deliverables: a report and code files.
• Report: The solution report will give your answers to the homework questions (listed below). Try to keep the report to at most 5 pages in 11-point font, including all figures and tables. You can use any software to create your report, but it must be submitted in PDF format. Please ensure that the answers in the report follow the same order as the questions in this document.
Submitting Solutions: When you complete the assignment, you will upload your report and your code using the Gradescope.com service. Place your final code in Submission/Code. If you used Python to generate report figures, place them in Submission/Figures. Finally, create a zip file of your submission directory, Submission.zip (NO rar, tar or other formats). Upload this single zip file on Gradescope as your solution to the ’HW05-Unsupervised-Learning’ assignment. Gradescope will run checks to determine if your submission contains the required files in the correct locations. Finally, upload your pdf report to the ’HW05-Unsupervised-Learning-PDF’ assignment. When you upload your report please make sure to select the correct pages for each question respectively. Failure to select the correct pages will result in point deductions. The submission time for your assignment is considered to be the later of the submission timestamps of your code and report submissions.
Task:
Unsupervised learning: In contrast to supervised learning (classification, regression), unsupervised learning algorithms attempt to learn some structure in the data from unlabeled samples. Several families of methods fall within the scope of unsupervised learning: principal component analysis (PCA), k-means, independent component analysis (ICA), and density estimation, among others. In this project you will use k-means to compress images.
Algorithm and data:
k-means: a clustering algorithm that finds centroids and assigns each sample to exactly one of them according to some criterion (typically the nearest centroid in Euclidean distance). You will use k-means to compress the following image.
Figure 1: Image to compress using k-means
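As an illustration of how k-means can compress an image, the sketch below clusters the pixel colours and replaces every pixel by the centroid of its cluster, so the result uses at most k distinct colours. It is a minimal NumPy version of Lloyd's algorithm, not a reference solution. The function names `kmeans` and `compress_image`, the fixed iteration count, and the random 32x32 array standing in for the assignment image are all assumptions made for this example.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Index of the closest centroid (squared Euclidean distance) per point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means on an (n, d) float array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct samples chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = nearest_centroid(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    return centroids, nearest_centroid(points, centroids)


def compress_image(image, k):
    """Replace each pixel's colour by the centroid of its cluster."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    centroids, labels = kmeans(pixels, k)
    return centroids[labels].reshape(h, w, c).round().astype(np.uint8)


# Hypothetical stand-in for the assignment image: random RGB values.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
compressed = compress_image(image, k=8)
```

The distance computation builds an (n, k, d) array, which is fine for a small example but may need to be processed in chunks for a full-size image and large k.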
Questions:
1. (100 points) K-means:
(10) a. K-means is a simple unsupervised learning algorithm that splits the data into clusters. There are different ways to determine the “optimal” number of clusters; the elbow rule being a very simple one. Explain it in at most 4 sentences.
(15) b. Another issue with k-means is that the random initialization of the centroids can sometimes lead to “poor” clusters. A possible solution to this problem is presented in the algorithm called k-means++. Briefly explain the idea behind this algorithm.
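For reference, the seeding step usually associated with k-means++ can be sketched as follows: the first centroid is drawn uniformly at random, and each further centroid is drawn with probability proportional to the squared distance from a sample to its nearest already-chosen centroid. The function name `kmeans_pp_init` and the toy data are assumptions for this example, and the sketch assumes the data contain at least k distinct points.

```python
import numpy as np


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids from an (n, d) array with k-means++ style seeding."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # First centroid: one sample chosen uniformly at random.
    first = rng.integers(n)
    centroids = [points[first]]
    # Squared distance from every point to its nearest chosen centroid so far.
    d2 = ((points - points[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Far-away points are more likely to be picked; chosen points have d2 == 0.
        idx = rng.choice(n, p=d2 / d2.sum())
        centroids.append(points[idx])
        d2 = np.minimum(d2, ((points - points[idx]) ** 2).sum(axis=1))
    return np.array(centroids)


# Toy data (an assumption for this example): three separated 2-D blobs.
rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                   for c in ([0, 0], [5, 5], [0, 5])])
initial_centroids = kmeans_pp_init(blobs, k=3)
```

These centroids would then replace the purely random initialization before running the usual k-means iterations.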
k   | Compression rate
2   |
5   |
10  |
25  |
50  |
100 |
200 |
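This handout does not define the compression rate, so the sketch below assumes one common convention: the original image stores 24 bits per pixel (8 bits per RGB channel), while the compressed image stores one cluster index of ceil(log2 k) bits per pixel plus a palette of k colours. The function name `compression_ratio`, the storage model, and the 512x512 image size are all assumptions; use the definition given in your course if it differs.

```python
import math


def compression_ratio(n_pixels, k, bits_per_channel=8, channels=3):
    """Original size divided by compressed size, under the assumed storage model."""
    bits_per_colour = bits_per_channel * channels
    original_bits = n_pixels * bits_per_colour
    # Each pixel stores the index of its cluster; at least one bit is needed.
    index_bits = max(1, math.ceil(math.log2(k)))
    # Compressed size: per-pixel indices plus the palette of k centroid colours.
    compressed_bits = n_pixels * index_bits + k * bits_per_colour
    return original_bits / compressed_bits


# Hypothetical 512x512 image; the actual image size is not stated here.
n_pixels = 512 * 512
ratios = {k: compression_ratio(n_pixels, k) for k in (2, 5, 10, 25, 50, 100, 200)}
```

Under this model the ratio drops as k grows, because more bits are needed per pixel index and the palette gets larger.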
Figure 2: Example of reconstructed image using 15 clusters



