1 EE25737: Introduction to Machine Learning
1.1 Fall 99-00, Group 2
1.2 Problem C3: Linear Classification & Decision Trees
1.2.1 [your name]
1.2.2 [your ID]
In this problem you will implement linear classification algorithms (the perceptron and linear support vector machines) and decision trees. Answer the questions in your report, which should not exceed three pages.
1.3 A. Linear Classification
1.3.1 A1. Load Data
In this section, you are given a data set data_banknote_authentication.csv with two classes (y = 1, y = −1). Each data point has four features obtained from digital image processing of real and fake banknotes, with label y = 1 for real banknotes and y = −1 for fake ones.
Split the data set into two parts: the first 80% of the data for training and the remaining 20% for testing. Import the data with the pandas library. The first four columns are the training features, denoted by x1, x2, x3, x4, and the fifth column is the label, denoted by y.
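The loading and splitting step above can be sketched as follows (a minimal sketch: the assumption that the CSV has no header row, and the exact file path, should be checked against your local copy):

```python
# Sketch of A1: load the banknote data with pandas and split 80% / 20%.
# Assumes the CSV has no header row (an assumption to verify).
import pandas as pd

def load_banknote(path="data_banknote_authentication.csv"):
    df = pd.read_csv(path, header=None)
    X = df.iloc[:, :4].to_numpy(dtype=float)   # features x1..x4
    y = df.iloc[:, 4].to_numpy(dtype=float)    # labels in {+1, -1}
    n_train = int(0.8 * len(df))               # first 80% for training
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])
```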
1.3.2 A2. Perceptron Algorithm
In this part, you should implement the perceptron algorithm from scratch. First, add one dimension with constant 1 to each data point (i.e. x0 = 1). What is the purpose of this? Run the algorithm for 50000 iterations, and every 500 iterations calculate and save the error (the fraction of misclassified samples, err = (1/n) Σᵢ 1[ŷᵢ ≠ yᵢ]) on the test data; in the end, plot the error against the number of iterations. After training ends, report the final error on the test data and the final weights w.
1.3.3 A3. Generalize to non-linear classification
Map the data (train and test) x by ψ(·) to the six-dimensional x′ as follows:
Apply Perceptron to x′ and repeat part A2.
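For illustration only, the mapping step might look like the sketch below. The stand-in ψ here (appending two quadratic terms) is hypothetical — substitute the six-dimensional map actually defined in the handout:

```python
# Illustrative sketch only: psi below is a HYPOTHETICAL stand-in that appends
# two quadratic terms; replace it with the mapping given in the handout.
import numpy as np

def psi(X):
    X = np.asarray(X, dtype=float)
    # (n, 4) -> (n, 6): original features plus x1^2 and x2^2 (stand-in terms)
    return np.hstack([X, X[:, :1] ** 2, X[:, 1:2] ** 2])
```

The perceptron from A2 is then run unchanged on `psi(X_train)` and `psi(X_test)`.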
1.3.4 A4. SVM algorithm
Train the SVM model on the data. In this part of the problem, you should use built-in models from libraries such as Sklearn for training and predicting labels of the data. Do not change the default parameters of the model except max_iter, if needed. Note that you should train the model on the pure form of x (without the added feature x0 = 1). Finally, report the final error (fraction of misclassified samples) on the training and test samples, and the final weights.
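A sketch of this step, assuming `sklearn.svm.LinearSVC` is the intended built-in linear SVM (the handout does not name the exact class):

```python
# Sketch of A4: linear SVM via scikit-learn, defaults untouched except max_iter.
import numpy as np
from sklearn.svm import LinearSVC

def train_svm(X_train, y_train, X_test, y_test, max_iter=10000):
    clf = LinearSVC(max_iter=max_iter)   # only max_iter changed, per the handout
    clf.fit(X_train, y_train)
    train_err = np.mean(clf.predict(X_train) != y_train)
    test_err = np.mean(clf.predict(X_test) != y_test)
    return clf.coef_.ravel(), clf.intercept_, train_err, test_err
```

`coef_` and `intercept_` give the weights to report; note the bias is learned by the model itself, which is why x0 = 1 must not be added here.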
1.3.5 A5. SVM Algorithm on ψ(x)
Train the SVM model on the mapped data x′. Report the final weight vector, the final error on the training data, and the final error on the test data. Again, for training the model with libraries, the added feature should not be included.
1.3.6 A6. Conclusion
Discuss and compare the resulting weights and errors of the above methods in your report.
1.4 B. Decision Trees
The dataset mushrooms.csv describes the overall features of a population of mushrooms. Each data point has 22 features (habitat, size, color, etc.), and the goal is to classify mushrooms as poisonous (y = 0) or edible (y = 1) using Decision Tree classifiers. Use built-in models from Sklearn for training and predicting labels of the data. Do not change the default parameters of the model except max_depth.
1.4.1 B1. Load Data
The first column is the label, and the remaining 22 columns are the features of the data points. Split the data into three sets: the first 70% for training, the next 20% for validation, and the remaining 10% for testing. The validation set is for choosing the best model among all candidates based on the validation error (fraction of misclassified samples). The test set is for estimating the true error of the selected model.
Import data with pandas library.
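A loading sketch for this step. The assumptions here are that the CSV has a header row and that the 22 categorical string features need encoding before Sklearn can use them (one-hot encoding via `pandas.get_dummies` is one reasonable choice):

```python
# Sketch of B1: load mushrooms.csv, one-hot encode features, split 70/20/10.
import pandas as pd

def load_mushrooms(path="mushrooms.csv"):
    df = pd.read_csv(path)
    y = df.iloc[:, 0]                    # first column is the label
    X = pd.get_dummies(df.iloc[:, 1:])   # one-hot encode the 22 categorical features
    n = len(df)
    i1, i2 = int(0.7 * n), int(0.9 * n)  # 70% train, 20% validation, 10% test
    return (X[:i1], y[:i1]), (X[i1:i2], y[i1:i2]), (X[i2:], y[i2:])
```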
1.4.2 B2. Train Decision Tree
Set the maximum depth of the tree to {4,6,8,10,12,14,16,18,20}. For each maximum depth, train a classifier on the training data and report the resulting loss on the validation set. Plot the loss against the maximum depth. What is the best maximum depth? Finally, for the best maximum depth, report the loss on the test set.
[10]: ## Train Decision Tree for each depth here
Plot validation errors over the depth of trees, and explain your observations.
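The depth sweep above can be sketched as follows, using the `DecisionTreeClassifier` and 0-1 validation error the handout names (the loop structure itself is an assumption about the intended implementation):

```python
# Sketch of B2: sweep max_depth, keep the depth with lowest validation error.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sweep_depths(X_tr, y_tr, X_val, y_val,
                 depths=(4, 6, 8, 10, 12, 14, 16, 18, 20)):
    errs = {}
    for d in depths:
        clf = DecisionTreeClassifier(max_depth=d)  # defaults except max_depth
        clf.fit(X_tr, y_tr)
        errs[d] = np.mean(clf.predict(X_val) != np.asarray(y_val))
    best = min(errs, key=errs.get)                 # depth with lowest val. error
    return errs, best
```

The returned dictionary can be plotted directly, e.g. `plt.plot(list(errs), list(errs.values()))`; the selected `best` depth is then retrained (or reused) and evaluated once on the test set.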



