Description
Assignment 2
Instructions on programming assignments
You are asked to modify the code in hw2_submission.py between the marker comments:
############################################################
############################################################
# BEGIN YOUR CODE
# HINTS OR INSTRUCTIONS
pass
# END YOUR CODE
############################################################
############################################################
• Please use Python 3 for all assignments in this course.
• You may write your own helper functions outside the skeleton functions.
• Do not change any file other than hw2_submission.py. You may only use the numpy package to compute and construct your functions; do not use scikit-learn, PyTorch, TensorFlow, etc. to implement them.
• You can test your code with the test functions in test.py.
• For reference, you can compare your own classifiers with the sklearn ones used in test.py.
• Do not change the seed in test.py.
• Your accuracy must be within 5% of the reference to receive full score.
• Only submit hw2_submission.py.
Problem 1: Logistic Regression
In this problem, you will implement the class LogisticRegression and test it on the ‘breast cancer wisconsin’ dataset. The dataset has 30 features describing each cell nucleus, such as radius, texture, and concave points. The goal is to classify whether a cell (data point) is malignant (y = 1) or benign (y = 0).
1. Implement function LogisticRegression.fit
2. Implement function LogisticRegression.sigmoid
3. Implement function LogisticRegression.loss
4. Implement function LogisticRegression.predict
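The four methods fit together roughly as sketched below. This is a minimal numpy-only illustration, not the required implementation: the constructor hyperparameters (lr, n_iters) and the use of batch gradient descent are assumptions, and the actual skeleton's signatures may differ.

```python
import numpy as np

class LogisticRegression:
    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr          # learning rate (illustrative value)
        self.n_iters = n_iters
        self.w = None
        self.b = 0.0

    def sigmoid(self, z):
        # Maps any real value into (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def loss(self, y, y_hat):
        # Binary cross-entropy; clip so log() never sees 0.
        eps = 1e-12
        y_hat = np.clip(y_hat, eps, 1 - eps)
        return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    def fit(self, X, y):
        # Batch gradient descent on the cross-entropy loss.
        n, d = X.shape
        self.w = np.zeros(d)
        for _ in range(self.n_iters):
            y_hat = self.sigmoid(X @ self.w + self.b)
            grad_w = X.T @ (y_hat - y) / n
            grad_b = np.mean(y_hat - y)
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b

    def predict(self, X):
        # Threshold the predicted probability at 0.5.
        return (self.sigmoid(X @ self.w + self.b) >= 0.5).astype(int)
```

With a small linearly separable set, e.g. `X = [[0],[1],[2],[3]]` and `y = [0,0,1,1]`, `fit` followed by `predict(X)` recovers the labels.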
Problem 2: Gaussian Naive Bayes Classifier
The calculation of Bayes' theorem can be simplified by making some assumptions, such as that each input variable is independent of all the others. Although this is a dramatic and unrealistic assumption, it makes the calculation of the conditional probabilities tractable and results in an effective classification model referred to as Naive Bayes.
In this problem, you will implement a Gaussian Naive Bayes classifier and test it on two datasets: 1) breast cancer and 2) digits. The digits dataset consists of 1797 samples across 10 classes; each sample has 64 features, originally an 8×8 image. Your task for both datasets is to predict the correct class for the test samples. (Note: when you apply Bayes' rule, the denominator cannot be 0. In other words, the value inside a log cannot be less than or equal to 0.)
1. Implement function NaiveBayes.mean
2. Implement function NaiveBayes.std
3. Implement function NaiveBayes.fit
4. Implement function NaiveBayes.gen_by_class
5. Implement function NaiveBayes.calc_gaussian_dist
6. Implement function NaiveBayes.predict
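One way these six methods could relate to each other is sketched below, using only numpy. The epsilon added to the standard deviation is an illustrative assumption to keep the Gaussian density finite for zero-variance features (and the value inside the log positive, per the note above); the real skeleton may handle this differently.

```python
import numpy as np

class NaiveBayes:
    def mean(self, X):
        # Per-feature mean of the rows in X.
        return np.mean(X, axis=0)

    def std(self, X):
        # Per-feature standard deviation of the rows in X.
        return np.std(X, axis=0)

    def gen_by_class(self, X, y):
        # Group the training rows by their class label.
        return {c: X[y == c] for c in np.unique(y)}

    def fit(self, X, y):
        by_class = self.gen_by_class(X, y)
        self.classes = sorted(by_class)
        self.means = {c: self.mean(Xc) for c, Xc in by_class.items()}
        # Small epsilon avoids division by zero for constant features.
        self.stds = {c: self.std(Xc) + 1e-9 for c, Xc in by_class.items()}
        self.priors = {c: len(Xc) / len(X) for c, Xc in by_class.items()}

    def calc_gaussian_dist(self, x, mean, std):
        # Log of the Gaussian pdf per feature; summing logs instead of
        # multiplying probabilities avoids numerical underflow.
        return -0.5 * np.log(2 * np.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

    def predict(self, X):
        preds = []
        for x in X:
            # log posterior (up to a constant) = log prior + sum of log likelihoods
            scores = [np.log(self.priors[c])
                      + self.calc_gaussian_dist(x, self.means[c], self.stds[c]).sum()
                      for c in self.classes]
            preds.append(self.classes[int(np.argmax(scores))])
        return np.array(preds)
```

Working in log space is also why the denominator of Bayes' rule can be dropped entirely: it is the same for every class, so it does not change the argmax.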
Problem 3: Spam Naive Bayes Classifier
Similar to the previous problem, you will implement a Naive Bayes classifier to classify a collection of emails as spam or not. You can download the dataset from the link “Spam Dataset” on BrightSpace. Place the ham and spam folders in the spam dataset directory of your homework folder. This is real email data containing spam emails and ham (non-spam) emails from the Enron Corporation after the company collapsed.
(Hint: you will count the number of occurrences of each word to compute the likelihood of each word given a spam/ham label.)
1. Implement function SpamNaiveBayes.get_word_counts
2. Implement function SpamNaiveBayes.fit
3. Implement function SpamNaiveBayes.predict
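The hint above can be sketched as follows. This is a hedged illustration, not the required solution: the tokenizer (lowercase whitespace split), the Laplace (+1) smoothing constant, and the assumption that labels are 1 for spam and 0 for ham are all choices made for the sketch.

```python
import numpy as np

class SpamNaiveBayes:
    def get_word_counts(self, text):
        # Count occurrences of each whitespace-separated, lowercased token.
        counts = {}
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
        return counts

    def fit(self, emails, labels):
        # labels assumed: 1 = spam, 0 = ham.
        self.word_counts = {0: {}, 1: {}}
        self.class_totals = {0: 0, 1: 0}
        self.vocab = set()
        for text, label in zip(emails, labels):
            for word, c in self.get_word_counts(text).items():
                self.word_counts[label][word] = self.word_counts[label].get(word, 0) + c
                self.class_totals[label] += c
                self.vocab.add(word)
        n = len(labels)
        self.priors = {label: sum(1 for l in labels if l == label) / n
                       for label in (0, 1)}

    def predict(self, emails):
        V = len(self.vocab)
        preds = []
        for text in emails:
            scores = {}
            for label in (0, 1):
                s = np.log(self.priors[label])
                for word, c in self.get_word_counts(text).items():
                    # Laplace (+1) smoothing keeps unseen words from
                    # zeroing out the likelihood.
                    p = (self.word_counts[label].get(word, 0) + 1) \
                        / (self.class_totals[label] + V)
                    s += c * np.log(p)
                scores[label] = s
            preds.append(max(scores, key=scores.get))
        return preds
```

For example, after fitting on four toy emails such as "win money now" / "free money win" (spam) and "meeting schedule today" / "project meeting notes" (ham), the classifier labels "free money" as spam and "project schedule" as ham.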



