STAT157 – homework5 Solved

1 Homework 5 – Berkeley STAT 157
Your name: XX, SID YY (Please add your name and SID to make grading easier for Ryan and Rachel.)
In this homework, we will model covariate shift and attempt to fix it using logistic regression. This is a fairly realistic scenario for data scientists. To keep things well under control and understandable we will use Fashion-MNIST as the data to experiment on.
Follow the instructions from the Fashion MNIST notebook to get the data.
In [1]: %matplotlib inline
        from mxnet import autograd, gluon, init, nd
        from mxnet.gluon import data as gdata, loss as gloss, nn, utils
        import numpy as np

        mnist_train = gdata.vision.FashionMNIST(train=True)
        mnist_test = gdata.vision.FashionMNIST(train=False)
1.1 1. Logistic Regression
1. Implement the logistic loss function l(y, f) = log(1 + exp(−yf)) in Gluon.
2. Plot its values and its derivative for y = 1 and f ∈ [−5,5], using automatic differentiation in Gluon.
3. Generate training and test datasets for a binary classification problem using Fashion-MNIST with class 1 being a combination of sneaker and pullover and class −1 being the combination of sandal and shirt categories.
4. Train a binary classifier of your choice (it can be linear or a simple MLP such as the one from a previous lecture) using half the data (i.e. 12,000 observations mixed as above), train a second one using the full dataset (i.e. 24,000 observations arising from the 4 categories), and report the accuracy of each.
Hint – you should encapsulate the training and reporting code in a callable function since you’ll need it quite a bit in the following.
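As a reference to check a Gluon implementation against, here is a minimal sketch of the loss and its derivative in plain Python. It assumes the conventional non-negative form log(1 + exp(−yf)); the function names and the grid spacing of 0.5 are illustrative choices, not part of the assignment. The values produced by autograd for y = 1 should agree with `logistic_loss_grad` up to floating-point error.

```python
import math

def logistic_loss(y, f):
    """Logistic loss log(1 + exp(-y*f)) for a label y in {-1, +1} and a score f."""
    z = -y * f
    # Numerically stable form: log(1 + e^z) = max(z, 0) + log1p(e^{-|z|})
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def logistic_loss_grad(y, f):
    """Derivative of the loss with respect to f, i.e. -y / (1 + exp(y*f))."""
    z = y * f
    if z >= 0:
        e = math.exp(-z)  # avoids overflow for large positive z
        return -y * e / (1.0 + e)
    return -y / (1.0 + math.exp(z))

# Evaluate both on a grid f in [-5, 5] for y = 1 (ready to be plotted)
grid = [-5.0 + 0.5 * i for i in range(21)]
losses = [logistic_loss(1, f) for f in grid]
grads = [logistic_loss_grad(1, f) for f in grid]
```

For y = 1 the loss decreases monotonically in f, and the derivative rises from about −1 at f = −5 towards 0 at f = 5.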
1.2 2. Covariate Shift
Your goal is to introduce covariate shift in the data and observe its effect on accuracy. For this, compose a dataset of 12,000 observations, given by a mixture of sneaker and pullover and of sandal and shirt respectively, where within each pair you draw a fraction λ ∈ {0.05, 0.1, 0.2, . . . , 0.8, 0.9, 0.95} from one category and a fraction 1 − λ from the other. For instance, for λ = 0.1 you might pick a total of 600 sneaker and 5,400 pullover images and likewise 600 sandal and 5,400 shirt photos, yielding a total of 12,000 images for training. Note that the test set remains unbiased, composed of 2,000 photos each for the sneaker + pullover category and the sandal + shirt category.
1. Generate training sets that are appropriately biased. You should have 11 datasets.
2. Train a binary classifier using this and report the test set accuracy on the unbiased test set.
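The composition of the biased training sets can be sketched as follows. This is one possible way to do it, not a prescribed one: the helper names, the `per_side` parameter and the fixed seed are assumptions made for illustration. The category ids (2 = pullover, 5 = sandal, 6 = shirt, 7 = sneaker) follow the standard Fashion-MNIST label encoding, and λ is taken as the fraction of sneaker and sandal images, matching the λ = 0.1 example above.

```python
import random

LAMBDAS = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
CLASS_ID = {"pullover": 2, "sandal": 5, "shirt": 6, "sneaker": 7}
BINARY_LABEL = {"sneaker": 1, "pullover": 1, "sandal": -1, "shirt": -1}

def biased_counts(lam, per_side=6000):
    """Images to draw per category so that each binary class has per_side images."""
    minor = round(lam * per_side)   # sneaker and sandal share
    major = per_side - minor        # pullover and shirt share
    return {"sneaker": minor, "pullover": major, "sandal": minor, "shirt": major}

def sample_biased_indices(labels, lam, per_side=6000, seed=0):
    """Return shuffled (dataset index, binary label) pairs for a given lambda.

    labels is the sequence of original Fashion-MNIST category ids (0..9).
    """
    rng = random.Random(seed)
    chosen = []
    for name, n in biased_counts(lam, per_side).items():
        pool = [i for i, lab in enumerate(labels) if lab == CLASS_ID[name]]
        chosen += [(i, BINARY_LABEL[name]) for i in rng.sample(pool, n)]
    rng.shuffle(chosen)
    return chosen

# One composition per lambda, 11 in total
compositions = {lam: biased_counts(lam) for lam in LAMBDAS}
```

Calling `sample_biased_indices` once per entry of `LAMBDAS` on the training labels gives the index sets from which the 11 biased datasets can be built.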
1.3 3. Covariate Shift Correction
Having observed that covariate shift can be harmful, let’s try fixing it. For this we first need to compute the appropriate propensity scores dp(x)/dq(x). For this purpose pick a biased dataset, let’s say with λ = 0.1, and try to fix the covariate shift.
1. When training a logistic regression binary classifier to fix covariate shift, we assumed so far that both sets are of equal size. Show that re-weighting data in training and test set appropriately can help address the issue when both datasets have different size. What is the weighting?
2. Train a binary classifier (using logistic regression) distinguishing between the biased training set and the unbiased test set. Note – you need to weigh the data.
3. Use the scores to compute weights on the training set. Do they match the weight arising from the biasing distribution λ?
4. Train a binary classifier of the covariate shifted problem using the weights obtained previously and report the accuracy. Note – you will need to modify the training loop slightly such that you can compute the gradient of a weighted sum of losses.
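The pieces needed for the correction can be sketched in plain Python, under several assumptions that are not stated in the assignment: p denotes the unbiased test distribution and q the biased training distribution; the domain classifier outputs a logit f(x) with label +1 for test points and −1 for training points; and within each category the images are distributed identically in both sets. If that classifier is trained without sample weights on n_train and n_test points, its odds estimate n_test·p(x) / (n_train·q(x)), so the factor n_train / n_test recovers p(x)/q(x). If it is trained with weights that balance the two sets, that factor is 1. All function names here are hypothetical.

```python
import math

def importance_weights(logits, n_train, n_test, balanced=False, clip=None):
    """Estimate p(x)/q(x) for training points from domain-classifier logits."""
    # Size correction is only needed if the classifier saw unbalanced sets
    ratio = 1.0 if balanced else n_train / n_test
    # Cap the exponent so math.exp cannot overflow on extreme logits
    weights = [ratio * math.exp(min(f, 700.0)) for f in logits]
    if clip is not None:
        weights = [min(w, clip) for w in weights]  # optional cap to limit variance
    return weights

def weighted_mean_loss(weights, losses):
    """Weighted objective: mean of w_i * l_i over the training batch."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

def expected_weights(lam):
    """Weights implied by the biasing fraction, assuming a 50/50 test mix.

    Returns (weight for the category sampled with fraction lam,
             weight for the category sampled with fraction 1 - lam).
    """
    return 0.5 / lam, 0.5 / (1.0 - lam)

minor_w, major_w = expected_weights(0.1)
```

With λ = 0.1, `expected_weights` gives 5 for the under-represented categories and roughly 0.56 for the over-represented ones, which is the scale the estimated weights can be compared against. In a Gluon training loop, the weighted objective corresponds to multiplying the per-example loss by the weight before averaging and calling backward.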