[created by Fernando Valladares Monteiro and Keith Jenkins]
You are given a dataset D_full with 10 features and 10$ samples. We want to use a simple perceptron to test how the generalization bound ε varies in practice as a function of the number of samples N_in, the tolerance level δ, and the VC dimension d_VC. Whenever you vary one of these parameters, keep the others constant.
Hint: Remember that the VC dimension of the perceptron equals the number of features plus one, i.e., d_VC = d + 1.
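The handout does not write out the theoretical bound itself. Given the dependence on N, δ, and d_VC, the intended formula is presumably the standard VC generalization bound with the growth function bounded by m_H(2N) ≤ (2N)^d_VC + 1, i.e., ε = sqrt((8/N)·ln(4·((2N)^d_VC + 1)/δ)). A minimal sketch, assuming that form:

```python
import numpy as np

def vc_bound(n, d_vc, delta):
    """Assumed VC generalization bound: eps = sqrt(8/N * ln(4 * m_H(2N) / delta)),
    with the growth function bounded polynomially by m_H(2N) <= (2N)**d_vc + 1."""
    growth = (2.0 * n) ** d_vc + 1.0
    return np.sqrt(8.0 / n * np.log(4.0 * growth / delta))

print(vc_bound(1000, 4, 0.1))  # ~0.52
```

Note how weakly the bound depends on δ (it enters only inside a logarithm), which is worth keeping in mind for part (a).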
a) To study the dependence on the tolerance level, use N_in = 1000, d_VC = 4, δ = 0.01, 0.02, …, 0.50, and follow these steps:
i) Use the parameters above to compute theoretical generalization-bound values as a function of δ.
ii) Extract a dataset D_out with N_out = 10+ samples and the first d features (note that d ≠ d_VC; by the hint, d = d_VC − 1 = 3) from D_full. You will use this dataset to estimate E_out.
iii) Extract a dataset D_in with N_in samples and the first d features from D_full, train a perceptron on D_in, and compute E_in and E_out.
iv) Repeat step (iii) 1000 times, remembering to save all your results in arrays.
v) For each of the δ values above, determine the maximum value of |E_out − E_in| such that, over the 1000 runs, a proportion δ of the samples lie above that value (i.e., the (1 − δ)-quantile of |E_out − E_in|). Hint: for δ = 0.5, this value is the median. This provides an estimate of the generalization bound as a function of the tolerance level.
vi) Plot the results from (i) and from (v) side by side.
Would you say the theoretical bound is tight, moderate, or loose in this experiment? (Let's define tight as within a factor of 2; moderate as a factor of 2-10; and loose as a factor of more than 10.) Would you say the relationship between the bound and the tolerance level is the same in theory and in the experiment, i.e., do the plots have similar shapes? If not, conjecture why if you can.
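Steps (ii)-(v) can be sketched as follows. The actual D_full is not provided here, so this sketch draws from a hypothetical synthetic distribution (a noiseless linear target with d = 3 features, hence d_VC = 4); train_perceptron is a simple pocket-style PLA stand-in, and the run count is reduced from 1000 to 100 for speed:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                    # features, so d_VC = d + 1 = 4

def add_bias(X):
    return np.hstack([np.ones((len(X), 1)), X])

def train_perceptron(X, y, epochs=5):
    """Pocket-style PLA stand-in: keeps the best weights seen so far."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    best_w, best_err = w.copy(), 1.0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if np.sign(xi @ w) != yi:
                w += yi * xi
        err = np.mean(np.sign(Xb @ w) != y)
        if err < best_err:
            best_err, best_w = err, w.copy()
    return best_w

def error(w, X, y):
    return np.mean(np.sign(add_bias(X) @ w) != y)

# Hypothetical stand-in for D_full: a noiseless linear target.
w_true = rng.normal(size=d + 1)
def sample(n):
    X = rng.normal(size=(n, d))
    return X, np.sign(add_bias(X) @ w_true)

X_out, y_out = sample(10_000)            # stands in for D_out
runs = 100                               # the assignment asks for 1000; reduced for speed
gaps = np.empty(runs)
for r in range(runs):
    X_in, y_in = sample(1000)            # N_in = 1000
    w = train_perceptron(X_in, y_in)
    gaps[r] = abs(error(w, X_out, y_out) - error(w, X_in, y_in))

# Step (v): a proportion delta of runs above the value -> the (1 - delta)-quantile.
deltas = np.arange(0.01, 0.51, 0.01)
emp_bound = np.quantile(gaps, 1 - deltas)
```

Step (vi) then plots emp_bound against the theoretical values on the same δ axis, which makes the tight/moderate/loose comparison direct.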
b) To study the dependence on the VC dimension, you will vary the perceptron's complexity by selecting only the first d features. Use N_in = 1000, d_VC = 2, 3, …, 11, δ = 0.1, and follow these steps:
i) Use the parameters above to compute theoretical generalization-bound values as a function of d_VC.
ii) Extract a dataset D_out^(full) with N_out = 10+ samples and all 10 features from D_full. You will use this dataset to estimate E_out.
iii) Extract a dataset D_in^(full) with N_in = 1000 samples and all 10 features from D_full.
iv) For each d_VC = 2, …, 11: create datasets D_out^(d) and D_in^(d) by extracting the first d (note that d ≠ d_VC) features of D_out^(full) and D_in^(full), respectively; train a perceptron on D_in^(d); and compute E_in and E_out.
v) Repeat steps (iii)-(iv) 100 times, remembering to save all your results in arrays.
vi) For each d_VC, determine the maximum value of |E_out − E_in| such that, over the 100 runs, a proportion δ of the samples lie above that value. This provides an estimate of the generalization bound as a function of the VC dimension.
vii) Plot the results from (i) and from (vi) side by side.
Would you say the theoretical bound is tight, moderate, or loose in this experiment? (Let's define tight as within a factor of 2; moderate as a factor of 2-10; and loose as a factor of more than 10.) Would you say the relationship between the bound and the VC dimension is the same in theory and in the experiment, i.e., do the plots have similar shapes? If not, conjecture why if you can. (Hint: the model that the data was generated from is very simple.)
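Step (i) of part (b), again assuming the standard VC bound with m_H(2N) ≤ (2N)^d_VC + 1 (the handout does not state the formula); the empirical loop of steps (iii)-(vi) is summarized in the trailing comment:

```python
import numpy as np

def vc_bound(n, d_vc, delta):
    # Assumed form: eps = sqrt(8/N * ln(4 * ((2N)**d_vc + 1) / delta)).
    return np.sqrt(8.0 / n * np.log(4.0 * ((2.0 * n) ** d_vc + 1.0) / delta))

d_vcs = np.arange(2, 12)                 # d_VC = 2 .. 11
theory = vc_bound(1000, d_vcs, 0.1)      # grows roughly like sqrt(d_VC) at fixed N

# Empirical skeleton (steps iii-vi): for each of 100 runs, draw D_in^(full)
# (1000 samples, all 10 features); for each d_VC keep the first d = d_VC - 1
# columns, train a perceptron, and record |E_out - E_in|; finally take the
# (1 - 0.1)-quantile over the runs, separately for each d_VC.
```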
c) To study the dependence on the number of samples, use N_train = [10, 30, 100, 300, 1000, 3000, 10000], N_test = N_train/5, d_VC = 4, δ = 0.1, and follow these steps:
i) Use the parameters above to compute theoretical generalization-bound values as a function of N_train.
ii) Use the parameters above to compute theoretical generalization-bound values as a function of N_test. Remember that the generalization bounds for the training and test sets have different formulas.
iii) Extract a dataset D_out with N_out = 10+ samples and the first d (note that d ≠ d_VC) features from D_full. You will use this dataset to estimate E_out.
iv) For each N_train: extract a dataset D_train with N_train samples and d features from D_full, extract a dataset D_test with N_train/5 samples and d features from D_full, train a perceptron on D_train, and compute E_train, E_test, and E_out.
v) Repeat step (iv) 100 times, remembering to save all your results in arrays.
vi) Determine the maximum value of |E_out − E_train| such that, over the 100 runs, a proportion δ of the samples lie above that value. This provides an estimate of the generalization bound as a function of the number of training samples.
vii) Repeat item (vi) for |E_out − E_test|.
viii) Plot the results from (i) and from (vi) side by side.
ix) Plot the results from (ii) and from (vii) side by side.
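For steps (i)-(ii): the training-set curve again assumes the VC bound, while on the test set a single fixed hypothesis is evaluated, so a Hoeffding-style bound ε = sqrt(ln(2/δ)/(2·N_test)) is the usual choice. Both formulas are assumptions here, since the handout does not spell them out:

```python
import numpy as np

n_train = np.array([10, 30, 100, 300, 1000, 3000, 10000])
n_test = n_train // 5
d_vc, delta = 4, 0.1

# Training set: VC bound, since the hypothesis is chosen from the whole class.
eps_train = np.sqrt(8.0 / n_train
                    * np.log(4.0 * ((2.0 * n_train) ** d_vc + 1.0) / delta))

# Test set: Hoeffding bound, since a single fixed hypothesis is evaluated.
eps_test = np.sqrt(np.log(2.0 / delta) / (2.0 * n_test))
```

Note that eps_test sits well below eps_train at every N, which is what the side-by-side plots in (viii) and (ix) should make visible.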