CS439 – Solved
$24.99

Description


Exam Optimization for Machine Learning – CS-439
Prof. Martin Jaggi

First part, multiple choice
There is exactly one correct answer per question.
(i) find x0 such that …
(ii) calculate the difference …
(iii) output …
(iv) repeat (ii)–(iii) for higher accuracy.  (1)
Question 1
Question 2
0
4.1288
4.1269
4.1212
4.1231
Question 3 … significant digits in the above example (y = 17, x = 4).
Question 4 The Newton-Raphson method to find zeros of f can be interpreted as a second-order optimization method. Of course, one could also use the gradient method instead. What would the iterates of this scheme look like (for a carefully chosen stepsize γ)?
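For intuition (a sketch, not part of the exam answer): Newton-Raphson for a zero of f iterates xt+1 = xt − f(xt)/f′(xt); one natural first-order counterpart, reading f as the derivative of the objective being minimized, is xt+1 = xt − γ·f(xt). Below, both are run on the illustrative choice f(x) = x² − 17, whose positive zero √17 ≈ 4.1231 appears among the options above; the function, starting point, and step size are assumptions made only for this sketch.

```python
def newton_raphson_sqrt(y, x0, iters=5):
    """Newton-Raphson on f(x) = x^2 - y:  x_{t+1} = x_t - f(x_t) / f'(x_t)."""
    x = x0
    for _ in range(iters):
        x = x - (x**2 - y) / (2 * x)
    return x

def gradient_style_sqrt(y, x0, gamma=0.1, iters=200):
    """First-order analogue: treat f as the derivative of the objective being
    minimized and iterate x_{t+1} = x_t - gamma * f(x_t)."""
    x = x0
    for _ in range(iters):
        x = x - gamma * (x**2 - y)
    return x

print(newton_raphson_sqrt(17.0, 4.0))   # ~4.123106, i.e. sqrt(17)
print(gradient_style_sqrt(17.0, 4.0))   # also ~4.1231 for a small enough step size
```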
Question 5
objective function g : R → R is given by
tion (1) from the previous section?
Question 6 where
A ∈ Rn×n
A positive semidefinite
A positive semidefinite, and b is non-negative
−A positive semidefinite

Coordinate Descent
Question 7 Consider the least squares objective function
f(x) := ‖Ax − b‖²,  (2)
for A an m × n matrix, A = [a1, …, an] with columns ai, and b ∈ Rm.
What is the gradient ∇f(x)?
A
AᵀAx
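A quick numerical sanity check (a sketch, assuming equation (2) is the plain least-squares objective f(x) = ‖Ax − b‖², for which the gradient is 2Aᵀ(Ax − b); the random data is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda v: np.sum((A @ v - b) ** 2)        # f(x) = ||Ax - b||^2
grad = 2 * A.T @ (A @ x - b)                  # analytic gradient at x

# compare against central finite differences
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad, fd, atol=1e-5))       # True
```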
Question 8
Θ(n + m)
Θ(n²m²)
Θ(n)
Θ(mn)
Θ(m²n)
Θ(mn²)
Θ(m)
Question 9
Equation (2), given x:
Θ(n)
Θ(m²n)
Θ(m)
Θ(n²m²)
Θ(n + m)
Θ(mn)
Θ(mn²)
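The kind of cost accounting Questions 8 and 9 rely on, sketched under the same assumption about equation (2): forming the full gradient 2Aᵀ(Ax − b) costs Θ(mn), whereas a single coordinate (∇f(x))i = 2aiᵀ(Ax − b) costs only Θ(m) once the residual r = Ax − b is cached, and one coordinate-descent step can keep that residual up to date in Θ(m).

```python
import numpy as np

def full_gradient(A, x, b):
    # Theta(mn): one product with A and one with A^T
    return 2 * A.T @ (A @ x - b)

def coordinate_gradient(A, r, i):
    # Theta(m): touches only column a_i and the cached residual r = Ax - b
    return 2 * A[:, i] @ r

def coordinate_descent_step(A, x, r, i, gamma):
    """One coordinate-descent step that also updates the cached residual in Theta(m)."""
    g_i = coordinate_gradient(A, r, i)
    x[i] -= gamma * g_i
    r -= gamma * g_i * A[:, i]      # keep r = Ax - b consistent without recomputing it
    return x, r
```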
Question 10
Frank-Wolfe
Consider the linear minimization oracle (LMO) for matrix completion, that is, for
min over Z ∈ X of Σ(i,j) (Zij − Yij)²,
where X is a set of matrices in Rn×m.
Question 11 What is the LMO for this problem (derive it if needed)? How does its cost compare to computing the projection onto X?
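For concreteness, a sketch under the assumption that the feasible set X is a nuclear-norm ball {Z : ‖Z‖∗ ≤ τ}, the setting usually paired with Frank-Wolfe for matrix completion (the exam's exact X is not reproduced above): the LMO needs only the top singular pair of the gradient, a rank-1 computation, while the projection onto the same set needs a full SVD plus a projection of the spectrum.

```python
import numpy as np

def lmo_nuclear_ball(G, tau):
    """LMO over {Z : ||Z||_* <= tau}: argmin_Z <Z, G> = -tau * u1 v1^T,
    with (u1, v1) the top singular pair of G (a rank-1 computation)."""
    U, _, Vt = np.linalg.svd(G)
    return -tau * np.outer(U[:, 0], Vt[0, :])

def project_spectrum(s, tau):
    """Euclidean projection of a nonnegative, decreasingly sorted vector s
    onto {v >= 0, sum(v) <= tau}."""
    if s.sum() <= tau:
        return s
    css = np.cumsum(s)
    rho = np.nonzero(s * np.arange(1, len(s) + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.maximum(s - theta, 0.0)

def project_nuclear_ball(Z, tau):
    """Projection onto the same set: needs the full SVD of Z, hence much more expensive."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # s comes out sorted decreasingly
    return (U * project_spectrum(s, tau)) @ Vt
```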
Question 12
Question 13
Random search
Question 14
Empirical comparison of different methods
Question 15
None
Question 16
None
Question 17
Newton’s optimization method
None
Gradient Descent (with correct stepsize)

Question 18 Which optimization method corresponds to the error-curve for Algorithm 4?
Newton’s optimization method
None
Accelerated Gradient Method (with correct parameters)
Second part, true/false questions
Question 19 (Convexity) The epigraph of a function f : Rd → R is defined as epi(f) := {(x, α) ∈ Rd × R : α ≥ f(x)}.
Question 20 Any norm is convex.
Question 21
We define C1 + C2 := {x1 + x2 : x1 ∈ C1, x2 ∈ C2}.
Question 22
Question 23
constants for each gradient coordinate).
typically faster than uniform CD
Question 24
Third part, open questions
Answer in the space provided! Your answer must be justified with all steps. Do not cross any checkboxes, they are reserved for correction.

Question 25: 5 points. What is the subgradient of g? How many calls to the gradient oracle are needed to compute ∂g(x) and g(x)?
Hint: Show that for two convex functions g1(x) and g2(x), any subgradient in ∂gi(x) is also a subgradient of max(g1(x), g2(x)) at x, where i is an index with gi(x) = max(g1(x), g2(x)).
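A small sketch of the hint's mechanism (illustrative only; the exam's specific g is not reproduced in this extract): for a pointwise maximum of convex functions, a (sub)gradient of any component that attains the maximum at x is a valid subgradient of the maximum at x, so a single gradient-oracle call to an active component suffices for an element of ∂g(x), while g(x) itself needs the component values.

```python
import numpy as np

def value_and_subgradient_of_max(x, funcs, grads):
    """Return (g(x), one element of the subdifferential of g := max_i g_i at x),
    using the rule: a gradient of any active component is a subgradient of the max."""
    values = [f(x) for f in funcs]
    i = int(np.argmax(values))              # an active component
    return values[i], grads[i](x)           # one gradient-oracle call is enough

# toy usage on a hypothetical g(x) = max(x^2, 2x + 1) in one dimension
funcs = [lambda x: x**2, lambda x: 2 * x + 1]
grads = [lambda x: 2 * x, lambda x: 2.0]
print(value_and_subgradient_of_max(3.0, funcs, grads))   # (9.0, 6.0): x^2 is active at x = 3
```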

Question 27: 4 points. Using the result from the previous question, show that O(n/ε²) calls to the projection oracle are sufficient to distinguish between case (N) and case (E) for our problem.
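The cases (N) and (E) are not reproduced in this extract, but an O(1/ε²) count of projection-oracle calls typically comes from the projected subgradient method, which makes exactly one projection per iteration and needs on the order of (RG/ε)² iterations under bounded subgradients. A generic sketch of that method, under those standard assumptions (not the exam's specific argument):

```python
import numpy as np

def projected_subgradient(subgrad, project, x0, R, G, eps):
    """Projected subgradient method: one projection-oracle call per iteration.
    With ||x0 - x*|| <= R and subgradient norms bounded by G, roughly (R*G/eps)^2
    iterations make the averaged iterate eps-accurate (standard analysis)."""
    T = int(np.ceil((R * G / eps) ** 2))
    gamma = R / (G * np.sqrt(T))            # fixed step size from the standard analysis
    x, avg = x0.copy(), np.zeros_like(x0)
    for _ in range(T):
        x = project(x - gamma * subgrad(x))
        avg += x / T
    return avg

# toy usage (hypothetical problem): minimize ||x - c||_1 over the unit Euclidean ball
c = np.array([2.0, -1.0, 0.5])
subgrad = lambda x: np.sign(x - c)                      # a subgradient of ||x - c||_1
project = lambda x: x / max(1.0, np.linalg.norm(x))     # projection onto the unit ball
print(projected_subgradient(subgrad, project, np.zeros(3), R=1.0, G=np.sqrt(3), eps=0.05))
```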

Question 28: 6 points.


The convergence of the Frank-Wolfe algorithm was analyzed in class only for smooth functions. In this question we examine whether smoothness is necessary. Consider the following non-smooth function

Newton’s second-order optimization method
As studied in class, the update step of Newton’s optimization method for an objective function g : Rn → R is given by xt+1 := xt − (∇²g(xt))⁻¹ ∇g(xt).
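A minimal sketch of that update in code (the quadratic test objective below is an assumption made only for illustration; solving the linear system is preferred to forming the inverse explicitly):

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton step: x - (Hessian)^{-1} gradient, computed via a linear solve."""
    return x - np.linalg.solve(hess(x), grad(x))

# toy usage on a strictly convex quadratic g(x) = 0.5 x^T Q x - c^T x
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
grad = lambda x: Q @ x - c
hess = lambda x: Q
x1 = newton_step(np.zeros(2), grad, hess)
print(np.allclose(Q @ x1, c))   # True: one Newton step minimizes a quadratic exactly
```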

Coordinate Descent
Question 31: 2 points. Given a matrix A, we define λmin(AᵀA) and λmax(AᵀA) to be the smallest and largest eigenvalues of AᵀA.
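These two eigenvalues typically enter as the smoothness and strong-convexity constants of the least-squares objective (a sketch, assuming f(x) = ‖Ax − b‖² as in Question 33, so that ∇²f(x) ≡ 2AᵀA, giving L = 2λmax(AᵀA) and μ = 2λmin(AᵀA); strong convexity only holds when λmin(AᵀA) > 0):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))

H = 2 * A.T @ A                          # Hessian of f(x) = ||Ax - b||^2 (constant in x)
eigs = np.linalg.eigvalsh(A.T @ A)
L, mu = 2 * eigs.max(), 2 * eigs.min()   # smoothness / strong-convexity constants

# the Hessian spectrum lies in [mu, L], which is exactly what the two constants express
h = np.linalg.eigvalsh(H)
print(mu <= h.min() + 1e-10 and h.max() <= L + 1e-10)   # True
```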

Question 33: 2 points. For f(x) := ‖Ax − b‖², we now perform one step of coordinate descent. That is,
for a given point xt ∈ Rn we do a step of the form
xt+1 := xt − γt(∇f(xt))i · ei
where ei ∈ Rn denotes a standard unit vector. For i fixed, compute the best γt.
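A small numerical check of the standard answer (a sketch: along coordinate i the map γ ↦ f(xt − γ(∇f(xt))i ei) is a quadratic whose curvature involves ‖ai‖², giving the exact line-search step γt = 1/(2‖ai‖²) for this objective; the random data below is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, i = 6, 4, 2
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda v: np.sum((A @ v - b) ** 2)
g_i = 2 * A[:, i] @ (A @ x - b)                    # i-th coordinate of the gradient
gamma_star = 1.0 / (2 * np.sum(A[:, i] ** 2))      # candidate best step: 1 / (2 ||a_i||^2)

# gamma_star should beat every nearby step size along coordinate i
step_value = lambda gamma: f(x - gamma * g_i * np.eye(n)[i])
print(all(step_value(gamma_star) <= step_value(g)
          for g in np.linspace(0.0, 2 * gamma_star, 21)))   # True
```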

Question 35: 6 points. Combining the two valid equations of smoothness and strong convexity (as also stated in Questions 12 and 13), prove in detailed steps that, if …, SGD in this setting converges as ….
Question 36: 4 points. Recall the possible choices of learning rate (γt) in the situation of the previous question. What is the resulting rate of convergence? Which estimator do we eventually consider?
Comment on the assumption …. Is it a restriction? Which choice of step size could be used …
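The exact constants of Questions 35–36 are not reproduced in this extract, but the setting they refer to is usually SGD with a decaying step size such as γt ∝ 1/(μt) and a (possibly averaged) iterate as the final estimator. A minimal sketch of that recipe on a least-squares finite sum, under those assumed choices:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 200, 5
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]        # minimizer of the full objective

# f(x) = (1/m) * sum_j (a_j^T x - b_j)^2, sampled one term at a time
mu = 2 * np.linalg.eigvalsh(A.T @ A / m).min()       # strong-convexity constant of f
x, x_avg, T = np.zeros(n), np.zeros(n), 20000
for t in range(1, T + 1):
    j = rng.integers(m)
    g = 2 * A[j] * (A[j] @ x - b[j])                 # stochastic gradient of one summand
    x = x - (1.0 / (mu * t)) * g                     # gamma_t = 1/(mu*t), one common choice
    x_avg += x / T                                   # averaged iterate as the final estimator

print(np.linalg.norm(x_avg - x_star))                # small: the estimator approaches x_star
```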
