1. [8 points] Generative Adversarial Network (GAN)
(a) What is the cost function for classical GANs? Use Dw(x) as the discriminator and Gθ(x) as the generator.
Your answer:
The original GAN minimizes a divergence/distance between probability distributions via the minimax game

min_G max_D V(Dw, Gθ) = E_{x∼pdata}[log Dw(x)] + E_{z∼pz}[log(1 − Dw(Gθ(z)))],

where pz is the prior over the generator's input noise z, and pg denotes the distribution of the generated samples Gθ(z).
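As a quick numerical sanity check (not part of the original solution), the sketch below evaluates the objective above from per-sample discriminator outputs. With a maximally confused discriminator Dw(x) = 0.5 everywhere, the value is log 0.5 + log 0.5 = −2 log 2, which is the value at the game's equilibrium.

```python
import math

def gan_value(d_on_real, d_on_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs
    on real samples and on generated samples."""
    real_term = sum(math.log(d) for d in d_on_real) / len(d_on_real)
    fake_term = sum(math.log(1.0 - d) for d in d_on_fake) / len(d_on_fake)
    return real_term + fake_term

# D outputs 0.5 on every sample, real or fake.
v = gan_value([0.5] * 4, [0.5] * 4)
print(v)  # ≈ -1.3863 = -2 log 2
```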
(b) Assume arbitrary capacity for both discriminator and generator. In this case we refer to the discriminator using D(x), and denote the distribution on the data domain induced by the generator via pG(x). State an equivalent problem to the one asked for in part (a), by using pG(x) and the ground truth data distribution pdata(x).
Assuming arbitrary capacity, derive the optimal discriminator D∗(x) in terms of pdata(x) and pG(x).
Your answer:
With arbitrary capacity, the problem is equivalent to
min_G max_D V(D, G) = E_{x∼pdata}[log D(x)] + E_{x∼pG}[log(1 − D(x))]
= ∫ [pdata(x) log D(x) + pG(x) log(1 − D(x))] dx.
For a fixed generator, maximize the integrand pointwise. Writing a = pdata(x) and b = pG(x), the function a log D + b log(1 − D) has derivative a/D − b/(1 − D), which vanishes at D = a/(a + b). Hence
D∗(x) = pdata(x) / (pdata(x) + pG(x)).
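The optimal discriminator D∗(x) = pdata(x)/(pdata(x) + pG(x)) can be checked numerically (a small illustrative sketch with example density values, not part of the original solution): for fixed a = pdata(x) and b = pG(x) at a point x, a grid search over D recovers a/(a + b) as the maximizer of the pointwise objective.

```python
import math

def pointwise_objective(a, b, d):
    """Integrand a*log(D) + b*log(1 - D) at a single point x."""
    return a * math.log(d) + b * math.log(1.0 - d)

a, b = 0.7, 0.3  # hypothetical values of pdata(x) and pG(x) at some x
grid = [i / 10000 for i in range(1, 10000)]
d_best = max(grid, key=lambda d: pointwise_objective(a, b, d))
print(d_best, a / (a + b))  # both ≈ 0.7
```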
Assume arbitrary capacity and an optimal discriminator D∗(x). Show that the optimal generator, G∗(x), generates the distribution pG∗(x) = pdata(x), where pdata(x) is the data distribution.
Your answer:
Substituting D∗(x) into V gives C(G) = −log 4 + 2 · JSD(pdata ∥ pG), where JSD is the Jensen–Shannon divergence. Since JSD(pdata ∥ pG) ≥ 0 with equality if and only if pG = pdata, the optimal generator satisfies pG∗(x) = pdata(x).
More recently, researchers have proposed to use the Wasserstein distance instead of divergences to train the models since the KL divergence often fails to give meaningful information for training. Consider three distributions, P1 ∼ U[0,1], P2 ∼ U[0.5,1.5], and P3 ∼ U[1,2]. Calculate DKL(P1,P2), DKL(P1,P3), W1(P1,P2), and W1(P1,P3), where W1 is the Wasserstein-1 distance between distributions.
Your answer:
Since p2(x) = 0 on [0, 0.5) while p1(x) = 1 there, and the supports of P1 and P3 overlap only at the single point x = 1, both KL divergences are infinite:
DKL(P1, P2) = +∞,  DKL(P1, P3) = +∞.
The Wasserstein-1 distances remain finite and reflect how far apart the distributions actually are:
W1(P1, P2) = inf_{γ∈Γ(µ,ν)} ∫_{M×M} d(x, y) dγ(x, y)
= sup_{f∈FL} |E_{x∼P1}[f(x)] − E_{y∼P2}[f(y)]|
= 0.5
W1(P1, P3) = inf_{γ∈Γ(µ,ν)} ∫_{M×M} d(x, y) dγ(x, y)
= sup_{f∈FL} |E_{x∼P1}[f(x)] − E_{y∼P3}[f(y)]|
= 1
Here Γ(µ,ν) denotes the collection of all measures on M × M with marginals µ and ν on the first and second factors, respectively (the set of all couplings of µ and ν), and FL is the natural class of smooth test functions, namely the 1-Lipschitz functions:
FL = {f : f continuous, |f(x) − f(y)| ≤ ‖x − y‖}.
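The values 0.5 and 1 can be verified numerically via the one-dimensional identity W1(P, Q) = ∫₀¹ |F_P⁻¹(q) − F_Q⁻¹(q)| dq (a sketch with hypothetical helper names, assuming uniform distributions so the quantile functions are linear):

```python
def quantile_uniform(lo, hi, q):
    """Inverse CDF of U[lo, hi]."""
    return lo + (hi - lo) * q

def w1_uniform(lo1, hi1, lo2, hi2, n=100000):
    """Midpoint-rule estimate of the quantile-function integral for W1."""
    total = 0.0
    for i in range(n):
        q = (i + 0.5) / n
        total += abs(quantile_uniform(lo1, hi1, q) - quantile_uniform(lo2, hi2, q))
    return total / n

print(w1_uniform(0, 1, 0.5, 1.5))  # ≈ 0.5  (P1 vs P2)
print(w1_uniform(0, 1, 1, 2))      # ≈ 1.0  (P1 vs P3)
```

Because the two uniforms differ only by a shift, the quantile gap is constant, and the estimate matches the exact answer up to floating-point rounding.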