MMD: a way to measure the difference between two probability distributions using an RKHS (reproducing kernel Hilbert space).
GAN: Generative models also need to match two distributions.
MMD plays a role similar to an f-divergence.
Now take a new view and think of a GAN as
$\min_G \max_k \mathrm{MMD}_k(\text{dataset}, G)$
The discriminator's job is to compute a "good" kernel $k$, one that best exposes the difference between the two distributions.
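The MMD itself is easy to estimate from samples. A minimal sketch in plain NumPy, assuming a fixed RBF kernel bandwidth (the function names are mine):

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between sample sets X and Y
    under the RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    def rbf(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * bandwidth**2))

    Kxx, Kyy, Kxy = rbf(X, X), rbf(Y, Y), rbf(X, Y)
    n, m = len(X), len(Y)
    # Drop the diagonal terms to get the unbiased estimator.
    sum_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    sum_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return sum_xx + sum_yy - 2 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (500, 2)))
diff = mmd2_unbiased(rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2)))
# Samples from the same distribution give MMD^2 near zero;
# shifted samples give a clearly larger value.
```

A single fixed bandwidth is the weak point, which is exactly why the adversarial view above maximises over kernels.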
Here is another interesting ICLR submission:
importance sampling MMD: https://openreview.net/pdf?id=SyuWNMZ0W
Motivation: Although GANs have been highly impactful, their learning objective can lead to mode collapse, where
the generator simply memorizes a few training examples to fool the discriminator. This pathology is
reminiscent of maximum likelihood density estimation with Gaussian mixtures: by collapsing the
variance of each component we achieve infinite likelihood and memorize the dataset, which is not
useful for a generalizable density estimate.
Method: consider the GAN as a Bayesian method and sample the parameters from a posterior.
Sampling instead of variational inference: they use sampling to explore the full posterior over the weights, whereas Tran et al. perform a variational approximation centred on one of the modes of the posterior (and, due to the properties of the KL divergence, it is prone to an overly compact representation of even that mode).
We can instead marginalize $z$ out of the posterior updates using simple Monte Carlo.
The traditional GAN corresponds to maximum likelihood (a point estimate rather than a full posterior).
Two experimental settings: semi-supervised and supervised learning.
Sampling is done by Stochastic Gradient Hamiltonian Monte Carlo (SGHMC). (HMC is great!)
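The SGHMC update itself is short. A toy sketch, assuming the simplified Chen et al. (2014) update with identity mass matrix and the gradient-noise estimate set to zero, sampling a 1-D standard normal from a noisy gradient (not the paper's actual GAN training setup; all names are mine):

```python
import numpy as np

def sghmc_sample(grad_U, theta0, n_steps=20000, eta=1e-2, alpha=0.1, rng=None):
    """Toy SGHMC sampler: theta += v; v += -eta*grad_U - alpha*v + noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    theta, v = float(theta0), 0.0
    samples = []
    for _ in range(n_steps):
        theta += v
        # Friction (alpha * v) plus injected noise compensates for the
        # extra randomness of the stochastic gradient.
        v += -eta * grad_U(theta) - alpha * v + rng.normal(0.0, np.sqrt(2 * alpha * eta))
        samples.append(theta)
    return np.array(samples)

rng = np.random.default_rng(1)
# Noisy gradient of U(theta) = theta^2 / 2, so the target is N(0, 1).
noisy_grad = lambda th: th + rng.normal(0.0, 0.1)
draws = sghmc_sample(noisy_grad, theta0=3.0, rng=rng)[5000:]  # drop burn-in
# draws.mean() should be near 0 and draws.var() near 1.
```

The appeal for Bayesian GANs is that the same noisy minibatch gradients used by SGD double as the stochastic gradients here.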
A big problem for Bayesian networks is that prediction requires evaluating the model several times to average over Monte Carlo samples, which is computationally costly. [They use 14h to do …]
One thing I can't understand: is the generator output the MC average of several generated pictures? That would explain why the pictures look blurry. (In a nonconvex setting, it may not be a good idea to average the sampled parameters.)
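The parenthetical concern can be made concrete: for a nonlinear generator, the output of the averaged parameters differs from the average of the outputs, and the Bayesian predictive average is over outputs, not weights. A two-line toy (g = sin is an arbitrary stand-in for a generator):

```python
import numpy as np

# For a nonlinear "generator" g(theta), g(mean of thetas) != mean of g(thetas).
g = lambda theta: np.sin(theta)

thetas = np.array([0.0, np.pi])   # two sampled parameter values
out_of_mean = g(thetas.mean())    # g(pi/2) = 1.0
mean_of_out = g(thetas).mean()    # (sin 0 + sin pi) / 2 = 0.0
```

Averaging the generated pictures (the outputs) is the correct Monte Carlo prediction, but it also directly produces the blur.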
Adversarial Variational Bayes
Main idea: a GAN-style discriminator can estimate the KL divergence between two distributions, which is a term in the variational objective!
AC (adaptive contrast) trick: build a bridge between the posterior and the prior. The posterior may be far from the prior, which makes the discriminator's job very hard, so they use a Gaussian to approximate the posterior as an intermediate distribution.
An interesting paper!
Open Set Domain Adaptation
Introduces outliers (unknown classes) into domain adaptation.
This paper proposes an approach that iterates between solving the labelling problem of target samples, i.e.,
associating a subset of the target samples to the known categories of the source domain, and computing a mapping from the source to the target domain by minimising the distances of the assignments. The transformed source samples are then used in the next iteration to re-estimate the assignments and update the transformation.
Objective function: similar to the energy in k-means.
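The iterate-between-assignment-and-mapping idea can be sketched as a k-means-style loop. This is a deliberately simplified linear stand-in (class centers instead of full sample sets, a least-squares map instead of the paper's formulation; all names and the threshold rule are mine):

```python
import numpy as np

def openset_da(src_centers, tgt, threshold, n_iters=10):
    """Toy open-set DA loop: assign target points to the nearest transformed
    source class center, mark points farther than `threshold` from every
    center as unknown (-1), then refit the linear map W by least squares."""
    d = src_centers.shape[1]
    W = np.eye(d)
    labels = None
    for _ in range(n_iters):
        moved = src_centers @ W.T
        dists = np.linalg.norm(tgt[:, None, :] - moved[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        labels[dists.min(axis=1) > threshold] = -1   # open-set "unknown" class
        # Refit W on assigned pairs only; unknowns do not pull the map.
        A = src_centers[labels[labels >= 0]]
        B = tgt[labels >= 0]
        W, *_ = np.linalg.lstsq(A, B, rcond=None)
        W = W.T
    return W, labels

src = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[2.0, 0.1], [1.9, 0.0], [0.1, 2.0], [0.0, 2.1], [10.0, 10.0]])
W, labels = openset_da(src, tgt, threshold=3.0)
# The far-away point is labelled -1 (unknown); W roughly scales by 2.
```

Excluding the unknowns from the fitting step is the whole point: outliers must not distort the source-to-target transformation.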
An interesting paper that poses a good question! However, the solution is naive.