What is the Wake-Sleep Algorithm 🌱

Helmholtz Machine


We have two networks:

  1. Recognition network with weights \(\phi\) maps input data \(\mathbf{x}\) to latent representations \(\mathbf{z}\) in its hidden states.
  2. Generative network with weights \(\theta\) reconstructs the data from the latent states.

The Helmholtz machine tries to learn \(\phi\) and \(\theta\) such that \(q_{\phi}(\mathbf{z} \mid \mathbf{x}) \approx p_{\theta}(\mathbf{z} \mid \mathbf{x}) \propto p_{\theta}(\mathbf{z},\mathbf{x})\).

  • \(q_{\phi}(\mathbf{z} \mid \mathbf{x})\) is the variational distribution approximating the posterior \(p_{\theta}(\mathbf{z}\mid\mathbf{x})\)
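To make the two-network setup concrete, here is a minimal sketch in numpy. The architecture (a single linear layer per network) and the toy dimensions are assumptions for illustration, not from the note; any recognition network outputting \(\mu_{\phi}, \Sigma_{\phi}\) and any generator outputting Bernoulli means would fit the same template.

```python
import numpy as np

rng = np.random.default_rng(42)
D, K = 5, 2  # observed dim, latent dim (toy sizes, assumed)

# Recognition network (weights phi): x -> parameters of q_phi(z | x),
# here the mean and log-variance of a diagonal Gaussian.
W_mu, W_logvar = rng.normal(size=(K, D)), rng.normal(size=(K, D)) * 0.1
def recognize(x):
    return W_mu @ x, W_logvar @ x

# Generative network (weights theta): z -> Bernoulli means f_theta(z)
# for p_theta(x | z), via a sigmoid so probabilities stay in (0, 1).
W_gen = rng.normal(size=(D, K)) * 0.1
def generate(z):
    return 1.0 / (1.0 + np.exp(-(W_gen @ z)))

# Round trip: encode a binary datapoint, sample a latent, decode it.
x = rng.integers(0, 2, size=D).astype(float)
mu, logvar = recognize(x)
z = mu + np.exp(0.5 * logvar) * rng.normal(size=K)  # sample from q_phi(z | x)
x_probs = generate(z)                               # Bernoulli means for x
```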

Wake Phase

  1. Feed \(\mathbf{x}^{(i)}\) into the recognition network to get \(\mu_{\phi}\left(\mathbf{x}^{(i)}\right)\) and \(\Sigma_{\phi}\left(\mathbf{x}^{(i)}\right)\)
  2. Draw \(L\) samples \(\mathbf{z}_{1}^{(i)}, \ldots, \mathbf{z}_{L}^{(i)} \sim q_{\phi}\left(\mathbf{z} \mid \mathbf{x}^{(i)}\right)=N\left(\mathbf{z} ; \mu_{\phi}\left(\mathbf{x}^{(i)}\right), \Sigma_{\phi}\left(\mathbf{x}^{(i)}\right)\right)\)
  3. For each \(l \in[L],\) feed \(\mathbf{z}_{l}^{(i)}\) into the generative network to get \(f_{\theta}\left(\mathbf{z}_{l}^{(i)}\right)\) for the likelihood \(p_{\theta}\left(\mathbf{x} \mid \mathbf{z}_{l}^{(i)}\right)=\text{Bernoulli}\left(\mathbf{x} ; f_{\theta}\left(\mathbf{z}_{l}^{(i)}\right)\right)\)
  4. Optimize for \(\max _{\theta} \sum_{i=1}^{N} \frac{1}{L} \sum_{l=1}^{L} \log p_{\theta}\left(\mathbf{x}^{(i)} \mid \mathbf{z}_{l}^{(i)}\right)\)
  • Simulate latent states by feeding input data through the recognition network, then maximize how well the generator's Bernoulli probabilities for those latents fit the actual data.
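The wake-phase steps above can be sketched for a single datapoint \(\mathbf{x}^{(i)}\). The linear recognition and generative layers are illustrative assumptions; the Monte Carlo objective in step 4 is what the note prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, L = 5, 2, 10  # data dim, latent dim, samples per datapoint (assumed)
x = rng.integers(0, 2, size=D).astype(float)  # one binary datapoint x^(i)

# Step 1: recognition network phi gives mu_phi(x) and (log-)variance.
W_mu, W_logvar = rng.normal(size=(K, D)), rng.normal(size=(K, D)) * 0.1
mu, logvar = W_mu @ x, W_logvar @ x

# Step 2: draw L samples z_l ~ N(mu, diag(exp(logvar))).
z = mu + np.exp(0.5 * logvar) * rng.normal(size=(L, K))

# Step 3: generative network theta gives Bernoulli means f_theta(z_l).
W_gen = rng.normal(size=(D, K)) * 0.1
probs = 1.0 / (1.0 + np.exp(-(z @ W_gen.T)))  # shape (L, D)

# Step 4: Monte Carlo average of log p_theta(x | z_l); in training this
# is ascended with respect to theta (e.g. by gradient ascent).
log_lik = (x * np.log(probs) + (1 - x) * np.log(1 - probs)).sum(axis=1)
objective = log_lik.mean()
```

Note the latents are sampled from \(q_{\phi}\) but only \(\theta\) is updated in this phase.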

Sleep Phase

  1. Draw \(\mathbf{z}^{l} \sim N(0, I)\)
  2. Sample \(\mathbf{x}^{l}\) from the generative network \(p_{\theta}\left(\mathbf{x} \mid \mathbf{z}^{l}\right)=\text{Bernoulli}\left(\mathbf{x} ; f_{\theta}\left(\mathbf{z}^{l}\right)\right)\)
  3. Feed \(\mathbf{x}^{l}\) into the recognition network to get \(\mu_{\phi}\left(\mathbf{x}^{l}\right)\) and \(\Sigma_{\phi}\left(\mathbf{x}^{l}\right)\)
  4. Compute \(q_{\phi}\left(\mathbf{z}^{l} \mid \mathbf{x}^{l}\right)=N\left(\mathbf{z}^{l} ; \mu_{\phi}\left(\mathbf{x}^{l}\right), \Sigma_{\phi}\left(\mathbf{x}^{l}\right)\right)\)
  5. Optimize \(\max _{\phi} \frac{1}{L} \sum_{l=1}^{L} \log q_{\phi}\left(\mathbf{z}^{l} \mid \mathbf{x}^{l}\right)\)
  • Simulate random \(\mathbf{x}\) data by following the generator. Then maximize the probability that the recognition network suggests the correct latent states given the simulated \(\mathbf{x}\).
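A matching sketch of the sleep phase, under the same assumed linear-layer networks and toy dimensions as before; step 5's log-density uses the standard diagonal-Gaussian formula.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, L = 5, 2, 10  # data dim, latent dim, number of dreams (assumed)

# Step 1: dream latents z^l ~ N(0, I).
z = rng.normal(size=(L, K))

# Step 2: sample binary "dream" data x^l from the Bernoulli generator.
W_gen = rng.normal(size=(D, K)) * 0.1
probs = 1.0 / (1.0 + np.exp(-(z @ W_gen.T)))
x = (rng.random(size=(L, D)) < probs).astype(float)

# Steps 3-4: recognition network maps each x^l to mu and log-variance.
W_mu, W_logvar = rng.normal(size=(K, D)), rng.normal(size=(K, D)) * 0.1
mu, logvar = x @ W_mu.T, x @ W_logvar.T

# Step 5: log q_phi(z^l | x^l) under a diagonal Gaussian, averaged over
# the L dreams; in training this is ascended with respect to phi.
log_q = -0.5 * (logvar + (z - mu) ** 2 / np.exp(logvar)
                + np.log(2 * np.pi)).sum(axis=1)
objective = log_q.mean()
```

Here the data are dreamed from \(p_{\theta}\) but only \(\phi\) is updated, mirroring how the wake phase holds \(\phi\) fixed while updating \(\theta\).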
