Notes on Score Diffusion Models (updating)

This is a summary of the unified framework for diffusion models based on stochastic differential equations (SDEs), along with related topics.

Background

The work on score-based diffusion models proposed a unified framework that connects the score-based model NCSN and DDPM through the lens of stochastic differential equations (SDEs). The forward process (adding noise) and the backward process (denoising sampling) are interpreted as an SDE and its reverse SDE, respectively.

Data Perturbation with Itô SDE

The diffusion process $\{\x_t\}_{t=0}^T$ from the original input $\x_0$ to Gaussian noise $\x_T$, with continuous time variable $t\in\sbr{0,T}$, can be modeled by the Itô SDE

\[\begin{equation} d\x = \f(\x, t) dt + \bs{G}(\x, t)d\w_t \end{equation}\]

$\w_t$ is the standard Wiener process. Since $\w_t\sim \N(0, t\I)$, its increments satisfy $\w_{t+\Delta t}-\w_t\sim \N(0, \Delta t\I)$, so over a small step $d\w_t \approx \sqrt{\Delta t}\bs{\epsilon}$ with $\bs{\epsilon}\sim \N(0,\I)$.
$\f(\cdot,t): \mathbb{R}^d \rightarrow \mathbb{R}^d$ is the drift coefficient, where $d$ is the dimension of $\x$. In the diffusion models discussed here it is linear in $\x$, i.e. $\f(\x, t) = f(t)\x$ with $f(\cdot):\mathbb{R} \rightarrow \mathbb{R}$.
$\bs{G}(\cdot,t): \mathbb{R}^d \rightarrow \mathbb{R}^{d\times d}$ is the diffusion coefficient at time $t$.

We consider the case $\bs{G}(\x,t) = g(t)\I$ which is independent of $\x$. Then, Eq. (1) can be rewritten as

\[\begin{equation} d\x = f(t)\x dt + g(t)d\w_t \label{eq:forward-sde} \end{equation}\]
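To make the forward SDE with scalar coefficients concrete, here is a minimal Euler-Maruyama sketch in plain Python for a single coordinate of $\x$. It replaces $d\w_t$ with $\sqrt{\Delta t}\,\epsilon$, as in the discretization noted above. The function names, the constant `BETA`, and the particular choice $f(t) = -\beta/2$, $g(t) = \sqrt{\beta}$ are assumptions made for illustration only.

```python
import math
import random

BETA = 1.0  # hypothetical constant noise rate, chosen only for this sketch


def f(t):
    """Scalar drift coefficient f(t); assumed constant here."""
    return -0.5 * BETA


def g(t):
    """Scalar diffusion coefficient g(t); assumed constant here."""
    return math.sqrt(BETA)


def simulate_forward(x0, T=1.0, n_steps=200, rng=None):
    """Euler-Maruyama for dx = f(t) x dt + g(t) dw, one scalar coordinate."""
    rng = rng or random.Random(0)
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        eps = rng.gauss(0.0, 1.0)  # eps ~ N(0, 1)
        x = x + f(t) * x * dt + g(t) * math.sqrt(dt) * eps
    return x


def sample_moments(x0, n_samples=4000, seed=0):
    """Empirical mean and variance of x_T over independent trajectories."""
    rng = random.Random(seed)
    xs = [simulate_forward(x0, rng=rng) for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    var = sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
    return mean, var
```

Calling `sample_moments(x0=2.0)` gives a Monte Carlo estimate of the mean and variance of $\x_T$ started from a fixed $\x_0$; the estimates carry both sampling noise and a small discretization bias.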

The perturbation kernel of this SDE has the form

\[\begin{equation} p(\x_t \mid \x_0) = \N(\x_t \mid s_t\x_0, s_t^2\sigma_t^2\I) \end{equation}\]

where

\[\begin{equation} s_t = \exp\nbr{\int_0^t f(\xi)d\xi} \quad \text{and}\quad \sigma_t^2 = \int_0^t\nbr{\frac{g(\xi)}{s(\xi)}}^2d\xi \end{equation}\]
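The two integrals above can be evaluated numerically for any given $f$ and $g$. The sketch below does this with a simple trapezoidal rule; the constant `BETA`, the coefficient choice $f(t) = -\beta/2$, $g(t) = \sqrt{\beta}$, and all function names are hypothetical and serve only as an example.

```python
import math

BETA = 2.0  # hypothetical constant, chosen only for this sketch


def f(t):
    """Assumed drift coefficient f(t) = -BETA / 2."""
    return -0.5 * BETA


def g(t):
    """Assumed diffusion coefficient g(t) = sqrt(BETA)."""
    return math.sqrt(BETA)


def trapezoid(fn, a, b, n=200):
    """Composite trapezoidal rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (fn(a) + fn(b))
    for k in range(1, n):
        total += fn(a + k * h)
    return total * h


def s(t):
    """s_t = exp( int_0^t f(xi) d xi )."""
    return math.exp(trapezoid(f, 0.0, t))


def sigma(t):
    """sigma_t = sqrt( int_0^t (g(xi) / s(xi))^2 d xi )."""
    return math.sqrt(trapezoid(lambda xi: (g(xi) / s(xi)) ** 2, 0.0, t))


def kernel_moments(x0, t):
    """Mean and variance of p(x_t | x_0) = N(s_t x0, s_t^2 sigma_t^2)."""
    return s(t) * x0, (s(t) * sigma(t)) ** 2
```

For these assumed coefficients the integrals also have closed forms, $s_t = e^{-\beta t/2}$ and $\sigma_t^2 = e^{\beta t} - 1$, which makes the numerical values easy to cross-check.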

Here, I’ll use $s_t$ and $s(t)$, and $\sigma_t$ and $\sigma(t)$ interchangeably for simplicity. The corresponding marginal distribution is obtained by

\[\begin{align} p(\x_t) &= \int p(\x_t \mid \x_0)p_{data}(\x_0)d\x_0\\ &= s_t^{-d}\int p_{data}(\x_0)\N(\x_ts_t^{-1} \mid \x_0, \sigma_t^2\I)d\x_0\\ &= s_t^{-d}\int p_{data}(\x_0)\N(\x_ts_t^{-1}-\x_0 \mid 0, \sigma_t^2\I)d\x_0\\ &= s_t^{-d} \sbr{p_{data}(\x_0) * \N\nbr{0, \sigma_t^2\I}}(\x_ts_t^{-1})\\ &= s_t^{-d} p(\x_ts_t^{-1};\sigma_t) \label{eq:marginal} \end{align}\]
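As a sanity check on this result, consider a hypothetical data distribution that is itself an isotropic Gaussian, $p_{data}(\x_0) = \N(\x_0 \mid 0, c^2\I)$ with $c > 0$ (an assumption made only for this example). Convolving two Gaussians adds their variances, so

\[\begin{aligned} p(\x;\sigma_t) &= \N\nbr{\x \mid 0, (c^2+\sigma_t^2)\I},\\ p(\x_t) &= s_t^{-d}\,\N\nbr{\x_ts_t^{-1} \mid 0, (c^2+\sigma_t^2)\I} = \N\nbr{\x_t \mid 0, s_t^2(c^2+\sigma_t^2)\I}. \end{aligned}\]

The factor $s_t^{-d}$ is the Jacobian that converts a density in $\x_ts_t^{-1}$ into a density in $\x_t$. The result agrees with sampling $\x_t = s_t\x_0 + s_t\sigma_t\bs{\epsilon}$ directly from the perturbation kernel.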

The Fokker-Planck equation describes how the marginal distribution $p_t(\x)$ (used interchangeably with $p(\x_t)$) evolves over time under the combined effect of the drift and the random (noise) forces. It can be written as

\[\begin{align} \frac{\partial p_t(\x)}{\partial t} &= -\nabla_{\x}\sbr{\f(\x,t) p_t(\x)} + \frac{1}{2}\nabla_{\x}\nabla_{\x}\sbr{\bs{G}(\x,t)\bs{G}^T(\x,t)p_t(\x)}\\ &= -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\sbr{f_i(\x,t)p_t(\x)} + \frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d}\frac{\partial^2}{\partial x_i \partial x_j}\sbr{\sum_{k=1}^{d} G_{ik}(\x, t)G_{jk}(\x, t) p_t(\x)}\\ &= -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\sbr{f_i(\x,t)p_t(\x) -\frac{1}{2}\sbr{\nabla_{\x}\sbr{\bs{G}(\x,t)\bs{G}^T(\x,t)} + \bs{G}(\x,t)\bs{G}^T(\x,t)\nabla_{\x}\log p_t(\x)}_i p_t(\x)}\\ &= -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\sbr{\tilde{f}_i(\x,t)p_t(\x)} \label{eq:fokker-planck} \end{align}\]

where

\[\tilde{f}_i(\x,t) = f_i(\x,t) - \frac{1}{2}\sbr{\nabla_{\x}\sbr{\bs{G}(\x,t)\bs{G}^T(\x,t)} + \bs{G}(\x,t)\bs{G}^T(\x,t)\nabla_{\x}\log p_t(\x)}_i,\]

and $\sbr{\cdot}_i$ denotes the $i$-th component of the vector inside the brackets.

If we consider the case $\bs{G}(\x,t) = g(t)\I$ which is independent of $\x$, then Eq. $\ref{eq:fokker-planck}$ can be rewritten as

\[\begin{align} \frac{\partial p_t(\x)}{\partial t} &= -\nabla_{\x}\sbr{\f(\x,t)p_t(\x)} + \frac{1}{2}g^2(t)\nabla_{\x}\nabla_{\x}\sbr{p_t(\x)}\\ &= -\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\sbr{\sbr{f_i(\x,t)-\frac{1}{2}g^2(t)\frac{\partial}{\partial x_i}\log p_t(\x)}p_t(\x)} \label{eq:fokker-planck-special} \end{align}\]
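Both forms of this special case can be checked numerically in one dimension. The sketch below assumes $f = 0$, $g(t) = \sqrt{2t}$ and Gaussian data $\N(0, c^2)$, for which $s_t = 1$, $\sigma_t^2 = \int_0^t g^2(\xi)d\xi = t^2$ and hence $p_t(x) = \N(x \mid 0, c^2 + t^2)$. These choices, the constant `C`, and the function names are illustrative assumptions. The residuals are evaluated with central finite differences and should be close to zero.

```python
import math

C = 0.5  # hypothetical data standard deviation


def g(t):
    """Assumed diffusion coefficient g(t) = sqrt(2 t)."""
    return math.sqrt(2.0 * t)


def p(x, t):
    """Marginal density N(x | 0, C^2 + t^2) for the assumed setup."""
    var = C * C + t * t
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def score(x, t):
    """d/dx log p_t(x) for the Gaussian marginal above."""
    return -x / (C * C + t * t)


def fp_residual(x, t, h=1e-3):
    """dp/dt - (1/2) g(t)^2 d^2p/dx^2 (first form, with f = 0)."""
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2.0 * h)
    d2p_dx2 = (p(x + h, t) - 2.0 * p(x, t) + p(x - h, t)) / (h * h)
    return dp_dt - 0.5 * g(t) ** 2 * d2p_dx2


def fp_residual_drift_form(x, t, h=1e-3):
    """dp/dt + d/dx[(f - (1/2) g^2 score) p] (second form, with f = 0)."""
    def flux(y):
        return -0.5 * g(t) ** 2 * score(y, t) * p(y, t)

    dp_dt = (p(x, t + h) - p(x, t - h)) / (2.0 * h)
    dflux_dx = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return dp_dt + dflux_dx
```

The second residual is the one that matters for what follows: it shows the same density evolution written as a pure transport equation with a modified drift and no second-order term.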

Probability Flow ODE

According to the Fokker-Planck equation, there exists an ODE whose solutions have the same marginal distribution $p_t(\x)$ as the SDE at every time $t$. Eq. $\ref{eq:fokker-planck-special}$ has the form of a Fokker-Planck equation with a modified drift and no diffusion term, so the corresponding SDE reduces to the ODE

\[\begin{align} d\x &= \sbr{f(\x, t) - \frac{1}{2}g^2(t)\nabla_{\x}\log p_t(\x)}dt + 0d\w_t \\ &= \sbr{f(t)\x - \frac{1}{2}g^2(t)\nabla_{\x}\log p_t(\x)}dt \label{eq:prob-flow-ode} \end{align}\]

This ODE is called the probability flow ODE (PF ODE).

If we express the ODE in terms of $s_t$ and $\sigma_t$ instead of $f(t)$ and $g(t)$, we have

\[\begin{equation} f(t) = \dot{s}_ts_t^{-1} \quad \text{and} \quad g(t) = s_t\sqrt{2\dot{\sigma}_t\sigma_t}. \end{equation}\]
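Both identities follow from differentiating the definitions of $s_t$ and $\sigma_t^2$ with respect to $t$. A brief sketch:

\[\begin{aligned} \dot{s}_t &= f(t)\exp\nbr{\int_0^t f(\xi)d\xi} = f(t)s_t &&\Rightarrow\quad f(t) = \dot{s}_ts_t^{-1},\\ \frac{d}{dt}\sigma_t^2 &= 2\dot{\sigma}_t\sigma_t = \nbr{\frac{g(t)}{s_t}}^2 &&\Rightarrow\quad g(t) = s_t\sqrt{2\dot{\sigma}_t\sigma_t}. \end{aligned}\]

For instance, with the choice $s_t = 1$ and $\sigma_t = t$ (assumed here purely as an illustration), this gives $f(t) = 0$ and $g(t) = \sqrt{2t}$.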

The full derivation can be found in the EDM paper (Eq. 28 and 34). Then, Eq. $\ref{eq:prob-flow-ode}$ can be rewritten as

\[\begin{align} d\x &= \sbr{\dot{s}_ts_t^{-1}\x - s_t^2\dot{\sigma}_t\sigma_t\nabla_{\x}\log p_t(\x)}dt\\ &= \sbr{\dot{s}_ts_t^{-1}\x - s_t^2\dot{\sigma}_t\sigma_t\nabla_{\x}\log p(\x s_t^{-1};\sigma_t)}dt\\ &= \sbr{ -\dot{\sigma}_t\sigma_t\nabla_{\x}\log p(\x;\sigma_t)}dt \quad \text{when} \quad s_t = 1 \end{align}\]

where the marginal is $p(\x_t) = s_t^{-d} p(\x_t s_t^{-1};\sigma_t)$ and

\[p(\x_ts_t^{-1}; \sigma_t) = \sbr{p_{data}(\x_0) * \N\nbr{0, \sigma_t^2\I}}(\x_ts_t^{-1})\]

as shown in Eq. $\ref{eq:marginal}$.
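The $s_t = 1$ form of the PF ODE can be integrated with plain Euler steps once a score function is available. The sketch below assumes one-dimensional Gaussian data $\N(0, c^2)$ and $\sigma_t = t$, so that $p(x;\sigma_t) = \N(x \mid 0, c^2 + t^2)$ and the score is known in closed form. The constant `C`, the schedule, and the function names are hypothetical choices for illustration. In a real model the score would come from a trained network.

```python
import math

C = 0.5  # hypothetical data standard deviation


def score(x, sigma):
    """Score of p(x; sigma) = N(0, C^2 + sigma^2), i.e. d/dx log p."""
    return -x / (C * C + sigma * sigma)


def pf_ode_solve(x, t_start, t_end, n_steps=1000):
    """Euler steps for dx/dt = -sigma'(t) sigma(t) score(x, sigma(t)).

    Uses the assumed schedule sigma(t) = t. Passing t_end < t_start
    integrates backward in time (the step dt is then negative).
    """
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        sigma, sigma_dot = t, 1.0
        x = x + (-sigma_dot * sigma * score(x, sigma)) * dt
        t += dt
    return x
```

For this setup the ODE has the exact solution $x(t) = x(0)\sqrt{(c^2+t^2)/c^2}$, so `pf_ode_solve(x0, 0.0, T)` can be compared against it, and `pf_ode_solve(x_T, T, 0.0)` should approximately recover `x0`. The map is deterministic: each data point is carried to a unique noise value and back.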

Connection Between PF ODE and SDE

According to Eq. (102) of the referenced paper, a family of SDEs can be derived, one for each choice of $g(t)$. The SDE is given by

\[\begin{align} d\x &= \nbr{\frac{1}{2}g^2(t) - \dot{\sigma}_t\sigma_t}\nabla_{\x}\log p(\x;\sigma_t)dt + g(t)d\w_t\\ &= \hat{\f}(\x, t)dt + g(t)d\w_t \label{eq:general-sde} \end{align}\]

The PF ODE is the special case of this SDE with $g(t) = 0$. The derivation is given in the EDM paper (Appendix B.5); it is somewhat long but not difficult to follow. Furthermore, the authors parameterize $g(t) = \sqrt{2\beta(t)}\sigma_t$, so that $\frac{1}{2}g^2(t) = \beta(t)\sigma_t^2$ and Eq. $\ref{eq:general-sde}$ becomes

\[\begin{equation} d\x = - \dot{\sigma}_t\sigma_t\nabla_{\x}\log p(\x;\sigma_t)dt + \beta(t)\sigma_t^2\nabla_{\x}\log p(\x;\sigma_t)dt + \sqrt{2\beta(t)}\sigma_td\w_t \label{eq:general-sde-beta} \end{equation}\]

where $\beta(t)$ is a free function.
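A quick way to see that $\beta(t)$ does not change the marginals is to simulate this SDE for different values of $\beta$ and compare the sample variance at the final time. The sketch below assumes one-dimensional Gaussian data $\N(0, c^2)$, $\sigma_t = t$, and a constant $\beta$, so the target marginal is $\N(0, c^2 + T^2)$. The constant `C`, the schedule, and the function names are illustrative assumptions.

```python
import math
import random

C = 0.5  # hypothetical data standard deviation


def score(x, sigma):
    """Score of p(x; sigma) = N(0, C^2 + sigma^2)."""
    return -x / (C * C + sigma * sigma)


def forward_sde_variance(beta, T=2.0, n_steps=200, n_samples=4000, seed=0):
    """Euler-Maruyama for
        dx = (beta sigma^2 - sigma' sigma) score dt + sqrt(2 beta) sigma dw
    with the assumed schedule sigma(t) = t; returns the sample variance at T.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    xs = [rng.gauss(0.0, C) for _ in range(n_samples)]  # x_0 ~ p_data
    for k in range(n_steps):
        t = k * dt
        sigma, sigma_dot = t, 1.0
        drift_coef = beta * sigma ** 2 - sigma_dot * sigma
        noise_scale = math.sqrt(2.0 * beta * dt) * sigma
        xs = [
            x + drift_coef * score(x, sigma) * dt
            + noise_scale * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    mean = sum(xs) / n_samples
    return sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
```

With `beta=0.0` the noise term vanishes and the loop reduces to the PF ODE. With `beta=1.0` fresh noise is injected at every step, yet the variance at $T$ should land near $c^2 + T^2$ in both cases, up to Monte Carlo and discretization error.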

Reverse SDE

The reverse diffusion process from $\x_T$ to $\x_0$ can also be modeled by an Itô SDE, this time integrated backward in time:

\[\begin{align} d\x &= \sbr{\hat{\f}(\x,t) - g^2(t)\nabla_{\x}\log p_t(\x)}dt + g(t)d\w_t \\ &= - \dot{\sigma}_t\sigma_t\nabla_{\x}\log p(\x;\sigma_t)dt \underbrace{- \beta(t)\sigma_t^2\nabla_{\x}\log p(\x;\sigma_t)dt + \sqrt{2\beta(t)}\sigma_td\w_t}_{\text{Langevin dynamics: noise cancellation}} \end{align}\]
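The reverse SDE can be discretized in the same way as the forward one, stepping from $t$ to $t - h$ so that $dt = -h$ while the noise increment keeps scale $\sqrt{h}$. The sketch below again assumes one-dimensional Gaussian data $\N(0, c^2)$, $\sigma_t = t$, and a constant $\beta$, with the exact score standing in for a learned one. The constant `C` and the function names are hypothetical.

```python
import math
import random

C = 0.5  # hypothetical data standard deviation


def score(x, sigma):
    """Score of p(x; sigma) = N(0, C^2 + sigma^2)."""
    return -x / (C * C + sigma * sigma)


def reverse_sde_variance(beta, T=3.0, n_steps=300, n_samples=4000, seed=0):
    """Integrate the reverse SDE from t = T down to t = 0.

    Each step uses dt = -h in
        dx = -(sigma' sigma + beta sigma^2) score dt + sqrt(2 beta) sigma dw
    with the assumed schedule sigma(t) = t; returns the sample variance at 0.
    """
    rng = random.Random(seed)
    h = T / n_steps
    std_T = math.sqrt(C * C + T * T)
    xs = [rng.gauss(0.0, std_T) for _ in range(n_samples)]  # x_T ~ p(x; sigma_T)
    for k in range(n_steps):
        t = T - k * h
        sigma, sigma_dot = t, 1.0
        drift_coef = -(sigma_dot * sigma + beta * sigma ** 2)
        noise_scale = math.sqrt(2.0 * beta * h) * sigma
        xs = [
            x + drift_coef * score(x, sigma) * (-h)
            + noise_scale * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    mean = sum(xs) / n_samples
    return sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
```

Starting from noise with variance $c^2 + T^2$, the returned variance should be close to the data variance $c^2$ for both `beta=0.0` (the deterministic PF ODE run backward) and `beta=1.0` (stochastic sampling), up to Monte Carlo and discretization error.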
