LION: Latent Point Diffusion Models for 3D Shape Generation


Background

  • Denoising Diffusion Models


  • Forward Process

    $$q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right):=\prod_{t=1}^T q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right), \quad q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right):=\mathcal{N}\left(\mathbf{x}_t ; \sqrt{1-\beta_t}\, \mathbf{x}_{t-1}, \beta_t \boldsymbol{I}\right)$$

    • $T$ : number of steps
    • $q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ : Gaussian transition kernel that gradually adds noise to the input with a variance schedule $\beta_1,\cdots,\beta_T$
    • The $\beta_t$ are chosen such that the chain approximately converges to a standard Gaussian distribution after $T$ steps, $q(\mathbf{x}_T)\approx \mathcal{N}(\mathbf{x}_T;\mathbf{0},\mathbf{I})$
    • Property: $\mathbf{x}_t$ can be sampled at an arbitrary timestep $t$ in closed form (with $\alpha_t:=1-\beta_t$ and $\bar{\alpha}_t:=\prod_{s=1}^t \alpha_s$; see the sketch below): $q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)=\mathcal{N}\left(\mathbf{x}_t ; \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right)$
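As a concrete illustration of the closed-form property, here is a minimal NumPy sketch. The linear schedule and its endpoints are assumptions (the values DDPM used), not part of this document:

```python
import numpy as np

# Assumed linear variance schedule; DDPM used beta_1 = 1e-4, beta_T = 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # \bar{alpha}_t; index t-1 holds step t

def q_sample(x0: np.ndarray, t: int, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I); t in 1..T."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t - 1]) * x0 + np.sqrt(1.0 - alpha_bars[t - 1]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2048, 3))      # a toy point cloud: N=2048 points, xyz
xt = q_sample(x0, t=500, rng=rng)        # heavily noised version of x0
```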
  • Reverse Process (DDMs)

    DDMs learn a parametrized reverse process (with model parameters $\boldsymbol{\theta}$) that inverts the forward diffusion:

    $$p_{\boldsymbol{\theta}}\left(\mathbf{x}_{0:T}\right):=p\left(\mathbf{x}_T\right) \prod_{t=1}^T p_{\boldsymbol{\theta}}\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right), \quad p_{\boldsymbol{\theta}}\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right):=\mathcal{N}\left(\mathbf{x}_{t-1} ; \mu_{\boldsymbol{\theta}}\left(\mathbf{x}_t, t\right), \rho_t^2 \boldsymbol{I}\right)$$

    • NLL (negative log-likelihood) bound:

      $$\mathbb{E}\left[-\log p_\theta\left(\mathbf{x}_0\right)\right] \leq \mathbb{E}_q\left[-\log \frac{p_\theta\left(\mathbf{x}_{0:T}\right)}{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}\right]=\mathbb{E}_q\left[-\log p\left(\mathbf{x}_T\right)-\sum_{t \geq 1} \log \frac{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}\right]=: L$$

      $$L=\mathbb{E}_q\Big[\underbrace{D_{\mathrm{KL}}\left(q\left(\mathbf{x}_T \mid \mathbf{x}_0\right) \,\|\, p\left(\mathbf{x}_T\right)\right)}_{L_T}+\sum_{t>1} \underbrace{D_{\mathrm{KL}}\left(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) \,\|\, p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)\right)}_{L_{t-1}} \underbrace{-\log p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}_{L_0}\Big]$$
    • Details

      $$\begin{aligned}L & =\mathbb{E}_q\left[-\log \frac{p_\theta\left(\mathbf{x}_{0:T}\right)}{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}\right] \\& =\mathbb{E}_q\left[-\log p\left(\mathbf{x}_T\right)-\sum_{t \geq 1} \log \frac{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}\right] \\& =\mathbb{E}_q\left[-\log p\left(\mathbf{x}_T\right)-\sum_{t>1} \log \frac{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}-\log \frac{p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}\right] \\& =\mathbb{E}_q\left[-\log p\left(\mathbf{x}_T\right)-\sum_{t>1} \log \frac{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)} \cdot \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)}-\log \frac{p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}\right] \\& =\mathbb{E}_q\left[-\log \frac{p\left(\mathbf{x}_T\right)}{q\left(\mathbf{x}_T \mid \mathbf{x}_0\right)}-\sum_{t>1} \log \frac{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)}-\log p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)\right]\\&=\mathbb{E}_q\left[D_{\mathrm{KL}}\left(q\left(\mathbf{x}_T \mid \mathbf{x}_0\right) \,\|\, p\left(\mathbf{x}_T\right)\right)+\sum_{t>1} D_{\mathrm{KL}}\left(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) \,\|\, p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)\right)-\log p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)\right]\end{aligned}$$

  • Tractable forward-process posterior

    Conditioned on $\mathbf{x}_0$, the posterior of the forward process is Gaussian in closed form (see the sketch below):

    $$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)=\mathcal{N}\left(\mathbf{x}_{t-1} ; \tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right), \tilde{\beta}_t \mathbf{I}\right)$$

    where $\tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right):=\frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t} \mathbf{x}_0+\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t} \mathbf{x}_t$ and $\tilde{\beta}_t:=\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \beta_t$
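Continuing the NumPy sketch above (it reuses the assumed `betas` / `alphas` / `alpha_bars` globals defined there), the posterior parameters are a few lines:

```python
import numpy as np

def q_posterior(x0: np.ndarray, xt: np.ndarray, t: int):
    """Mean and variance of q(x_{t-1} | x_t, x_0) from the closed form; t in 1..T."""
    ab_t = alpha_bars[t - 1]
    ab_prev = alpha_bars[t - 2] if t > 1 else 1.0   # convention: \bar{alpha}_0 := 1
    mean = (np.sqrt(ab_prev) * betas[t - 1] / (1.0 - ab_t)) * x0 \
         + (np.sqrt(alphas[t - 1]) * (1.0 - ab_prev) / (1.0 - ab_t)) * xt
    var = (1.0 - ab_prev) / (1.0 - ab_t) * betas[t - 1]
    return mean, var
```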

Diffusion models and denoising autoencoders

  • Forward process and $L_T$
    • Ignore the option of learning the forward-process variances $\beta_t$ by reparameterization and instead fix them to constants.
    • $L_T$ is then a constant during training and can be ignored.
  • Reverse process and $L_{1:T-1}$
    • Parametrization: $p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)=\mathcal{N}\left(\mathbf{x}_{t-1} ; \boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right), \boldsymbol{\Sigma}_\theta\left(\mathbf{x}_t, t\right)\right)$
    • Set $\boldsymbol{\Sigma}_\theta\left(\mathbf{x}_t, t\right)=\sigma_t^2 \mathbf{I}$ to untrained, time-dependent constants. Then

      $$L_{t-1}=\mathbb{E}_q\left[\frac{1}{2 \sigma_t^2}\left\|\tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right)-\boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right)\right\|^2\right]+C$$

      Expanding $\tilde{\boldsymbol{\mu}}_t$ and reparametrizing $\mathbf{x}_t\left(\mathbf{x}_0, \boldsymbol{\epsilon}\right)=\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0+\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}$ gives

      $$\begin{aligned}L_{t-1}-C & =\mathbb{E}_q\left[\frac{1}{2 \sigma_t^2}\left\|\frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t} \mathbf{x}_0+\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t} \mathbf{x}_t-\boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right)\right\|^2\right] \\& =\mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}}\left[\frac{1}{2 \sigma_t^2}\left\|\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t\left(\mathbf{x}_0, \boldsymbol{\epsilon}\right)-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \boldsymbol{\epsilon}\right)-\boldsymbol{\mu}_\theta\left(\mathbf{x}_t\left(\mathbf{x}_0, \boldsymbol{\epsilon}\right), t\right)\right\|^2\right]\end{aligned}$$

      so $\boldsymbol{\mu}_\theta$ must predict $\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \boldsymbol{\epsilon}\right)$ given $\mathbf{x}_t$
    • Choose $\boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right)=\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right)\right)$, which reduces the objective to (a training-step sketch follows this list)

      $$\mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}}\left[\frac{\beta_t^2}{2 \sigma_t^2 \alpha_t\left(1-\bar{\alpha}_t\right)}\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_\theta\left(\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0+\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}, t\right)\right\|^2\right]$$
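The resulting training step is just a denoising regression. Below is a minimal PyTorch-style sketch with the weighting dropped (i.e. $w(t)=1$, as in the simplified objective); `eps_model(x_t, t)` is an assumed noise-prediction network, not a specific architecture:

```python
import torch

def ddpm_loss(eps_model, x0: torch.Tensor, alpha_bars: torch.Tensor) -> torch.Tensor:
    """Simplified objective: regress eps_theta onto the true noise eps."""
    B = x0.shape[0]
    t = torch.randint(0, alpha_bars.shape[0], (B,), device=x0.device)   # t ~ U{1,...,T}
    eps = torch.randn_like(x0)                                          # eps ~ N(0, I)
    ab = alpha_bars[t].view(B, *([1] * (x0.dim() - 1)))                 # broadcast per sample
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps                       # q(x_t | x_0) in closed form
    return ((eps - eps_model(xt, t)) ** 2).mean()
```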

Hierarchical Latent Point Diffusion Models (LION)


Loss Function

$$\min_{\boldsymbol{\theta}} \mathbb{E}_{t \sim U\{1, T\},\, \mathbf{x}_0 \sim p\left(\mathbf{x}_0\right),\, \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{I})}\left[w(t)\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_{\boldsymbol{\theta}}\left(\alpha_t \mathbf{x}_0+\sigma_t \boldsymbol{\epsilon}, t\right)\right\|_2^2\right], \quad w(t)=\frac{\beta_t^2}{2 \rho_t^2\left(1-\beta_t\right)\left(1-\alpha_t^2\right)}$$

  • Notation here follows the LION paper: $\alpha_t^2:=\prod_{s=1}^t\left(1-\beta_s\right)$ (i.e. $\alpha_t^2=\bar{\alpha}_t$ from the Background section) and $\sigma_t^2:=1-\alpha_t^2$
  • $w(t)$ : often simply set to $1$ (constant)
  • After training, samples are drawn by ancestral sampling with $\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{I})$ (see the sketch below): $$\mathbf{x}_{t-1}=\frac{1}{\sqrt{1-\beta_t}}\left(\mathbf{x}_t-\frac{\beta_t}{\sqrt{1-\alpha_t^2}} \boldsymbol{\epsilon}_{\boldsymbol{\theta}}\left(\mathbf{x}_t, t\right)\right)+\rho_t \boldsymbol{\eta}$$
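A minimal sketch of that ancestral-sampling loop, assuming a trained `eps_model` and given schedule tensors `betas` and `rhos` (e.g. the common choice $\rho_t^2 = \beta_t$); none of these names come from the paper:

```python
import torch

@torch.no_grad()
def ancestral_sample(eps_model, shape, betas, rhos, device="cpu"):
    """Reverse the chain from x_T ~ N(0, I) using the update rule above."""
    alpha_sq = torch.cumprod(1.0 - betas, dim=0)        # alpha_t^2 in this section's notation
    x = torch.randn(shape, device=device)               # x_T ~ N(0, I)
    for t in range(betas.shape[0] - 1, -1, -1):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_sq[t]) * eps_model(x, t_batch)) \
               / torch.sqrt(1.0 - betas[t])
        eta = torch.randn_like(x) if t > 0 else torch.zeros_like(x)  # no noise at the final step
        x = mean + rhos[t] * eta
    return x
```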

H-VAE Configuration

  • Point clouds $\mathbf{x}\in \mathbb{R}^{3\times N}$
  • Global shape latent $\mathbf{z}_0 \in \mathbb{R}^{D_{\mathbf{z}}}$
  • Point cloud-structured latent $\mathbf{h}_0 \in \mathbb{R}^{\left(3+D_{\mathbf{h}}\right) \times N}$
  • $\mathbf{h}_0$ : a latent point cloud consisting of $N$ points with xyz-coordinates in $\mathbb{R}^3$ and $D_{\mathbf{h}}$ additional latent features per point (see the toy instantiation below)
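To make the tensor shapes concrete, a toy instantiation; the sizes are illustrative assumptions, not necessarily the paper's hyperparameters:

```python
import torch

N, D_z, D_h = 2048, 128, 1        # illustrative sizes only
x  = torch.randn(3, N)            # input point cloud x
z0 = torch.randn(D_z)             # global shape latent z_0
h0 = torch.randn(3 + D_h, N)      # latent point cloud h_0: xyz plus D_h extra features per point
```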

Two Stage Training

  • First Stage: train the model as a regular VAE with standard Gaussian priors

    $$\begin{aligned}\mathcal{L}_{\mathrm{ELBO}}(\boldsymbol{\phi}, \boldsymbol{\xi}) & =\mathbb{E}_{p(\mathbf{x}),\, q_\phi\left(\mathbf{z}_0 \mid \mathbf{x}\right),\, q_\phi\left(\mathbf{h}_0 \mid \mathbf{x}, \mathbf{z}_0\right)}\left[\log p_{\boldsymbol{\xi}}\left(\mathbf{x} \mid \mathbf{h}_0, \mathbf{z}_0\right)\right. \\& \left.-\lambda_{\mathbf{z}} D_{\mathrm{KL}}\left(q_\phi\left(\mathbf{z}_0 \mid \mathbf{x}\right) \,\|\, p\left(\mathbf{z}_0\right)\right)-\lambda_{\mathbf{h}} D_{\mathrm{KL}}\left(q_\phi\left(\mathbf{h}_0 \mid \mathbf{x}, \mathbf{z}_0\right) \,\|\, p\left(\mathbf{h}_0\right)\right)\right]\end{aligned}$$

    • ϕ\boldsymbol\phi: encoder parameters
    • ξ\boldsymbol\xi : decoder parameters
  • Second Stage: train the latent DDMs on the latent encodings
    • Fix the VAE's encoder and decoder networks
    • Train two latent DDMs on the encodings $\mathbf{z}_0$ and $\mathbf{h}_0$ sampled from $q_\phi\left(\mathbf{z}_0 \mid \mathbf{x}\right)$ and $q_\phi\left(\mathbf{h}_0 \mid \mathbf{x}, \mathbf{z}_0\right)$ (a training sketch follows this list):

      $$\begin{aligned}\mathcal{L}_{\mathrm{SM}^{\mathbf{z}}}(\boldsymbol{\theta}) & =\mathbb{E}_{t \sim U\{1, T\},\, p(\mathbf{x}),\, q_\phi\left(\mathbf{z}_0 \mid \mathbf{x}\right),\, \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{I})}\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_{\boldsymbol{\theta}}\left(\mathbf{z}_t, t\right)\right\|_2^2, \\ \mathcal{L}_{\mathrm{SM}^{\mathbf{h}}}(\boldsymbol{\psi}) & =\mathbb{E}_{t \sim U\{1, T\},\, p(\mathbf{x}),\, q_\phi\left(\mathbf{z}_0 \mid \mathbf{x}\right),\, q_\phi\left(\mathbf{h}_0 \mid \mathbf{x}, \mathbf{z}_0\right),\, \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{I})}\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_{\boldsymbol{\psi}}\left(\mathbf{h}_t, \mathbf{z}_0, t\right)\right\|_2^2\end{aligned}$$
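A second-stage training sketch. The `vae.encode` interface (returning reparametrized samples `z0, h0`) and the `eps_z` / `eps_h` noise-prediction networks are assumed names, not the paper's API; the key structure it shows is the frozen VAE and the $\mathbf{z}_0$-conditioning of the latent-points DDM:

```python
import torch

def diffuse(x0: torch.Tensor, t: torch.Tensor, alpha_bars: torch.Tensor):
    """Closed-form forward diffusion q(x_t | x_0); returns (x_t, eps)."""
    eps = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps, eps

def second_stage_losses(vae, eps_z, eps_h, x: torch.Tensor, alpha_bars: torch.Tensor):
    """Losses for both latent DDMs in one step, with the first-stage VAE frozen."""
    with torch.no_grad():                                # encoder/decoder stay fixed
        z0, h0 = vae.encode(x)                           # z0 ~ q(z0|x), h0 ~ q(h0|x, z0)
    T, B = alpha_bars.shape[0], x.shape[0]
    tz = torch.randint(0, T, (B,), device=x.device)      # independent timesteps per loss
    th = torch.randint(0, T, (B,), device=x.device)
    zt, ez = diffuse(z0, tz, alpha_bars)
    ht, eh = diffuse(h0, th, alpha_bars)
    loss_z = ((ez - eps_z(zt, tz)) ** 2).mean()          # L_SM^z
    loss_h = ((eh - eps_h(ht, z0, th)) ** 2).mean()      # L_SM^h, conditioned on z0
    return loss_z, loss_h
```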