**Potential Based Diffusion Motion Planning and Generalization.**
Our method encodes motion planning constraints into latent vectors and uses potential functions to generate motion plans.
We show that our method can directly generalize to cluttered heterogeneous environments by composing potentials,
even though the potential functions are trained only on simple homogeneous environments.

Effective motion planning in high-dimensional spaces is a long-standing open problem in robotics. One classical family of approaches is potential-based motion planning. An advantage of potential-based motion planning is composability -- different motion constraints can be combined by simply adding their corresponding potentials. However, constructing motion paths from potentials requires solving a global optimization over the configuration-space potential landscape, which is often prone to local minima. We propose a new approach to learning potential-based motion planning, in which we train a neural network to capture easily optimizable potentials over motion planning trajectories. We illustrate the effectiveness of this approach, significantly outperforming both classical and recent learned motion planning methods while avoiding issues with local minima. We further illustrate its inherent composability, which enables generalization to a multitude of different motion constraints.
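For intuition, classical potential-field composition can be sketched as follows (an illustrative toy, not our learned method; all names are hypothetical): an attractive potential toward the goal and a repulsive potential around an obstacle are combined by simple addition, and a path is extracted by gradient descent on the sum -- the step that is prone to local minima in cluttered scenes.

```python
import numpy as np

def attractive(q, goal):
    # Quadratic well pulling the configuration toward the goal.
    return 0.5 * float(np.sum((q - goal) ** 2))

def repulsive(q, obstacle, radius=1.0):
    # Barrier that is active only within `radius` of the obstacle.
    d = float(np.linalg.norm(q - obstacle))
    if d >= radius:
        return 0.0
    return 0.5 * (1.0 / max(d, 1e-6) - 1.0 / radius) ** 2

def composite_potential(q, goal, obstacle):
    # Composability: constraints combine by simply adding their potentials.
    return attractive(q, goal) + repulsive(q, obstacle)

def descend(q, goal, obstacle, lr=0.1, steps=100, eps=1e-5):
    # Plain gradient descent (finite differences) on the composite
    # potential; this global optimization can get trapped in local minima.
    q = q.astype(float).copy()
    for _ in range(steps):
        g = np.zeros_like(q)
        for i in range(q.size):
            dq = np.zeros_like(q)
            dq[i] = eps
            g[i] = (composite_potential(q + dq, goal, obstacle)
                    - composite_potential(q - dq, goal, obstacle)) / (2 * eps)
        q -= lr * g
    return q

q_final = descend(np.array([0.0, 0.0]),      # start configuration
                  np.array([3.0, 0.0]),      # goal
                  np.array([1.5, 1.5]))      # obstacle center
```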

\begin{equation}
q_{1:T}^* = \text{arg\,min}_{q_{1:T}} U_\theta(q_{1:T}, q_{\text{st}}, q_{\text{e}}, C),
\label{eqn:potential_traj}
\end{equation}

where $q_{1:T}^*$ is a successful motion plan from $q_{\text{st}}$ to $q_{\text{e}}$.
To learn the potential function above, we propose to train an energy-based model (EBM) on a dataset of solved motion planning problems $D = \{(q_{\text{st}}^i, q_{\text{e}}^i, q_{1:T}^i, C^i)\}$, where $e^{-E_\theta(q_{1:T}|q_{\text{st}}, q_{\text{e}}, C)} \propto p(q_{1:T}|q_{\text{st}}, q_{\text{e}}, C)$. Since $D$ consists of solved motion planning problems, the learned energy function $E_\theta$ will have minimal energy at successful motion plans $q_{1:T}^*$ and thus satisfies our potential function $U_\theta$. Concretely, we train $E_\theta$ with a denoising diffusion objective:
\begin{equation}
\mathcal{L}_{\text{MSE}}=\|\mathbf{\epsilon} - \nabla_{q_{1:T}} E_\theta(\sqrt{1-\beta_s} q_{1:T}^i + \sqrt{\beta_s} \mathbf{\epsilon}, s, q_{\text{st}}^i, q_{\text{e}}^i, C^i)\|^2
\label{eqn:train_obj}
\end{equation}

where $\epsilon$ is sampled from Gaussian noise $\mathcal{N}(0, I)$, $s \in \{1, 2, \ldots, S\}$ is the denoising diffusion step (we set $S = 100$), and $\beta_s$ is the corresponding Gaussian noise corruption on a motion planning path $q_{1:T}^i$. We refer to $E_\theta$ as the diffusion potential function. A motion plan is then obtained by initializing $q_{1:T}^S$ from Gaussian noise and iteratively refining it with the gradient of $E_\theta$:
\begin{align}
& q_{1:T}^{s-1}=q_{1:T}^{s}-\gamma \epsilon + \xi, \quad \xi \sim \mathcal{N} \bigl(0, \sigma^2_s I \bigl), \notag \\
& \text{where} \: \: \epsilon = \nabla_{q_{1:T}} E_\theta(q_{1:T}, s, q_{\text{st}}, q_{\text{e}}, C)
\label{eqn:diffusion_opt}
\end{align}

To parameterize the energy function $E_\theta(q_{1:T}, s, q_{\text{st}}, q_{\text{e}}, C)$, we use classifier-free guidance to form a peakier composite energy function conditioned on $C$. $\gamma$ and $\sigma^2_s$ are diffusion-specific scaling constants (a rescaling term at each diffusion step is omitted above for clarity). The final predicted motion path $q_{1:T}^*$ corresponds to the output $q_{1:T}^0$ after running $S$ steps of optimization with the diffusion potential function.
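A minimal numerical sketch of the training objective and the sampling update above, using an analytic stand-in for the learned gradient $\nabla_q E_\theta$ (the names, the stand-in, and the noise schedule are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Analytic stand-in for nabla_q E_theta: the potential's minimum is a known
# "solved plan" q_star, so its gradient is simply q - q_star. The real model
# is a neural network that is also conditioned on (s, q_st, q_e, C).
q_star = np.array([2.0, -1.0, 0.5, 0.0])

def grad_E(q, s):
    return q - q_star

def denoising_loss(q_clean, beta_s, s):
    # One Monte Carlo sample of the MSE training objective: corrupt a solved
    # plan with Gaussian noise, then score how well the model's gradient
    # recovers that noise.
    eps = rng.normal(size=q_clean.shape)
    q_noisy = np.sqrt(1.0 - beta_s) * q_clean + np.sqrt(beta_s) * eps
    return float(np.mean((eps - grad_E(q_noisy, s)) ** 2))

# Sampling: initialize q^S from Gaussian noise, then repeatedly step along
# -gamma * grad_E with decaying injected noise xi.
S, gamma = 100, 0.1
q = rng.normal(size=4)
for s in range(S, 0, -1):
    sigma_s = 0.01 * s / S                     # assumed toy noise schedule
    xi = rng.normal(scale=sigma_s, size=4)
    q = q - gamma * grad_E(q, s) + xi
# q is now close to the potential's minimum q_star.
```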
\begin{align}
&q_{1:T}^{s-1}=q_{1:T}^{s}-\gamma \epsilon^{\text{comb}}+\xi, \quad \xi \sim \mathcal{N} \bigl(0, \sigma^2_s I \bigl), \label{eqn:diffusion_opt_comb} \\
& \text{where} \: \: \epsilon^{\text{comb}} = \nabla_{q_{1:T}} (E_\theta^{1}(q_{1:T}, s, q_{\text{st}}, q_{\text{e}}, C_1) + E_\theta^{2}(q_{1:T}, s, q_{\text{st}}, q_{\text{e}}, C_2)). \notag
\end{align}

This composite potential $E_\theta^{1} + E_\theta^{2}$ has low energy precisely at motion planning paths $q_{1:T}$ that satisfy both constraints. Hence, by iteratively sampling from the composite potential function, we can obtain motion plans that satisfy both constraints. Please see our compositional results below.
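The composed update can be sketched with two toy quadratic constraint potentials (illustrative stand-ins, not learned models): their gradients simply add, and sampling settles where both constraints are softly satisfied.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two toy constraint potentials E1, E2 (quadratic wells); their sum is
# lowest where both are softly satisfied -- here the midpoint of the centers.
c1 = np.array([1.0, 0.0])
c2 = np.array([0.0, 1.0])

def grad_comb(q):
    # nabla_q (E1 + E2): gradients of the individual potentials simply add.
    return (q - c1) + (q - c2)

S, gamma = 100, 0.1
q = rng.normal(size=2)                       # q^S initialized from N(0, I)
for s in range(S, 0, -1):
    xi = rng.normal(scale=0.01 * s / S, size=2)
    q = q - gamma * grad_comb(q) + xi        # combined-gradient update
# q is now near (c1 + c2) / 2, the minimum of the composite potential.
```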
**Trajectory Denoising Process on Maze2D.**
Motion trajectories are initialized from a Gaussian distribution at timestep $S = 100$. Noise is iteratively removed via the gradient of the energy function, yielding a feasible, collision-free trajectory at timestep $s = 0$.
We use DDIM to accelerate sampling.
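The strided, deterministic DDIM update can be sketched as follows (a toy schedule, and an oracle noise predictor for a known clean trajectory `x0`, so this illustrates only the update rule and the step-skipping, not a trained model):

```python
import numpy as np

# Toy diffusion schedule: betas and the cumulative alpha-bar products.
S = 100
betas = np.linspace(1e-4, 0.02, S)
abar = np.cumprod(1.0 - betas)

x0 = np.array([0.7, -0.3, 1.2])            # known clean "trajectory"

def eps_oracle(x_t, t):
    # Returns the noise consistent with x_t = sqrt(abar_t) x0 + sqrt(1-abar_t) eps.
    return (x_t - np.sqrt(abar[t]) * x0) / np.sqrt(1.0 - abar[t])

rng = np.random.default_rng(2)
x = np.sqrt(abar[-1]) * x0 + np.sqrt(1.0 - abar[-1]) * rng.normal(size=3)

# Evaluate the model only on every 10th timestep (eta = 0: deterministic).
taus = list(range(S - 1, -1, -10))
for t, t_prev in zip(taus[:-1], taus[1:]):
    eps = eps_oracle(x, t)
    x0_pred = (x - np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(abar[t])
    x = np.sqrt(abar[t_prev]) * x0_pred + np.sqrt(1.0 - abar[t_prev]) * eps

# Final step: jump from the last kept timestep to the clean prediction.
eps = eps_oracle(x, taus[-1])
x = (x - np.sqrt(1.0 - abar[taus[-1]]) * eps) / np.sqrt(abar[taus[-1]])
```

With the oracle predictor, the strided chain recovers `x0` exactly; with a learned model the same stride trades a small amount of quality for roughly 10x fewer network evaluations.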

Our method can generalize to more difficult out-of-distribution environments by sampling from the composite diffusion potential function. We demonstrate qualitative results where we compose potentials of the same obstacle type, of different obstacle types, and of static and dynamic obstacles. Please see our paper for more comprehensive results.

```
@inproceedings{
luo2024potential,
title={Potential Based Diffusion Motion Planning},
author={Yunhao Luo and Chen Sun and Joshua B. Tenenbaum and Yilun Du},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=Qb68Rs0p9f}
}
```

We propose new samplers, inspired by MCMC, to enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers.

We present a method to compose different diffusion models together, drawing on the close connection between diffusion models and EBMs. We illustrate how compositional operators enable composing multiple sets of objects together as well as generating images subject to complex text prompts.

Diffuser is a denoising diffusion probabilistic model that plans by iteratively refining randomly sampled noise. The denoising process lends itself to flexible conditioning, by either using gradients of an objective function to bias plans toward high-reward regions or conditioning the plan to reach a specified goal.
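That objective-gradient conditioning can be sketched as follows (an illustrative update with toy stand-ins `score_step` and `reward_grad`, not Diffuser's exact algorithm; the injected noise term is omitted so the sketch is deterministic):

```python
import numpy as np

# Each refinement step mixes the denoiser's direction with the gradient of
# a reward J, biasing the plan toward high-reward (here: goal-reaching)
# regions. All names and scales are illustrative assumptions.
goal = np.array([1.0, 1.0])

def score_step(q):
    # Stand-in denoiser direction pulling toward the data mode (at zero).
    return -q

def reward_grad(q):
    # Gradient of a toy reward J(q) = -0.5 ||q - goal||^2.
    return goal - q

rng = np.random.default_rng(4)
q = rng.normal(size=2)
for _ in range(200):
    q = q + 0.05 * score_step(q) + 0.05 * reward_grad(q)
# q settles at the balance point between the data mode and the reward peak,
# here 0.5 * goal.
```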