Abstract and 1. Introduction

  2. Background & Related Work

  3. Method

    3.1 Sampling Small Mutations

    3.2 Policy

    3.3 Value Network & Search

    3.4 Architecture

  4. Experiments

    4.1 Environments

    4.2 Baselines

    4.3 Ablations

  5. Conclusion, Acknowledgments and Disclosure of Funding, and References

Appendix

A. Mutation Algorithm

B. Context-Free Grammars

C. Sketch Simulation

D. Complexity Filtering

E. Tree Path Algorithm

F. Implementation Details

Abstract

Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program’s output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Like image diffusion models, our method inverts “noise” applied to syntax trees. Rather than generating code sequentially, we iteratively edit it while preserving syntactic validity, which makes it easy to combine this neural model with search. We apply our approach to inverse graphics tasks, where our model learns to convert images into programs that produce those images. Combined with search, our model can write graphics programs, see the execution result, and debug them to meet the required specifications. We additionally show how our system can write graphics programs for hand-drawn sketches. Video results can be found at https://tree-diffusion.github.io.

1 Introduction

Large language models (LLMs) have made remarkable progress in code generation, but their autoregressive nature presents a fundamental challenge: they generate code token by token, without access to the program’s runtime output from the previously generated tokens. This makes it difficult to correct errors, as the model lacks the feedback loop of seeing the program’s output and adjusting accordingly. While LLMs can be trained to suggest edits to existing code [6, 42, 17], acquiring sufficient training data for this task is difficult.

In this paper, we introduce a new approach to program synthesis using neural diffusion models that operate directly on syntax trees. Diffusion models have previously been used to great success in image generation [14, 22, 31]. By leveraging diffusion, we let the model learn to iteratively refine programs while ensuring syntactic validity. Crucially, our approach allows the model to observe the program’s output at each step, effectively enabling a debugging process.
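To make this loop concrete, the following is a minimal, self-contained sketch of the "noise as small mutations, denoise as edits" idea. The toy shape language, the `render` function, and the greedy `propose_edit` stand-in for the learned denoiser are all illustrative assumptions, not the paper's implementation; in the actual method, a neural network proposes the reverse edit conditioned on the target image and the current program's rendering.

```python
# Minimal, self-contained sketch of iterative editing with execution feedback.
# All names (the toy circle DSL, render, add_noise, propose_edit) are
# illustrative placeholders, not the paper's actual code.
import random

def render(program, size=32):
    """Execute the program: rasterize circles (cx, cy, r) onto a binary grid."""
    img = [[0] * size for _ in range(size)]
    for cx, cy, r in program:
        for y in range(size):
            for x in range(size):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    img[y][x] = 1
    return img

def add_noise(program, rng):
    """Forward process: apply one small, syntax-preserving mutation."""
    prog = [list(c) for c in program]
    i = rng.randrange(len(prog))        # pick a node (circle)
    j = rng.randrange(3)                # pick a field (cx, cy, or r)
    prog[i][j] = max(1, prog[i][j] + rng.choice([-2, -1, 1, 2]))
    return [tuple(c) for c in prog]

def pixel_distance(a, b):
    return sum(av != bv for ra, rb in zip(a, b) for av, bv in zip(ra, rb))

def propose_edit(program, target_img, rng, n_candidates=16):
    """Stand-in for the learned denoiser: score a few random single-node
    edits against the rendered output and keep the best one. The real model
    would predict the edit from (target image, current rendering)."""
    best, best_d = program, pixel_distance(render(program), target_img)
    for _ in range(n_candidates):
        cand = add_noise(program, rng)
        d = pixel_distance(render(cand), target_img)
        if d < best_d:
            best, best_d = cand, d
    return best, best_d

if __name__ == "__main__":
    rng = random.Random(0)
    target = [(10, 10, 5), (22, 20, 4)]
    target_img = render(target)
    current = add_noise(add_noise(target, rng), rng)   # "noised" starting program
    for step in range(20):                             # reverse (denoising) process
        current, d = propose_edit(current, target_img, rng)
        if d == 0:
            break
    print("recovered:", current, "pixel error:", d)
```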

In the spirit of systems like AlphaZero [29], the iterative nature of diffusion naturally lends itself to search-based program synthesis. By training a value model alongside our diffusion model, we can guide the denoising process toward programs that are likely to achieve the desired output. This allows us to efficiently explore the program space, making more informed decisions at each step of the generation process.
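A rough sketch of how value guidance can drive such a search is shown below. The `propose_edits` policy, the `value` function, and the best-first strategy are placeholders standing in for the learned policy and value networks and the search procedure described later in the paper.

```python
# Sketch of value-guided search over edit sequences (AlphaZero-style),
# assuming a policy that proposes candidate edits and a value function
# that scores edited programs. Both are toy stand-ins here.
import heapq
import itertools

def best_first_search(init_program, propose_edits, value, budget=200):
    """Best-first search: repeatedly expand the highest-value program.

    propose_edits(program) -> iterable of edited programs (policy samples)
    value(program)         -> higher is better (a learned value net in the paper)
    """
    counter = itertools.count()          # tie-breaker for the heap
    frontier = [(-value(init_program), next(counter), init_program)]
    best = init_program
    for _ in range(budget):
        if not frontier:
            break
        neg_v, _, prog = heapq.heappop(frontier)
        if -neg_v > value(best):
            best = prog
        for child in propose_edits(prog):
            heapq.heappush(frontier, (-value(child), next(counter), child))
    return best

# Toy usage: "programs" are integer tuples, edits bump one entry by +/-1,
# and value rewards closeness to a hidden target program.
if __name__ == "__main__":
    target = (3, 7, 2)
    value = lambda p: -sum(abs(a - b) for a, b in zip(p, target))
    def propose_edits(p):
        for i in range(len(p)):
            for d in (-1, 1):
                yield p[:i] + (p[i] + d,) + p[i + 1:]
    print(best_first_search((0, 0, 0), propose_edits, value))
```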

We implement our approach for inverse graphics tasks, where we posit domain-specific languages for drawing images. Inverse graphics tasks are naturally suitable for our approach since small changes in the code produce semantically meaningful changes in the rendered image. For example, a misplaced shape on the image can be easily seen and fixed in program space; a toy illustration of this correspondence follows below.
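As a toy illustration (using a made-up CSG-style expression, not the paper's actual DSL), a circle drawn a few pixels too far to the right corresponds to a single leaf value in the syntax tree, so one node edit in program space repairs the visible misplacement.

```python
# Illustrative only: a hypothetical CSG-style syntax tree where one leaf
# value controls a shape's position. Finding the single differing node is
# exactly the edit needed to fix the misplaced circle.
target  = ("Union", ("Circle", 10, 10, 5), ("Rect", 20, 4, 6, 6))
current = ("Union", ("Circle", 16, 10, 5), ("Rect", 20, 4, 6, 6))  # circle too far right

def diff_nodes(a, b, path=()):
    """Return (path, expected, actual) for every node where two trees disagree."""
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        out = []
        for i, (ca, cb) in enumerate(zip(a, b)):
            out += diff_nodes(ca, cb, path + (i,))
        return out
    return [] if a == b else [(path, a, b)]

print(diff_nodes(target, current))
# -> [((1, 1), 10, 16)]: a single leaf edit fixes the rendered image.
```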

Our main contributions are (a) a novel approach to program synthesis using diffusion on syntax trees and (b) an implementation of our approach for inverse graphics tasks that significantly outperforms previous methods.

This paper is available on arxiv under CC BY-SA 4.0 DEED license.

Authors:

(1) Shreyas Kapur, University of California, Berkeley;

(2) Erik Jenner, University of California, Berkeley;

(3) Stuart Russell, University of California, Berkeley.