Abstract and 1. Introduction

  2. Background & Related Work

  3. Method

    3.1 Sampling Small Mutations

    3.2 Policy

    3.3 Value Network & Search

    3.4 Architecture

  4. Experiments

    4.1 Environments

    4.2 Baselines

    4.3 Ablations

  5. Conclusion, Acknowledgments and Disclosure of Funding, and References

Appendix

A. Mutation Algorithm

B. Context-Free Grammars

C. Sketch Simulation

D. Complexity Filtering

E. Tree Path Algorithm

F. Implementation Details

3.2 Policy

3.2.1 Forward Process

3.2.2 Reverse Mutation Paths

Since we have access to the ground-truth mutations, we can generate training targets for a neural network by simply reversing the sampled trajectory through the forward-process Markov chain (z0 → z1 → ...). At first glance, this seems a reasonable choice. However, training the network to invert only the last mutation can produce a much noisier training signal.
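The naive scheme described above, pairing each noised state with its immediate predecessor, can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the flat tuple of color tokens standing in for a syntax tree, and the names `COLORS`, `mutate`, `forward_trajectory`, and `naive_reverse_targets`, are all hypothetical.

```python
import random

# Hypothetical toy setup: a "program" is a tuple of color tokens, and a
# mutation recolors exactly one position. Real syntax trees are richer.
COLORS = ["Red", "Green", "Blue"]


def mutate(program, rng):
    """Apply one small random mutation: recolor a single position."""
    i = rng.randrange(len(program))
    choices = [c for c in COLORS if c != program[i]]
    mutated = list(program)
    mutated[i] = rng.choice(choices)
    return tuple(mutated)


def forward_trajectory(z0, steps, rng):
    """Sample z0 -> z1 -> ... -> z_steps by repeated mutation."""
    trajectory = [z0]
    for _ in range(steps):
        trajectory.append(mutate(trajectory[-1], rng))
    return trajectory


def naive_reverse_targets(trajectory):
    """Pair each noised state z_t with its predecessor z_{t-1} as the target."""
    return [
        (trajectory[t], trajectory[t - 1])
        for t in range(len(trajectory) - 1, 0, -1)
    ]


rng = random.Random(0)
traj = forward_trajectory(("Red", "Green", "Blue"), steps=3, rng=rng)
pairs = naive_reverse_targets(traj)
for state, target in pairs:
    print(state, "->", target)
```

In this toy setting, two consecutive mutations may hit the same position. The naive target for the later state is then only an intermediate value rather than the original one, which is one way such targets can become noisy.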

Consider the case where, within a much larger syntax tree, a color was mutated as,

Authors:

(1) Shreyas Kapur, University of California, Berkeley;

(2) Erik Jenner, University of California, Berkeley;

(3) Stuart Russell, University of California, Berkeley.


This paper is available on arXiv under the CC BY-SA 4.0 license.