Abstract and 1. Introduction

  2. Related Works
  3. Preliminary
  4. Q*: A General, Versatile and Agile Deliberation Framework for LLMs
  5. Experiments
  6. Conclusion and References

3 Preliminary

3.1 Formulate the Multi-step Reasoning of LLMs as an MDP

3.2 A* Search

A* [30] is an important heuristic search algorithm in deliberative planning [38], multi-agent pathfinding [39], and constraint reasoning [40]. A* was originally proposed for finding the shortest path from a source s to a goal g in path planning problems. It associates each frontier vertex n with a value f(n) = g(n) + h(n), where g(n) is the accumulated path cost from the source s to n and h(n) is a heuristic value that estimates the cost of the shortest path from n to the goal g. The algorithm adopts a best-first search strategy, i.e., in each iteration it picks the frontier vertex with the minimum f-value to explore, and it repeats this until the goal is reached. When the heuristic h(·) is admissible [41], i.e., it never overestimates the true cost of reaching the goal, A* is guaranteed to find an optimal path.
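The best-first loop described above can be illustrated with a minimal Python sketch. It is not taken from [30]: the function name `a_star`, the adjacency-list graph representation, and the toy graph with its heuristic values are all assumptions made for illustration. To avoid the clash between the goal g and the cost function g(n), the code calls the goal vertex `goal` and the accumulated cost `g`.

```python
import heapq


def a_star(graph, source, goal, h):
    """Return (cost, path) for a cheapest path from source to goal, or None.

    graph: dict mapping a vertex to a list of (neighbor, edge_cost) pairs
    h:     callable; h(n) estimates the remaining cost from n to the goal
    """
    # Frontier entries are (f, g, vertex, path); heapq pops the minimum f-value.
    frontier = [(h(source), 0, source, [source])]
    best_g = {source: 0}  # cheapest accumulated cost g(n) found so far
    while frontier:
        f, g, n, path = heapq.heappop(frontier)
        if n == goal:
            return g, path
        if g > best_g.get(n, float("inf")):
            continue  # stale entry: a cheaper route to n was found afterwards
        for m, cost in graph.get(n, []):
            g_m = g + cost
            if g_m < best_g.get(m, float("inf")):
                best_g[m] = g_m
                # f(m) = g(m) + h(m)
                heapq.heappush(frontier, (g_m + h(m), g_m, m, path + [m]))
    return None  # the goal is unreachable from the source


# Hypothetical toy graph; vertex names and edge costs are made up.
toy_graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("t", 6)],
    "b": [("t", 1)],
    "t": [],
}
# Hypothetical admissible heuristic: no value exceeds the true cost to "t".
toy_h = {"s": 3, "a": 3, "b": 1, "t": 0}

result = a_star(toy_graph, "s", "t", toy_h.get)
print(result)
```

On this toy graph the search returns the path s, a, b, t with total cost 4, which is cheaper than the direct edges s to b (cost 4, then 1) and a to t (cost 6). Setting the heuristic to zero everywhere keeps it admissible and reduces the procedure to uniform-cost search.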

4 Q*: A General, Versatile and Agile Deliberation Framework for LLMs

Most modern LLMs generate natural language in an auto-regressive way, i.e., they predict the next token in a sequence given the previously generated tokens (cf. Eq. (2)). Therefore, when applied to multi-step reasoning problems, LLMs can introduce errors, hallucinations, and inconsistent statements into the subsequent reasoning trace if any previous step is incorrect, and may thus fail to solve the problem at hand. Indeed, because LLMs produce each token with a limited amount of computation, they have no way to devote more computational effort to difficult problems. In short, LLMs cannot perform the in-depth deliberation that is essential for solving complex multi-step reasoning problems.

4.1 Estimation of Optimal Q-value

4.2 Deliberative Planning with A*

Authors:

(1) Chaojie Wang*, Skywork AI;

(2) Yanchen Deng*, Nanyang Technological University;

(3) Zhiyi Lyu, Nanyang Technological University;

(4) Liang Zeng, Skywork AI;

(5) Jujie He, Skywork AI.


This paper is available on arXiv under a CC BY 4.0 license.