This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: 7yp7ROR1htFeEmvSzbIEkaCzdxwCaHP9OINtokd5p8s

Few-shot In-Context Preference Learning Using Large Language Models: Full Prompts and ICPL Details

Written by @languagemodels | Published on 2024/12/3

TL;DR
For more information on Few-shot In-Context Preference Learning (ICPL), including full prompts and detailed insights into the methodology, visit our site for comprehensive resources and videos.
  1. Abstract and Introduction
  2. Related Work
  3. Problem Definition
  4. Method
  5. Experiments
  6. Conclusion and References

A. Appendix

A.1 Full Prompts and A.2 ICPL Details

A.3 Baseline Details

A.4 Environment Details

A.5 Proxy Human Preference

A.6 Human-in-the-Loop Preference

A APPENDIX

We suggest visiting https://sites.google.com/view/few-shot-icpl/home for more information and videos.

A.1 FULL PROMPTS

Prompt 1: Initial System Prompts of Synthesizing Reward Functions

Prompt 2: Feedback Prompts

Prompt 3: Prompts of Tips for Writing Reward Functions

Prompt 4: Prompts of Describing Differences

A.2 ICPL DETAILS

The full pseudocode of ICPL is listed in Algo. 2.

Authors:

(1) Chao Yu, Tsinghua University;

(2) Hong Lu, Tsinghua University;

(3) Jiaxuan Gao, Tsinghua University;

(4) Qixin Tan, Tsinghua University;

(5) Xinting Yang, Tsinghua University;

(6) Yu Wang, Tsinghua University (equal advising);

(7) Yi Wu, Tsinghua University and the Shanghai Qi Zhi Institute (equal advising);

(8) Eugene Vinitsky, New York University (equal advising) (zoeyuchao@gmail.com).


This paper is available on arXiv under a CC 4.0 license.
