Authors:

(1) Arindam Mitra;

(2) Luciano Del Corro, work done while at Microsoft;

(3) Shweti Mahajan, work done while at Microsoft;

(4) Andres Codas, denote equal contributions;

(5) Clarisse Simoes, denote equal contributions;

(6) Sahaj Agarwal;

(7) Xuxi Chen, work done while at Microsoft;;

(8) Anastasia Razdaibiedina, work done while at Microsoft;

(9) Erik Jones, work done while at Microsoft;

(10) Kriti Aggarwal, work done while at Microsoft;

(11) Hamid Palangi;

(12) Guoqing Zheng;

(13) Corby Rosset;

(14) Hamed Khanpour;

(15) Ahmed Awadall.

Abstract and Introduction

Preliminaries

Teaching Orca 2 to be a Cautious Reasoner

Technical Details

Experimental Setup

Evaluation Results

Limitations

Conclusions and References

A. AGIEval Subtask Metrics

B. BigBench-Hard Subtask Metrics

C. Evaluation of Grounding in Abstractive Summarization

D. Evaluation of Safety

E. Prompts used in Evaluation

F. Illustrative Example from Evaluation Benchmarks and Corresponding Model Outpu

F Illustrative Example from Evaluation Benchmarks and Corresponding Model Output

This paper is available on arxiv under CC 4.0 license.