Author:

(1) Jan Górecki, Department of Informatics and Mathematics, Silesian University in Opava, Univerzitní náměstí 1934/3, 733 40 Karviná, Czech Republic ([email protected]).

Abstract and 1 Introduction

2 Methodology

2.1 The task

2.2 The communication protocol

2.3 The copula family

3 Pair programming with ChatGPT

3.1 Warm up

3.2 The density

3.3 The estimation

3.4 The sampling

3.5 The visualization

3.6 The parallelization

4 Summary and discussion

5 Conclusion and Acknowledgments

Appendix A: The solution in Python

Appendix B: The solution in R

References

Abstract

Without a single line of code written by a human, an example Monte Carlo simulation-based application for stochastic dependence modeling with copulas is developed using a state-of-the-art large language model (LLM) fine-tuned for conversations. This includes interacting with ChatGPT in natural language and using mathematical formalism, which, under careful supervision by a human expert, led to working code in MATLAB, Python and R for sampling from a given copula model, evaluating the model's density, performing maximum likelihood estimation, optimizing the code for parallel computing on CPUs as well as on GPUs, and visualizing the computed results. In contrast to other emerging studies that assess the accuracy of LLMs like ChatGPT on tasks from a selected area, this work investigates how to achieve a successful solution of a standard statistical task through collaboration between a human expert and artificial intelligence (AI). In particular, through careful prompt engineering, we separate successful solutions generated by ChatGPT from unsuccessful ones, resulting in a comprehensive list of related pros and cons. It is demonstrated that if the typical pitfalls are avoided, we can substantially benefit from collaborating with an AI partner. For example, we show that if ChatGPT is not able to provide a correct solution due to missing or incorrect knowledge, the human expert can feed it the correct knowledge, e.g., in the form of mathematical theorems and formulas, and make it apply this knowledge in order to provide a correct solution. This ability presents an attractive opportunity to achieve a programmed solution even for users with rather limited knowledge of programming techniques.

1 Introduction

The recent progress in solving natural language processing (NLP) tasks using large language models (LLMs) resulted in models with previously unseen quality of text generation and contextual understanding. These models, such as BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019) and GPT-3 (Brown et al., 2020), are capable of performing a wide range of NLP tasks, including text classification, question answering, text summarization, and more. With more than 100 million users registered within two months of its release for public testing through a web portal[1], ChatGPT[2] is the LLM that currently most resonates in the artificial intelligence (AI) community. This conversational AI is fine-tuned from the GPT-3.5 series with reinforcement learning from human feedback (Christiano et al., 2017; Stiennon et al., 2020), using nearly the same methods as InstructGPT (Ouyang et al., 2022), but with slight differences in the data collection setup. In March 2023, ChatGPT's developer released GPT-4 (OpenAI, 2023), the successor to GPT-3.5. At the time of writing this paper, GPT-4 was not freely available, so our results do not include its outputs. However, as a technical report on some of the model's properties is available, we add the relevant information where appropriate.

A particular result of the ChatGPT’s fine-tuning is that it can generate corresponding code in many programming languages given a task description in natural language. This can be exploited in pair programming (Williams, 2001) with ChatGPT, which then offers several benefits, including:

• Enhanced productivity: ChatGPT can help automate repetitive and time-consuming programming tasks, freeing up time for developers to focus on higher-level problem-solving and creative work. An average time saving of 55% was reported for the task of writing an HTTP server in JavaScript in a study conducted by the GitHub Next team[3] for GitHub Copilot[4]. The latter is another code suggestion tool that generates code snippets from natural language descriptions, powered by Codex (Chen et al., 2021), an LLM similar to ChatGPT.

• Improved code quality: Pair programming with ChatGPT can help identify errors and bugs in the code before they become bigger problems. ChatGPT can also suggest improvements to code architecture and design.

• Knowledge sharing: ChatGPT can help less experienced developers learn from more experienced team members by providing suggestions and guidance.

• Better code documentation: ChatGPT can help create more detailed and accurate code documentation by generating comments and annotations based on the code.

• Accessibility: ChatGPT can make programming more accessible to people who may not have a programming background, allowing them to collaborate with developers and contribute to projects in a meaningful way. For example, having developed a new theory that requires computations, it might be appealing and time-effective for researchers to use tools like ChatGPT to implement the solution without the need to involve typically expensive manpower in software engineering.

Several studies have recently appeared that assess the accuracy of LLMs like ChatGPT on a set of tasks from a particular area. For example, multiple aspects of the mathematical skills of ChatGPT are evaluated in Frieder et al. (2023), with the main observation that it is not yet ready to deliver high-quality proofs or calculations consistently. In Katz et al. (2023), a preliminary version of GPT-4 was experimentally evaluated against prior generations of GPT on the entire Uniform Bar Examination (UBE)[5], and it is reported that GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over the GPT-3.5-based model and beating humans in five of seven subject areas. In Bang et al. (2023), an extensive evaluation of ChatGPT using 21 data sets covering 8 different NLP tasks such as summarization, sentiment analysis and question answering is presented. The authors found that, on the one hand, ChatGPT outperforms LLMs with so-called zero-shot learning (Brown et al., 2020) on most tasks and even outperforms fine-tuned models on some tasks. On the other hand, they conclude that ChatGPT suffers from hallucination problems like other LLMs and that it generates more extrinsic hallucinations from its parametric memory, as it does not have access to an external knowledge base. Interestingly, the authors observed in several tasks that the possibility of interaction with ChatGPT enables human collaboration with the underlying LLM to improve its performance.

The latter observation is the main focus of this work. Rather than evaluating the accuracy of LLMs, we investigate ways to benefit from pair programming with an AI partner in order to achieve a successful solution of a task requiring intensive computations. Despite many impressive recent achievements of state-of-the-art LLMs, obtaining functional code is far from straightforward; one of many unsuccessful attempts is reported at freeCodeCamp[6]. Importantly, successful attempts are also emerging. In Maddigan and Susnjak (2023), the authors report that LLMs together with the proposed prompts can offer a reliable approach to rendering visualisations from natural language queries, even when queries are highly misspecified and underspecified. However, in many areas, including computationally intensive solutions of analytically intractable statistical problems, a study that demonstrates the benefits of pair programming with an AI partner is missing.

This work fills this gap and considers applications involving copulas (Nelsen, 2006; Joe, 2014) as models for stochastic dependence between random variables. These applications are known for their analytical intractability, hence the Monte Carlo (MC) approach is most widely used to compute the involved quantities of interest. As the MC approach often involves a large computational effort, conducting an MC study requires one to implement all underlying concepts. We demonstrate how to make ChatGPT produce a working implementation for such an application by interacting with it in natural language and using mathematical formalism. To fully illustrate the coding abilities of ChatGPT, the human role is pushed to an extreme, and all the mentioned tasks are implemented without a single line of code being written by the human and without tweaking the generated code in any way. It is important to emphasize that even if the application under consideration relates to a specific area of probability and statistics, our observations apply in a wider scope, as the tasks we consider (sampling from a given (copula) model, evaluation of the model's density, performing maximum likelihood estimation, optimizing the code for parallel computing, and visualization of the computed results) commonly appear in many statistical applications. Also, we do not present just one way to achieve a successful solution for a given task. Most of the successful solutions are complemented with examples demonstrating which adjustments of our prompts for ChatGPT turn unsuccessful solutions into successful ones. This results in a comprehensive list of related pros and cons, suggesting that if the typical pitfalls are avoided, we can substantially benefit from a collaboration with LLMs like ChatGPT. In particular, we demonstrate that if ChatGPT is not able to provide a correct solution due to limitations in its knowledge, it is possible to feed it the necessary knowledge and make ChatGPT apply this knowledge to provide a correct solution. Having all the sub-tasks of the main task successfully coded in a particular programming language, we also demonstrate how to fully exploit several impressive abilities of ChatGPT. For example, given a simple high-level prompt like "Now code it in Python.", ChatGPT correctly transpiles the code from one programming language to another in a few seconds. Also, if an error in the code produced by ChatGPT is encountered during execution, it is demonstrated that ChatGPT is not only able to identify the error, but even immediately produces a corrected version after the error message is copy-pasted into ChatGPT's web interface.
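To give a concrete picture of the sub-tasks listed above, the following sketch shows, in Python, what sampling from a copula, evaluating its density and performing maximum likelihood estimation might look like for a bivariate Clayton copula. This is an illustrative example only: the Clayton family, the function names and the parameter values are our own assumptions for the purpose of this sketch, not the code generated by ChatGPT and not the copula family introduced in Section 2.3.

    # Illustrative sketch only (hypothetical helper functions), assuming a
    # bivariate Clayton copula with parameter theta > 0.
    import numpy as np
    from scipy.optimize import minimize_scalar

    def clayton_sample(n, theta, seed=None):
        """Sample n pairs via the Marshall-Olkin (gamma frailty) construction."""
        rng = np.random.default_rng(seed)
        v = rng.gamma(1.0 / theta, 1.0, size=n)            # gamma frailty
        e = rng.exponential(1.0, size=(n, 2))              # unit exponentials
        return (1.0 + e / v[:, None]) ** (-1.0 / theta)    # n x 2 sample in [0,1]^2

    def clayton_logdensity(u, theta):
        """Row-wise log-density of the bivariate Clayton copula on u (n x 2)."""
        u1, u2 = u[:, 0], u[:, 1]
        return (np.log(theta + 1.0)
                - (theta + 1.0) * (np.log(u1) + np.log(u2))
                - (2.0 + 1.0 / theta) * np.log(u1 ** (-theta) + u2 ** (-theta) - 1.0))

    def clayton_mle(u):
        """Estimate theta by minimizing the negative log-likelihood."""
        nll = lambda theta: -np.sum(clayton_logdensity(u, theta))
        return minimize_scalar(nll, bounds=(1e-3, 50.0), method="bounded").x

    u = clayton_sample(1000, theta=2.0, seed=1)
    print(clayton_mle(u))   # the estimate should be close to the true value 2.0

The sampling step follows the standard frailty construction for Archimedean copulas, and the estimation step simply minimizes the negative log-likelihood over the single copula parameter; the tasks developed with ChatGPT in Section 3 follow the same pattern, only for the family and settings specified there.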

The paper is organized as follows. Section 2 presents the tasks we consider and sets up the way we interact with ChatGPT. Section 3 presents the development of the task via pair programming with ChatGPT. Section 4 summarizes the pros and cons observed during the task development, including a discussion on how to mitigate the latter, and Section 5 concludes.

This paper is available on arxiv under CC BY 4.0 DEED license.

[1] https://www.demandsage.com/chatgpt-statistics/

[2] https://openai.com/blog/chatgpt/

[3] https://github.blog/2022-09-07-...

[4] https://github.com/features/copilot

[5] https://www.ncbex.org/exams/ube/

[6] https://www.freecodecamp.org/news/pair-programming.