Abstract

1 Introduction

2 Data Collection

3 RQ1: What types of software engineering inquiries do developers present to ChatGPT in the initial prompt?

4 RQ2: How do developers present their inquiries to ChatGPT in multi-turn conversations?

5 RQ3: What are the characteristics of the sharing behavior?

6 Discussions

7 Threats to Validity

8 Related Work

9 Conclusion and Future Work

References

Conclusion And Future Work

In this paper, we study the role of ChatGPT in collaborative coding by analyzing developers’ shared conversations with ChatGPT in GitHub pull requests and issues, leveraging the DevGPT dataset. Our key findings include: (1) Developers seek ChatGPT’s assistance across 16 types of software engineering inquiries. The most frequent requests involve code generation, conceptual understanding, how-to guidance, issue resolution, and code review. (2) In code generation and issue resolution tasks, developers often go beyond the conventional inputs of textual descriptions or buggy code, which are the standard inputs in benchmarks for foundation models (FMs) in code generation and program repair. This indicates that a broader range of inputs is used in real-world scenarios. (3) Developers engage with ChatGPT in multi-turn conversations through iterative follow-up questions, prompt refinement, and clarification inquiries. These methods are used to progressively improve the quality and relevance of ChatGPT’s responses. (4) Developers with different roles (such as issue authors, PR authors, and code reviewers) use shared conversations with ChatGPT to supplement their role-specific contributions. This practice aims to improve the efficiency and transparency of collaborative software development processes.

In the future, we plan to propose automated approaches that identify the types of prompts based on our taxonomies. We also plan to explore whether existing best practices for prompt engineering have been applied in the collected shared conversations and whether applying them influences the flow of multi-turn conversations.

Acknowledgements

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [funding reference number: RGPIN-2019-05071].

Conflict of Interest

The authors declare that they have no conflict of interest.

Data Availability Statements

The results, source code, and data related to this study are available at https://github.com/RISElabQueens/analyzing-shared-conversation

References

Arya D, Wang W, Guo JL, Cheng J (2019) Analysis and detection of information types of open source software issue discussions. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), IEEE, pp 454–464

Baltes S, Treude C, Robillard MP (2020) Contextual documentation referencing on stack overflow. IEEE Transactions on Software Engineering 48(1):135–149

Barke S, James MB, Polikarpova N (2023) Grounded copilot: How programmers interact with code-generating models. Proceedings of the ACM on Programming Languages 7(OOPSLA1):85–111

Beyer S, Macho C, Di Penta M, Pinzger M (2020) What kind of questions do developers ask on stack overflow? a comparison of automated approaches to classify posts into question categories. Empirical Software Engineering 25:2258–2301

Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, Edwards H, Burda Y, Joseph N, Brockman G, et al. (2021) Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374

Deng Y, Xia CS, Yang C, Zhang SD, Yang S, Zhang L (2024) Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, pp 1–13

Di Sorbo A, Panichella S, Visaggio CA, Di Penta M, Canfora G, Gall HC (2015) Development emails content analyzer: Intention mining in developer discussions (t). In: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, pp 12–23

Gagniuc PA (2017) Markov chains: from theory to implementation and experimentation. John Wiley & Sons

Gómez C, Cleary B, Singer L (2013) A study of innovation diffusion through link sharing on stack overflow. In: 2013 10th working conference on mining software repositories (MSR), IEEE, pp 81–84

Guo Q, Cao J, Xie X, Liu S, Li X, Chen B, Peng X (2024) Exploring the potential of chatgpt in automated code refinement: An empirical study. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, pp 1–13

Hata H, Treude C, Kula RG, Ishio T (2019) 9.6 million links in source code comments: Purpose, evolution, and decay. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), IEEE, pp 1211–1221

Hata H, Novielli N, Baltes S, Kula RG, Treude C (2022) Github discussions: An exploratory study of early adoption. Empirical Software Engineering 27:1–32

Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, Luo X, Lo D, Grundy J, Wang H (2023) Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620

Huang Q, Xia X, Lo D, Murphy GC (2018) Automating intention mining. IEEE Transactions on Software Engineering 46(10):1098–1119

Jiang N, Liu K, Lutellier T, Tan L (2023) Impact of code language models on automated program repair. In: Proceedings of the 45th International Conference on Software Engineering, IEEE Press, ICSE ’23, p 1430–1442, DOI 10.1109/ICSE48619.2023.00125, URL https://doi.org/10.1109/ICSE48619.2023.00125

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics pp 159–174

Li L, Ren Z, Li X, Zou W, Jiang H (2018) How are issue units linked? empirical study on the linking behavior in github. In: 2018 25th Asia-Pacific Software Engineering Conference (APSEC), IEEE, pp 386–395

Liang JT, Yang C, Myers BA (2024) A large-scale survey on the usability of ai programming assistants: Successes and challenges. In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, pp 1–13

Liu J, Xia X, Lo D, Zhang H, Zou Y, Hassan AE, Li S (2021) Broken external links on stack overflow. IEEE Transactions on Software Engineering 48(9):3242–3267

Liu J, Zhang H, Xia X, Lo D, Zou Y, Hassan AE, Li S (2022) An exploratory study on the repeatedly shared external links on stack overflow. Empirical Software Engineering 27:1–32

Lu J, Yu L, Li X, Yang L, Zuo C (2023) Llama-reviewer: Advancing code review automation with large language models through parameter-efficient fine-tuning. In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), IEEE, pp 647–658

Mozannar H, Bansal G, Fourney A, Horvitz E (2022) Reading between the lines: Modeling user behavior and costs in ai-assisted programming. arXiv preprint arXiv:2210.14306

Nijkamp E, Pang B, Hayashi H, Tu L, Wang H, Zhou Y, Savarese S, Xiong C (2022) Codegen: An open large language model for code with multi-turn program synthesis. In: The Eleventh International Conference on Learning Representations

Nurwidyantoro A, Shahin M, Chaudron MR, Hussain W, Shams R, Perera H, Oliver G, Whittle J (2022) Human values in software development artefacts: A case study on issue discussions in three android applications. Information and Software Technology 141:106731

OpenAI (2024a) ChatGPT Shared Links FAQ. URL https://help.openai.com/en/articles/7925741-chatgpt-shared-links-faq, accessed: 2024-01-23

OpenAI (2024b) Create a Shared Link. URL https://help.openai.com/en/articles/7943611-create-a-shared-link, accessed: 2024-01-23

Qu C, Yang L, Croft WB, Zhang Y, Trippas JR, Qiu M (2019) User intent prediction in information-seeking conversations. In: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, pp 25–33

Rosen C, Shihab E (2016) What are mobile developers asking about? a large scale study using stack overflow. Empirical Software Engineering 21:1192–1223

Ross SI, Martinez F, Houde S, Muller M, Weisz JD (2023) The programmer’s assistant: Conversational interaction with a large language model for software development. In: Proceedings of the 28th International Conference on Intelligent User Interfaces, pp 491–514

Shi L, Chen X, Yang Y, Jiang H, Jiang Z, Niu N, Wang Q (2021) A first look at developers’ live chat on gitter. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 391–403

Siddiq ML, Santos J, Tanvir RH, Ulfat N, Rifat FA, Lopes VC (2023) Exploring the effectiveness of large language models in generating unit tests. arXiv preprint arXiv:2305.00418

Treude C, Barzilay O, Storey MA (2011) How do programmers ask and answer questions on the web? (NIER track). In: Proceedings of the 33rd international conference on software engineering, pp 804–807

Vaithilingam P, Zhang T, Glassman EL (2022) Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In: CHI conference on human factors in computing systems extended abstracts, pp 1–7

Viviani G, Famelis M, Xia X, Janik-Jones C, Murphy GC (2019) Locating latent design information in developer discussions: A study on pull requests. IEEE Transactions on Software Engineering 47(7):1402–1413

Wan Z, Xia X, Hassan AE (2019) What is discussed about blockchain? a case study on the use of balanced lda and the reference architecture of a domain to capture online discussions about blockchain platforms across the stack exchange communities. IEEE Transactions on Software Engineering (01):1–1

Wang D, Xiao T, Thongtanunam P, Kula RG, Matsumoto K (2021) Understanding shared links and their intentions to meet information needs in modern code review: A case study of the openstack and qt projects. Empirical Software Engineering 26:1–32

Wang X, Chen Y, Yuan L, Zhang Y, Li Y, Peng H, Ji H (2024) Executable code actions elicit better llm agents. arXiv preprint arXiv:2402.01030

Xiao T, Baltes S, Hata H, Treude C, Kula RG, Ishio T, Matsumoto K (2023) 18 million links in commit messages: purpose, evolution, and decay. Empirical Software Engineering 28(4):91

Xiao T, Treude C, Hata H, Matsumoto K (2024) Devgpt: Studying developer-chatgpt conversations. In: Proceedings of the International Conference on Mining Software Repositories (MSR 2024)

Ye D, Xing Z, Kapre N (2017) The structure and dynamics of knowledge network in domain-specific q&a sites: a case study of stack overflow. Empirical Software Engineering 22:375–406

Zampetti F, Ponzanelli L, Bavota G, Mocci A, Di Penta M, Lanza M (2017) How developers document pull requests with external references. In: 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), IEEE, pp 23–33

Zhang B, Liang P, Zhou X, Ahmad A, Waseem M (2023) Practices and challenges of using github copilot: An empirical study. arXiv preprint arXiv:2303.08733

Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), IEEE, pp 104–115

Zhang Y, Yu Y, Wang H, Vasilescu B, Filkov V (2018) Within-ecosystem issue linking: a large-scale study of rails. In: Proceedings of the 7th international workshop on software mining, pp 12–19

Ziegler A, Kalliamvakou E, Li XA, Rice A, Rifkin D, Simister S, Sittampalam G, Aftandilian E (2022) Productivity assessment of neural code completion. In: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, pp 21–29

Authors

  1. Huizi Hao
  2. Kazi Amit Hasan
  3. Hong Qin
  4. Marcos Macedo
  5. Yuan Tian
  6. Steven H. H. Ding
  7. Ahmed E. Hassan

This paper is available on arXiv under the CC BY-NC-SA 4.0 license.