5. Threats to Validity
The threats to validity are discussed according to the guidelines in Runeson and Höst (2009), and internal validity is not considered, since we did not investigate the relationships between variables and results.
Construct validity: Because data labelling, data extraction, and data analysis in this study were conducted manually, there is a risk of introducing personal bias. To reduce this threat, the first and third authors carried out pilot experiments to agree on the criteria for data labelling and data extraction. Any disagreements that arose during these processes were resolved by involving the second author to reach a consensus. The results of data extraction were rechecked by the three authors to ensure accuracy. The data analysis was conducted by the first author, who discussed any uncertainties with the second and third authors until they reached joint agreement. For the results of the data analysis, the negotiated agreement approach (Campbell et al., 2013) was employed to address any conflicts.
External validity: The primary threat to external validity is the selection of data sources. To maximize external validity, we chose GitHub Issues, GitHub Discussions, and Stack Overflow (SO) posts as data sources. GitHub Issues is a tool for reporting and tracking software issues, allowing users to report errors, request features, and raise questions to developers. GitHub Discussions is a newer GitHub feature that aims to provide a more open and organized platform for users to communicate and share insights with other community members. Stack Overflow, a popular Q&A community, is also a platform where many developers discuss and share insights on Copilot usage. These platforms contain a substantial amount of relevant data, and their data complement each other. Consequently, we were able to collect diverse Copilot usage data from a large number of developers and projects across these three sources. Despite these efforts, we acknowledge that we may still have missed relevant data.
Reliability: To minimize uncertainties arising from the research methodology, we took several measures to maximize the reliability of our study. We conducted pilot labelling to assess the consistency between the two authors prior to the formal data labelling process. The Cohen’s Kappa coefficients of the three pilot labelling processes are 0.824, 0.834, and 0.806, indicating good agreement between the authors. Throughout data labelling, extraction, and analysis, we thoroughly discussed and resolved any inconsistencies within the team to ensure the consistency and accuracy of the results. Furthermore, we have made the dataset of the study available (Zhou et al., 2024) to enable other researchers to validate our findings.
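For readers unfamiliar with the agreement measure reported above, the following is a minimal sketch of how Cohen's Kappa can be computed for two annotators. It is not the authors' script: the function name `cohen_kappa` and the example labels are hypothetical and serve only to illustrate the formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: share of items on which both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two annotators'
    # marginal proportions, summed over all categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_expected == 1.0:
        # Both annotators used one identical category for every item.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical pilot labels (illustrative only, not taken from the study's dataset).
annotator_1 = ["bug", "bug", "feature", "question", "bug", "feature"]
annotator_2 = ["bug", "bug", "feature", "bug", "bug", "feature"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Values above 0.8, such as the three coefficients reported in this section, are conventionally read as strong agreement, although the exact thresholds vary between interpretation scales.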
Authors:
(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;
(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China;
(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China;
(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;
(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;
(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;
(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.
This paper is