
2.4. Data Extraction

To answer the three RQs in Section 2.1, we established a set of data items for data extraction, as presented in Table 1. Data items D1-D3 are intended to capture the problems, underlying causes, and potential solutions, respectively, from the filtered data, thereby answering RQ1-RQ3. These three data items could be extracted from any part of a GitHub issue, GitHub discussion, or SO post, such as the title, the problem description, comments, and discussions.
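Conceptually, each extracted problem yields one record holding the three data items. A minimal Python sketch of such a record (the class and field names are our illustrative shorthand for D1-D3 and are not taken from Table 1):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractionRecord:
    """Illustrative record for one extracted problem; field names are hypothetical."""
    source: str                      # e.g., "Discussion #14907" or "SO #72431032"
    problem: str                     # D1: the Copilot-related problem (RQ1)
    cause: Optional[str] = None      # D2: confirmed root cause, if any (RQ2)
    solution: Optional[str] = None   # D3: confirmed solution, if any (RQ3)
```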

2.4.1. Pilot Data Extraction

The first and third authors conducted a pilot data extraction on 20 randomly selected GitHub issues, 20 GitHub discussions, and 20 SO posts; in case of any discrepancies, the second author was involved to reach a consensus. The results indicated that all three data items could be extracted from our dataset. Based on this observation, we established the following criteria for the formal data extraction: (1) If the same problem was reported by multiple users, we recorded it only once. (2) If multiple problems were identified within the same GitHub issue, GitHub discussion, or SO post, we recorded each one separately. (3) For a problem with multiple mentioned causes, we recorded only the cause confirmed as the root cause by the reporter of the problem or the Copilot team. (4) For a problem with multiple suggested solutions, we recorded only the solutions confirmed by the reporter of the problem or the Copilot team to actually solve the problem.
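A minimal sketch of how these four criteria combine, reusing the ExtractionRecord sketch above (the extraction itself was performed manually; the candidate dicts and the confirmation flags are hypothetical modeling devices, not the authors' tooling):

```python
def apply_criteria(candidates: list[dict]) -> list[ExtractionRecord]:
    """Sketch of the four recording criteria; the real step was done by hand."""
    records: list[ExtractionRecord] = []
    seen: set[str] = set()
    for c in candidates:
        # Criterion 2: each problem found in an issue/discussion/post arrives
        # here as its own candidate dict, so it is recorded separately.
        key = c["problem"].strip().lower()
        # Criterion 1: the same problem reported by multiple users is kept once.
        if key in seen:
            continue
        seen.add(key)
        records.append(ExtractionRecord(
            source=c["source"],
            problem=c["problem"],
            # Criteria 3 and 4: record a cause/solution only when the reporter
            # or the Copilot team confirmed it (modeled here as boolean flags).
            cause=c.get("cause") if c.get("cause_confirmed") else None,
            solution=c.get("solution") if c.get("solution_confirmed") else None,
        ))
    return records
```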

2.4.2. Formal Data Extraction

The first and third authors conducted the formal data extraction from the filtered dataset. They then discussed any inconsistencies with the second author and reached a consensus, ensuring that the data extraction process adhered to the predetermined criteria. Each extracted data item was reviewed multiple times by the three authors to ensure accuracy. The extraction results were compiled and recorded in MS Excel (Zhou et al., 2024).

It is important to note that not all collected data include the cause and solution of a problem. Although we selected closed GitHub issues and answered GitHub discussions and SO posts during the data collection phase, the specifics of each piece of data vary significantly. Sometimes, the respondents to a Copilot-related problem offered a solution without a detailed analysis of the problem, preventing us from extracting the underlying cause. In other situations, although the cause of a problem was identified, the user did not describe the specific resolution process. For example, a user found that Copilot “cannot work correctly on VSCode remote server” and realized it was due to “the bad network”, but did not provide any detailed solutions (Discussion #14907). Additionally, even when some responses provided both causes and solutions, they might not be accepted or proven effective by the problem’s reporter or the Copilot team members. For example, a user asked for “a way to set up GitHub copilot in Google Colab”, but neither accepted nor replied to the three proposed answers (SO #72431032). Therefore, we could not consider any of the three answers as an effective solution to the problem.
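To illustrate how such records were compiled into a spreadsheet, a hedged sketch of the export step (pandas and openpyxl are our assumptions; the paper only states that MS Excel was used), where blank cells mark problems without a confirmed cause or solution:

```python
import dataclasses
import pandas as pd  # the pandas/openpyxl toolchain is an assumption

def export_records(records: list[ExtractionRecord],
                   path: str = "extraction_results.xlsx") -> None:
    """Compile the records into one sheet; empty cause/solution fields become
    blank cells, as in the Discussion #14907 and SO #72431032 examples."""
    df = pd.DataFrame([dataclasses.asdict(r) for r in records])
    df.to_excel(path, index=False)  # writing .xlsx requires the openpyxl engine
```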

2.5. Data Analysis

To answer the three RQs formulated in Section 2.1, we analyzed the data using the Open Coding and Constant Comparison methods, two techniques from Grounded Theory that are widely employed in qualitative data analysis (Stol et al., 2016). Open Coding is not confined by pre-existing theoretical frameworks; instead, it encourages researchers to generate codes based on the actual content of the data. These codes constitute descriptive summarizations of the data, aiming to capture their underlying themes. In Constant Comparison, researchers continuously compare the coded data and dynamically refine and adjust the categories based on their similarities and differences.

The specific process of data analysis consists of four steps: 1) The first author meticulously reviewed the collected data and assigned descriptive codes that succinctly encapsulated their core themes. For instance, the issue in Discussion #10598 was coded as “Stopped Giving Inline Suggestions”; it was reported by a user who noticed that their previously functioning Copilot had suddenly stopped providing code suggestions in VSCode. 2) The first author compared the codes to identify patterns, commonalities, and distinctions among them. Through this iterative comparison process, similar codes were merged into higher-level types and categories. For example, the code of Discussion #10598, along with other akin codes, was merged into the type of FUNCTIONALITY FAILURE, which in turn belongs to the category of Operation Issue. 3) Whenever uncertainties arose, the first author engaged in discussions with the second and third authors to reach a consensus. It should be noted that, due to the nature of Constant Comparison, both the types and the categories underwent several rounds of refinement before reaching their final form. 4) The initial version of the analysis results was further verified by the second and third authors, and the negotiated agreement approach (Campbell et al., 2013) was employed to resolve conflicts. The final results are presented in Section 3.
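A minimal sketch of the resulting code hierarchy, using the paper’s own example from Discussion #10598 (the nesting and the build_taxonomy helper are illustrative; in the study, this merging was performed manually over several rounds):

```python
from collections import defaultdict

def build_taxonomy(assignments: list[tuple[str, str, str]]) -> dict:
    """Roll (category, type, code) triples into a nested
    category -> type -> [codes] mapping."""
    tree: defaultdict = defaultdict(lambda: defaultdict(list))
    for category, type_, code in assignments:
        tree[category][type_].append(code)
    return tree

# The paper's worked example as a single assignment:
tree = build_taxonomy([
    ("Operation Issue", "FUNCTIONALITY FAILURE", "Stopped Giving Inline Suggestions"),
])
assert "FUNCTIONALITY FAILURE" in tree["Operation Issue"]
```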

Authors:

(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China ([email protected]);

(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China ([email protected]);

(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China ([email protected]);

(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China ([email protected]);

(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany ([email protected]);

(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia ([email protected]);

(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland ([email protected]).


This paper is available on arxiv under CC BY 4.0 DEED license.