3. Results and Interpretation
In this section, we report the results of the three RQs and provide their interpretation. Copilot usage problems are categorized at two levels: categories (e.g., Suggestion Content Issue) and types (e.g., LESS EFFICIENT SUGGESTION), whereas causes and solutions are organized as types only (e.g., Network Connection Error). We also provide the mapping between Copilot-related problems and their causes and solutions. As mentioned in Section 2.4.2, we extracted only causes that were proven to lead to the problems and solutions that could resolve them; therefore, not all problems have corresponding causes and solutions. Because Copilot is updated rapidly, some problems and feature requests raised by users have already been addressed in newer releases. We identified these two scenarios separately as two types of solutions, i.e., Bug Fixed by Copilot and Feature Implemented by Copilot. However, because the dataset lacks Copilot version information, we could not fully take version information into account in this study. To help readers understand the taxonomies of problems, causes, and solutions of using Copilot, we provide examples marked with the “#” symbol, which indicates the “GitHub Issue ID”, “GitHub Discussion ID”, or “SO Post ID” in the dataset (Zhou et al., 2024).
3.1. Type of Problems (RQ1)
Fig. 2 presents the taxonomy of the problems extracted from our dataset. We identified a total of 1,355 problems related to Copilot usage. Operation Issue (57.5%) accounts for the majority of the problems faced by Copilot users. A notable number of users also encountered Compatibility Issue (15.6%) when using Copilot in different environments, followed by users who raised Feature Request (15.0%) based on their experience and requirements. Smaller shares were identified as Suggestion Content Issue (4.4%), User Experience Issue (4.3%), and Copyright and Policy Issue (3.3%).
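Fig. 2 gives the exact count for each category. As a rough cross-check, the percentages above can be converted back into approximate counts. The Python sketch below does only that arithmetic; the resulting numbers are back-of-envelope estimates derived from rounded percentages, not figures reported by the study, and the helper name `approximate_counts` is our own.

```python
# Category shares as reported in Section 3.1 (percent of all 1,355 problems).
TOTAL_PROBLEMS = 1355
CATEGORY_SHARES = {
    "Operation Issue": 57.5,
    "Compatibility Issue": 15.6,
    "Feature Request": 15.0,
    "Suggestion Content Issue": 4.4,
    "User Experience Issue": 4.3,
    "Copyright and Policy Issue": 3.3,
}


def approximate_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Estimate per-category counts implied by rounded percentage shares.

    Estimates only: each share is rounded to one decimal place, so an
    implied count can differ from the true count by a problem or two.
    """
    return {name: round(total * pct / 100) for name, pct in shares.items()}


if __name__ == "__main__":
    counts = approximate_counts(TOTAL_PROBLEMS, CATEGORY_SHARES)
    for name, n in counts.items():
        print(f"{name}: ~{n}")
    # Rounding makes the shares add up to slightly more than 100%.
    print(f"Sum of shares: {sum(CATEGORY_SHARES.values()):.1f}%")
    print(f"Sum of estimates: {sum(counts.values())}")
```

Because each share is rounded, the six percentages sum to 100.1% and the implied counts total 1,356 rather than 1,355, so any single estimate should be read as approximate.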
3.1.1. Operation Issue (57.5%)
Operation Issue refers to a category of obstacles encountered by users when attempting to utilize some of the fundamental functions of Copilot. This category of problems is divided into six types, which are elaborated below.
• FUNCTIONALITY FAILURE refers to abnormal behavior in the code generation-related features provided by Copilot. Copilot offers various interactive features to engage with users, such as “previous/next suggestion”, “viewing all suggestions”, and “configuration of shortcut keys to accept suggestions”, and users may encounter exceptions when using these features. For example, a user reported “Copilot no longer suggesting on PyCharm” after a period of not using it (Discussion #11199).
• STARTUP ISSUE refers to errors or malfunctions that users encounter when attempting to run Copilot. Such an issue causes Copilot to fail to run entirely and is typically accompanied by error messages. It may arise either during a user’s first use of Copilot or unexpectedly after several successful runs. For example, a user failed to activate Copilot after installing it in VSCode and received the error message “Cannot find module” (Issue #383).
• AUTHENTICATION FAILURE refers to the issues related to user login and authentication difficulties when using Copilot. Copilot requires users to log in to their GitHub account before using the service. Only users with access permissions (including paid subscriptions, student identity verification, etc.) can use the code generation service of Copilot. During the authentication process, users may encounter AUTHENTICATION FAILURE, resulting in the inability to use Copilot. For example, a user was repeatedly prompted with the message “waiting for GitHub Authentication” in IDEA, and was unable to log in (SO #72505280).
• ACCESSING FAILURE refers to situations where users fail to access Copilot’s server, which often involves errors related to server connections. For example, a user encountered the error message “GitHub Copilot could not connect to server” (Discussion #11801).
• INSTALLATION ISSUE refers to the problems encountered by users during the installation of Copilot, including installation errors, inability to find installation methods, and other related problems. For instance, a user failed to install Copilot on VSCode Insiders, and the server log showed “Error while installing the extension” (Issue #189).
• VERSION CONTROL ISSUE refers to the problems that users encounter when adjusting the version of Copilot or its runtime environment (e.g., IDE), including the inability to upgrade Copilot and anomalies such as repeated upgrade prompts even after upgrading. For example, a user reported that “copilot plugin fails to update” when using it in IntelliJ IDEA (Discussion #17298).
Interpretation: We identified Operation Issue at various stages of user interaction with Copilot. Users tend to report these problems and seek assistance, which makes Operation Issue the most prevalent category of problems related to Copilot. FUNCTIONALITY FAILURE (226), AUTHENTICATION FAILURE (198), and STARTUP ISSUE (193) are the top three types of such problems. We attribute the high frequency of FUNCTIONALITY FAILURE and STARTUP ISSUE to deficiencies in Copilot’s feature design and stability, which are further affected by the user environments in which Copilot operates. AUTHENTICATION FAILURE, in contrast, mainly stems from specific obstacles in the login process that users go through to access Copilot with their GitHub accounts.
3.1.2. Compatibility Issue (15.6%)
This category covers problems that arise from mismatches between Copilot and its runtime environment. Copilot operates as a plugin in various IDEs and code editors (e.g., VSCode and IntelliJ IDEA), and the complexity of these environments can lead to a larger number of compatibility issues. These problems are further classified into three types, as elaborated below.
• EDITOR/IDE COMPATIBILITY ISSUE refers to issues arising from mismatches between Copilot and its IDE or editor. These problems typically manifest as Copilot being unable to operate properly in a specific IDE or editor. For example, a user found that Copilot “does not work in Neovim” while writing a Python program, even though the Copilot status showed “Copilot: Enabled and Online” (SO #72174839).
• PLUG-IN COMPATIBILITY ISSUE refers to conflicts that arise when Copilot and other plugins are activated and working at the same time. Such problems can cause partial or complete malfunctions of Copilot and the other plugins, and they are usually identified through troubleshooting, such as disabling Copilot or the other plugins. For instance, one user reported that “a Keyboard shortcut conflict with Emmet” prevented him from receiving code suggestions generated by Copilot (Issue #47).
• KEYBOARD COMPATIBILITY ISSUE refers to situations in which Copilot’s functionality cannot be used with some less common keyboard layouts. For example, a user with a German keyboard layout could not use most of the code generation-related features of Copilot (Discussion #7094).
Interpretation: Compatibility Issue arises from the complex environments in which users run Copilot, as well as from the compatibility robustness of Copilot itself. Regarding EDITOR/IDE COMPATIBILITY ISSUE (136), VSCode, the platform officially recommended for using Copilot, has received a higher number of reported compatibility issues. We also identified many problems in other widely used IDEs, such as Visual Studio, IntelliJ IDEA, and PyCharm. PLUG-IN COMPATIBILITY ISSUE (70) is less predictable and often arises when Copilot is used together with other code completion tools.
3.1.3. Feature Request (15.0%)
Feature Request refers to the features that users request to add or improve based on their experience and actual needs when using Copilot. These feature requests not only help improve the user experience of Copilot but also contribute to the exploration of how AI code generation tools like Copilot can better interact with developers. This category is further divided into four types, as shown below.
• FUNCTION REQUEST refers to the requests for developing new functions in Copilot, which typically arise from users’ genuine needs and difficulties encountered while utilizing the tool. For example, a user requested that Copilot should be able to “look at the context of a project with multiple files”, rather than generate code suggestions based on the context in a single file (SO #73848372).
• INTEGRATION REQUEST refers to requests for Copilot to be available on certain platforms or to be integrated with other plugins. Such requests mainly stem from some users’ desire to use Copilot in environments they are familiar with. For instance, a user called for “Support for Intellij 2022.2 EAP family” (Discussion #17045). The requests for integration also reflect, to some extent, the popularity of Copilot among developers.
• UI REQUEST refers to requests made by users for changes to the User Interface (UI) of Copilot, such as modifying the appearance of the Copilot icon. These requests generally aim to improve the visual effects and user experience of Copilot. For example, a user requested the addition of a “status indicator” to provide information about the current working status of Copilot (Issue #163).
• PROFESSIONAL COPILOT VERSION refers to requests from some users for a professional version of Copilot. These users are developers from certain companies who hope to receive more professional and reliable code generation services in their actual work. They may have higher requirements for the reliability and security of Copilot’s code, as well as team certification and other aspects. For example, a user asked for “an on-prem version for companies to purchase”, so that they could deploy Copilot in a local environment (Discussion #38858).
Interpretation: For FUNCTION REQUEST (115), we observed that users expressed a desire for greater flexibility in configuring Copilot to align more closely with their development habits. Common requests include the ability to accept Copilot’s suggestions word by word and to specify where Copilot should operate automatically in terms of file types or code development scopes. More innovative demands include having Copilot provide suggestions based on the whole project, as well as features such as code explanation and chat, which are now provided in GitHub Copilot Chat (Zhao, 2023). INTEGRATION REQUEST (72) reflects developers’ wish to use Copilot in their familiar environments. This places greater demands on the Copilot team, given the significant number of Compatibility Issues we have already identified.
3.1.4. Suggestion Content Issue (4.4%)
This category refers to problems related to the content of the code generated by Copilot. Generating code suggestions is the core feature of AI code generation tools like Copilot, and the quality of the suggestions directly determines whether users will adopt them. The content of the generated code is therefore naturally an area of concern for users, researchers, and the Copilot team. These problems are further divided into seven types, which are detailed below.
• LOW QUALITY SUGGESTION refers to situations where Copilot is unable to comprehend the context sufficiently to generate useful code. Such code suggestions may not have any syntactical errors, but due to their poor quality, they are unlikely to be adopted by users. For instance, Copilot once generated an empty method containing only a return statement without meeting the requirements specified in the user’s code (Discussion #6631).
• NONSENSICAL SUGGESTION refers to the code suggestions provided by Copilot that are completely irrelevant to the needs of users. Such suggestions are considered almost unusable and provide little heuristic assistance to the user. For example, a user once received an inaccessible fake URL generated by Copilot, which was of no help with his programming task (Discussion #14212).
• SUGGESTION WITH BUGS refers to the situation where Copilot is able to generate relevant code based on the context, but the suggested code contains some bugs. This can result in the program being able to run, but not as the developer intended, or in some cases, it may cause errors or crashes. For example, a user reported that Copilot suggested using “setState(!state)” instead of “setState(true)”, which caused a logical bug in his code (Issue #43).
• INCOMPREHENSIBLE SUGGESTION refers to situations where Copilot provides potentially useful code suggestions, but, due to the complexity of the code or the user’s lack of experience, the user finds the suggested code difficult to comprehend and needs more time to verify its correctness. For example, a user complained that “My Github Copilot just autocompleted it for me, then I scoured the internet trying to find information pertaining to it but could not” (SO #73075410).
• SUGGESTION WITH INVALID SYNTAX refers to the situation where the suggestions generated by Copilot may contain syntax errors that prevent the program from running properly. For example, a user found that the code suggestions provided by Copilot “missed curly brackets, leading to erroneous code when accepting suggestion” (Discussion #38941).
• LESS EFFICIENT SUGGESTION refers to the code suggestions generated by Copilot that are functionally correct and meet the requirements of users, but may suffer from suboptimal execution efficiency or convoluted logic, potentially impacting the overall quality of the code. For example, when a user requested Copilot to “find the cup with the most water” and “find the maximum amount of water in any cup”, the suggested code performed adequately but was not optimized for efficiency (Issue #162).
• INSECURE SUGGESTION refers to the code suggestions generated by Copilot that introduce security vulnerabilities. For example, a user pointed out that a code suggestion did not account for the sizes of the data being read (Discussion #6636).
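The SUGGESTION WITH BUGS example (Issue #43) hinges on the difference between `setState(!state)`, which toggles the current value, and `setState(true)`, which always sets it. The minimal Python sketch below models this with a hypothetical `StateHolder` class and an assumed “open a panel” scenario; it is neither React nor the code from the reported issue, only an illustration of how a suggestion can run without errors yet still be logically wrong.

```python
class StateHolder:
    """Hypothetical stand-in for a UI component's boolean state (not React)."""

    def __init__(self, state: bool = False) -> None:
        self.state = state

    def set_state(self, value: bool) -> None:
        self.state = value


def open_panel_toggle(holder: StateHolder) -> None:
    # Analogous to setState(!state): flips the value, so calling it
    # while the panel is already open closes it again.
    holder.set_state(not holder.state)


def open_panel_set(holder: StateHolder) -> None:
    # Analogous to setState(true): idempotent, so the panel stays open
    # no matter how many times the handler runs.
    holder.set_state(True)


if __name__ == "__main__":
    toggled = StateHolder()
    open_panel_toggle(toggled)
    open_panel_toggle(toggled)
    print("toggle twice ->", toggled.state)  # False: panel closed again

    fixed = StateHolder()
    open_panel_set(fixed)
    open_panel_set(fixed)
    print("set twice ->", fixed.state)  # True: panel stays open
```

Both versions execute without errors, which is why such suggestions can slip past a quick review: the toggle only misbehaves when the handler fires while the state is already `True`.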
Interpretation: The quality of code suggestions is a critical factor in determining the capability of Copilot for practical code development. We identified a relatively small number of Suggestion Content Issues, possibly indicating that users are less inclined to report problems related to suggested code compared to usage-related problems. Among these problems, LOW QUALITY SUGGESTION (25), NONSENSICAL SUGGESTION (12), and SUGGESTION WITH BUGS (8) are the three most frequently reported types, while INSECURE SUGGESTION (2) and LESS EFFICIENT SUGGESTION (2) are less prevalent. This result implies that the main concern for users could be whether Copilot can provide code suggestions that have significant referential value.
3.1.5. User Experience Issue (4.3%)
This category covers user feedback on the experience of using Copilot. Unlike Operation Issue, Copilot in these cases generally runs and functions as intended, but the user experience is suboptimal. User Experience Issue can provide insights into areas where Copilot could be improved, and it is further classified into four types, as detailed below.
• POOR FUNCTIONALITY EXPERIENCE refers to a type of user experience problem where the usage of Copilot’s code generation-related functionalities is unsatisfactory. Unlike FUNCTIONALITY FAILURE, Copilot’s functionalities remain operational when such problems arise; however, users expressed dissatisfaction with their experience when using these functionalities. These problems can hinder the coordination between users and Copilot and even decrease the efficiency of development work. They often highlight areas where Copilot could be further enhanced and may motivate users to propose Feature Requests. For example, a user felt that the code Copilot automatically generated and displayed in pop-ups was quite noisy (Issue #97).
• POOR SUBSCRIPTION EXPERIENCE refers to the obstacles that users encounter when subscribing to Copilot’s services. Copilot offers several subscription methods (e.g., student verification and paid subscription), which can cause some inconvenience for users during this process. For example, one user felt lost and was unsure what to do next after setting up billing (Discussion #19119).
• POOR PERFORMANCE refers to performance issues that occur when Copilot is running, which directly impact the user experience. These problems include high CPU usage, long response times, and overly frequent server access. For example, a user complained that Copilot took “around 1-2 minutes for it to show one suggestion” in VSCode, which was very slow (Discussion #19491).
• POOR AUTHENTICATION EXPERIENCE refers to the inconvenience that users encounter when authenticating their identities before using Copilot. Although users can complete the login procedure, the experience may be suboptimal due to factors such as a cumbersome process flow or the absence of explicit instructions. For example, a user complained about the repeated “prompt to enable GitHub Copilot on every VSCode launch”, which can be a significant source of frustration (SO #70065121).
Interpretation: User Experience Issues provide valuable insights into directions for improving Copilot. Among the 24 POOR FUNCTIONALITY EXPERIENCE problems, the most commonly reported involve Copilot’s inline suggestions disrupting users’ coding process (5) and the inability to accept only certain portions of the suggested code (2). These concerns align with some of the demands users raised in Feature Request, e.g., controlling when Copilot generates code and the length of the suggested code.
3.1.6. Copyright and Policy Issue (3.3%)
Copilot is trained on a large corpus of open source code and generates code suggestions based on the users’ code context. The way in which Copilot operates raises concerns regarding potential copyright and policy issues, as expressed by some users. These problems are divided into three types, as shown below.
• CODE COPYRIGHT ISSUE refers to the concerns raised by some code authors regarding the unauthorized use of their open-source code by Copilot for model training. GitHub is currently one of the most popular web-based code hosting platforms, and since the release of Copilot, some code authors have suspected that their code hosted on GitHub was used for training without proper consideration of its license. For example, a user “started migration” of his projects on GitHub because he was worried about his code being used to train Copilot without permission (Issue #150).
• CODE TELEMETRY ISSUE refers to the concerns expressed by users regarding Copilot collecting their code to generate suggestions, which may result in the leakage of confidential code. Some users may also simply be unwilling to have their own code, or the code Copilot generates for them, collected for other purposes. For example, a user was concerned that using Copilot might “give away API key” in his code (SO #70559637).
• VIOLATION OF MARKETPLACE POLICY is a specific case in which a user reported that Copilot was allowed to be published on the VSCode Marketplace despite using proposed APIs, while other plugins using them were prohibited. The user suspected that this may violate the Marketplace Policy (Issue #3).
Interpretation: The emergence of Copyright and Policy Issue reveals users’ concerns about the way Copilot works. Copilot is trained on open-source code in multiple languages and also needs to collect users’ code context during operation to generate suggestions. These two facts have led people to pay more attention to copyright and intellectual property problems when using Copilot, especially in in-house development.
Authors:
(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China;
(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China;
(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China;
(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China;
(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany;
(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia;
(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland.
This paper is