Abstract and 1. Introduction

2. Methodology and 2.1. Research Questions

2.2. Data Collection

2.3. Data Labelling

2.4. Data Extraction

2.5. Data Analysis

3. Results and Interpretation and 3.1. Type of Problems (RQ1)

3.2. Type of Causes (RQ2)

3.3. Type of Solutions (RQ3)

4. Implications

4.1. Implications for the Copilot Users

4.2. Implications for the Copilot Team

4.3. Implications for Researchers

5. Threats to Validity

6. Related Work

6.1. Evaluating the Quality of Code Generated by Copilot

6.2. Copilot’s Impact on Practical Development and 6.3. Conclusive Summary

7. Conclusions, Data availability, Acknowledgments, CRediT authorship contribution statement and References

7. Conclusions

In this study, we focused on the problems users encounter when using GitHub Copilot, as well as their underlying causes and potential solutions. After identifying the RQs, we collected data from GitHub Issues, GitHub Discussions, and Stack Overflow. After manual screening, we obtained 476 GitHub issues, 706 GitHub discussions, and 142 SO posts related to Copilot, and extracted a total of 1355 problems, 391 causes, and 497 solutions based on our data extraction criteria. The results indicate that Operation Issue and Compatibility Issue are the most common problems faced by users. Copilot Internal Error, Network Connection Error, and Editor/IDE Compatibility Issue are identified as the most common causes of these problems. Bug Fixed by Copilot, Modify Configuration/Setting, and Use Suitable Version are the predominant solutions. For Copilot users, our results suggest that they should carefully review Copilot's code suggestions and seek inspiration from them, and that using IDEs or code editors officially supported by Copilot leads to a better user experience. For the Copilot team, it is essential to enhance compatibility with and support for a broader range of IDEs and code editors, simplify the configuration process, diversify the code suggestions while improving their quality, and address concerns related to intellectual property and copyright. Additionally, users are asking for more customization options to tailor Copilot's behavior and more control over the code content generated by Copilot. For researchers, we found that additional time is required to verify code suggestions when utilizing Copilot, which makes the code explanation feature especially valuable. Programming tasks in different application domains, as well as the purposes for which users employ Copilot, should be taken into consideration when assessing user satisfaction with Copilot.

In the next step, we plan to conduct an industrial survey with code testing experiments to assess the real-world usage of Copilot, as well as its performance in terms of security, maintainability, and other aspects. Considering that the emergence of LLMs is likely to drive a significant proliferation of AI code assistant tools, a comparison between Copilot and other LLM-based tools would yield valuable insights.

Data availability

We have shared the link to our dataset in the reference (Zhou et al., 2024).

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 62172311 and 62176099, the Natural Science Foundation of Hubei Province of China under Grant No. 2021CFB577, and the Knowledge Innovation Program of Wuhan-Shuguang Project under Grant No. 2022010801020280.

CRediT authorship contribution statement

Xiyu Zhou: Conceptualization, Investigation, Data curation, Formal analysis, Writing - Original draft preparation. Peng Liang: Conceptualization, Methodology, Investigation, Data curation, Supervision, Writing - review and editing. Beiqi Zhang: Investigation, Data curation, Formal analysis, Writing - Original draft preparation. Zengyang Li: Conceptualization, Methodology, Writing - review and editing. Aakash Ahmad: Conceptualization, Methodology, Writing - review and editing. Mojtaba Shahin: Conceptualization, Methodology, Writing - review and editing. Muhammad Waseem: Methodology, Writing - review and editing.

References

Al Madi, N., 2023. How readable is model-generated code? examining readability and visual inspection of github copilot, in: Proceedings of the 37th International Conference on Automated Software Engineering (ASE), ACM. pp. 1–5.

Asare, O., Nagappan, M., Asokan, N., 2023. Is github’s copilot as bad as humans at introducing vulnerabilities in code? Empirical Software Engineering 28, 129.

Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., et al., 2021. Program synthesis with large language models. arXiv preprint abs/2108.07732.

Barke, S., James, M.B., Polikarpova, N., 2023. Grounded copilot: How programmers interact with code-generating models. Proceedings of the ACM on Programming Languages 7, 1–27.

Bird, C., Ford, D., Zimmermann, T., Forsgren, N., Kalliamvakou, E., Lowdermilk, T., Gazit, I., 2023. Taking flight with copilot: Early insights and opportunities of ai-powered pair-programming tools. ACM Queue 20, 35–57.

Campbell, J.L., Quincy, C., Osserman, J., Pedersen, O.K., 2013. Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociological Methods & Research 42, 294–320.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.

Eclipse, 2019. Eclipse Code Recommenders. https://projects.eclipse.org/projects/technology.recommenders.

Fu, Y., Liang, P., Tahir, A., Li, Z., Shahin, M., Yu, J., 2024. Security weaknesses of copilot generated code in github. arXiv preprint abs/2310.02059.

Gartner, 2023. Gartner Identifies the Top 10 Strategic Technology Trends for 2024. https://www.gartner.com/en/newsroom/press-releases/2023-10-16-gartner-identifies-the-top-10-strategic-technology-trends-for-2024.

GitHub, 2024a. About GitHub Copilot Enterprise. https://docs.github.com/en/copilot/github-copilot-enterprise/overview/about-github-copilot-enterprise.

GitHub, 2024b. Configuring network settings for GitHub Copilot. https://docs.github.com/en/copilot/configuring-github-copilot/configuring-network-settings-for-github-copilot.

GitHub, 2024c. GitHub Copilot · Your AI Pair Programmer. https://github.com/features/copilot.

Sandoval, G., Pearce, H., Nys, T., Karri, R., Dolan-Gavitt, B., Garg, S., 2023. Lost at c: A user study on the security implications of large language model code assistants, in: Proceedings of the 32nd USENIX Security Symposium (USENIX Security), USENIX. pp. 2205–2222.

Imai, S., 2022. Is github copilot a substitute for human pair-programming? an empirical study, in: Proceedings of the 44th International Conference on Software Engineering (ICSE): Companion, IEEE. pp. 319–321.

Jaworski, M., Piotrkowski, D., 2023. Study of software developers’ experience using the github copilot tool in the software development process. arXiv preprint abs/2301.04991.

Liang, J.T., Yang, C., Myers, B.A., 2024. A large-scale survey on the usability of ai programming assistants: Successes and challenges, in: Proceedings of the 46th International Conference on Software Engineering (ICSE), ACM. pp. 1–13.

Luan, S., Yang, D., Barnaby, C., Sen, K., Chandra, S., 2019. Aroma: Code recommendation via structural code search. Proceedings of the ACM on Programming Languages 3, 1–28.

Mastropaolo, A., Pascarella, L., Guglielmi, E., Ciniselli, M., Scalabrino, S., Oliveto, R., Bavota, G., 2023. On the robustness of code generation techniques: An empirical study on github copilot, in: Proceedings of the 45th International Conference on Software Engineering (ICSE), IEEE. pp. 2149–2160.

Moradi Dakhel, A., Majdinasab, V., Nikanjam, A., Khomh, F., Desmarais, M.C., Jiang, Z.M., 2023. Github copilot ai pair programmer: Asset or liability? Journal of Systems and Software 203, 111734.

Nguyen, N., Nadi, S., 2022. An empirical evaluation of github copilot’s code suggestions, in: Proceedings of the 19th IEEE/ACM International Conference on Mining Software Repositories (MSR), IEEE. pp. 1–5.

Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., Karri, R., 2022. Asleep at the keyboard? assessing the security of github copilot’s code contributions, in: Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P), IEEE. pp. 754–768.

Peng, S., Kalliamvakou, E., Cihon, P., Demirer, M., 2023. The impact of ai on developer productivity: Evidence from github copilot. arXiv preprint abs/2302.06590.

Pope, T., 2024. Copilot.vim. https://github.com/github/copilot.vim.

Robillard, M., Walker, R., Zimmermann, T., 2010. Recommendation systems for software engineering. IEEE Software 27, 80–86.

Runeson, P., Höst, M., 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering 14, 131–164.

Siddiq, M.L., Majumder, S.H., Mim, M.R., Jajodia, S., Santos, J.C., 2022. An empirical study of code smells in transformer-based code generation techniques, in: Proceedings of the 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM), IEEE. pp. 71–82.

Sobania, D., Briesch, M., Rothlauf, F., 2022. Choose your programming copilot: A comparison of the program synthesis performance of github copilot and genetic programming, in: Proceedings of the 24th Genetic and Evolutionary Computation Conference (GECCO), ACM. pp. 1019–1027.

Stol, K.J., Ralph, P., Fitzgerald, B., 2016. Grounded theory in software engineering research: a critical review and guidelines, in: Proceedings of the 38th International Conference on Software Engineering (ICSE), ACM. pp. 120–131.

Vaithilingam, P., Zhang, T., Glassman, E.L., 2022. Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models, in: Proceedings of the 42nd CHI Conference on Human Factors in Computing Systems (CHI) Extended Abstracts, ACM. pp. 1–7.

Wang, C., Hu, J., Gao, C., Jin, Y., Xie, T., Huang, H., Lei, Z., Deng, Y., 2023a. How practitioners expect code completion?, in: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM. pp. 1294–1306.

Wang, R., Cheng, R., Ford, D., Zimmermann, T., 2023b. Investigating and designing for trust in ai-powered code generation tools. arXiv preprint abs/2305.11248.

Weisz, J.D., Muller, M., Houde, S., Richards, J., Ross, S.I., Martinez, F., Agarwal, M., Talamadupula, K., 2021. Perfection not required? human-ai partnerships in code translation, in: Proceedings of the 26th International Conference on Intelligent User Interfaces (IUI), ACM. pp. 402–412.

Wilkinson, L., 2024. GitHub Copilot drives revenue growth amid subscriber base expansion. https://www.ciodive.com/news/github-copilot-subscriber-count-revenue-growth/706201/.

Yetistiren, B., Ozsoy, I., Tuzun, E., 2022. Assessing the quality of github copilot’s code generation, in: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), ACM. pp. 62–71.

Zhang, B., Liang, P., Feng, Q., Fu, Y., Li, Z., 2024. Copilot refinement: Addressing code smells in copilot-generated python code. arXiv preprint abs/2401.14176.

Zhang, B., Liang, P., Zhou, X., Ahmad, A., Waseem, M., 2023. Demystifying practices, challenges and expected features of using github copilot. International Journal of Software Engineering and Knowledge Engineering 33, 1653–1672.

Zhao, S., 2023. GitHub Copilot Chat. https://github.blog/2023-12-29-github-copilot-chat-now-generally-available-for-organizations-and-individuals/.

Zhou, X., Liang, P., Zhang, B., Li, Z., Ahmad, A., Shahin, M., Waseem, M., 2024. Dataset of the Paper “Exploring the Problems, their Causes, and Solutions of AI Pair Programming: A Study with Practitioners of GitHub Copilot”. https://doi.org/10.5281/zenodo.11080113.

Ziegler, A., Kalliamvakou, E., Li, X.A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., Aftandilian, E., 2024. Measuring github copilot’s impact on productivity. Communications of the ACM 67, 54–63.

Authors:

(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China ([email protected]);

(2) Peng Liang (Corresponding Author), School of Computer Science, Wuhan University, Wuhan, China ([email protected]);

(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China ([email protected]);

(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China ([email protected]);

(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany ([email protected]);

(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia ([email protected]);

(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland ([email protected]).


This paper is available on arxiv under CC BY 4.0 DEED license.