Table of Links
2. Prior conceptualisations of intelligent assistance for programmers
3. A brief overview of large language models for code generation
4. Commercial programming tools that use large language models
5. Reliability, safety, and security implications of code-generating AI models
6. Usability and design studies of AI-assisted programming
7. Experience reports and 7.1. Writing effective prompts is hard
7.2. The activity of programming shifts towards checking and unfamiliar debugging
7.3. These tools are useful for boilerplate and code reuse
8. The inadequacy of existing metaphors for AI-assisted programming
8.2. AI assistance as compilation
8.3. AI assistance as pair programming
8.4. A distinct way of programming
9. Issues with application to end-user programming
9.1. Issue 1: Intent specification, problem decomposition and computational thinking
9.2. Issue 2: Code correctness, quality and (over)confidence
9.3. Issue 3: Code comprehension and maintenance
9.4. Issue 4: Consequences of automation in end-user programming
9.5. Issue 5: No code, and the dilemma of the direct answer
References
Allamanis, M., Barr, E. T., Devanbu, P. T., & Sutton, C. (2018). A survey of machine learning for big code and naturalness. ACM Comput. Surv., 51(4), 81:1–81:37. Retrieved from https://doi.org/10.1145/3212695 doi: 10.1145/3212695
Allamanis, M., & Brockschmidt, M. (2017). Smartpaste: Learning to adapt source code. arXiv preprint arXiv:1705.07867.
Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., . . . Sutton, C. (2021). Program synthesis with large language models. arXiv. Retrieved from https://arxiv.org/abs/2108.07732 doi: 10.48550/ARXIV.2108.07732
Barik, T., Ford, D., Murphy-Hill, E., & Parnin, C. (2018). How should compilers explain problems to developers? In Proceedings of the 2018 26th acm joint meeting on european software engineering conference and symposium on the foundations of software engineering (pp. 633–643).
Barik, T., Johnson, B., & Murphy-Hill, E. (2015). I heart hacker news: expanding qualitative research findings by analyzing social news websites. In Proceedings of the 2015 10th joint meeting on foundations of software engineering (pp. 882–885).
Barke, S., James, M. B., & Polikarpova, N. (2022). Grounded copilot: How programmers interact with code-generating models. arXiv. Retrieved from https://arxiv.org/abs/2206.15000 doi: 10.48550/ARXIV.2206.15000
Basman, A., Church, L., Klokmose, C. N., & Clark, C. B. (2016). Software and how it lives on: embedding live programs in the world around them. In Ppig (p. 19).
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In M. C. Elish, W. Isaac, & R. S. Zemel (Eds.), Facct ’21: 2021 ACM conference on fairness, accountability, and transparency, virtual event / toronto, canada, march 3-10, 2021 (pp. 610–623). ACM. Retrieved from https://doi.org/10.1145/3442188.3445922 doi: 10.1145/3442188.3445922
Bergström, I., & Blackwell, A. F. (2016). The practices of programming. In 2016 ieee symposium on visual languages and human-centric computing (vl/hcc) (pp. 190–198).
Blackwell, A. F. (2002a). First steps in programming: A rationale for attention investment models. In Proceedings ieee 2002 symposia on human centric computing languages and environments (pp. 2–10).
Blackwell, A. F. (2002b). What is programming? In Ppig (p. 20).
Bødker, S. (2015). Third-wave hci, 10 years later - participation and sharing. Interactions, 22(5), 24–31. Retrieved from https://doi.org/10.1145/2804405 doi: 10.1145/2804405
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., . . . Amodei, D. (2020). Language models are few-shot learners.
Cao, J., Fleming, S. D., Burnett, M., & Scaffidi, C. (2015). Idea garden: Situated support for problem solving by end-user programmers. Interacting with Computers, 27(6), 640–660.
Chalhoub, G., & Sarkar, A. (2022). “It’s Freedom to Put Things Where My Mind Wants”: Understanding and Improving the User Experience of Structuring Data in Spreadsheets. In CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. Retrieved from https://doi.org/10.1145/3491102.3501833 doi: 10.1145/3491102.3501833
Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., . . . Zaremba, W. (2021). Evaluating large language models trained on code. CoRR, abs/2107.03374. Retrieved from https://arxiv.org/abs/2107.03374
Chen, M., Tworek, J., Jun, H., Yuan, Q., Ponde, H., Kaplan, J., . . . Zaremba, W. (2021). Evaluating large language models trained on code. ArXiv, abs/2107.03374.
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., . . . Fiedel, N. (2022). Palm: Scaling language modeling with pathways. ArXiv, abs/2204.02311.
Colmerauer, A., & Roussel, P. (1996). The birth of prolog. In History of programming languages—ii (pp. 331–367).
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 4171–4186). Minneapolis, Minnesota: Association for Computational Linguistics. Retrieved from https://aclanthology.org/N19-1423 doi: 10.18653/v1/N19-1423
Green, T., & Blackwell, A. (1998). Cognitive dimensions of information artefacts: a tutorial. In Bcs hci conference (Vol. 98, pp. 1–75).
Green, T. R. (1989). Cognitive dimensions of notations. People and computers V, 443–460.
Green, T. R., & Petre, M. (1992). When visual programs are harder to read than textual programs. In Human-computer interaction: Tasks and organisation, proceedings of ecce-6 (6th european conference on cognitive ergonomics). gc van der veer, mj tauber, s. bagnarola and m. antavolits. rome, cud (pp. 167–180).
Gulwani, S. (2011). Automating string processing in spreadsheets using input-output examples. In T. Ball & M. Sagiv (Eds.), Proceedings of the 38th ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL 2011, austin, tx, usa, january 26-28, 2011 (pp. 317–330). ACM. Retrieved from https://doi.org/10.1145/1926385.1926423 doi: 10.1145/1926385.1926423
Hannay, J. E., Dybå, T., Arisholm, E., & Sjøberg, D. I. (2009). The effectiveness of pair programming: A meta-analysis. Information and software technology, 51(7), 1110–1122.
Henley, A. Z., & Fleming, S. D. (2014). The patchworks code editor: Toward faster navigation with less code arranging and fewer navigation mistakes. In Proceedings of the sigchi conference on human factors in computing systems (pp. 2511–2520).
Hermans, F., Pinzger, M., & van Deursen, A. (2015). Detecting and refactoring code smells in spreadsheet formulas. Empirical Software Engineering, 20(2), 549–575.
Hindle, A., Barr, E. T., Gabel, M., Su, Z., & Devanbu, P. T. (2016). On the naturalness of software. Commun. ACM, 59(5), 122–131. Retrieved from https://doi.org/10.1145/2902362 doi: 10.1145/2902362
Hindle, A., Barr, E. T., Su, Z., Gabel, M., & Devanbu, P. T. (2012). On the naturalness of software. In M. Glinz, G. C. Murphy, & M. Pezzè (Eds.), 34th international conference on software engineering, ICSE 2012, june 2-9, 2012, zurich, switzerland (pp. 837–847). IEEE Computer Society. Retrieved from https://doi.org/10.1109/ICSE.2012.6227135 doi: 10.1109/ICSE.2012.6227135
Hoare, C. A. R. (1969). An axiomatic basis for computer programming. Commun. ACM, 12(10), 576–580. Retrieved from https://doi.org/10.1145/363235.363259 doi: 10.1145/363235.363259
Hochreiter, S., & Schmidhuber, J. (1997, nov). Long short-term memory. Neural Comput., 9(8), 1735–1780. Retrieved from https://doi.org/10.1162/neco.1997.9.8.1735 doi: 10.1162/neco.1997.9.8.1735
Horvitz, E. (1999). Principles of mixed-initiative user interfaces. In Proceedings of the sigchi conference on human factors in computing systems (pp. 159–166).
Hutchins, E. L., Hollan, J. D., & Norman, D. A. (1985). Direct manipulation interfaces. Hum. Comput. Interact., 1(4), 311–338. Retrieved from https://doi.org/10.1207/s15327051hci0104_2 doi: 10.1207/s15327051hci0104_2
Imai, S. (2022). Is github copilot a substitute for human pair-programming? an empirical study. In 2022 ieee/acm 44th international conference on software engineering: Companion proceedings (icse-companion) (pp. 319–321).
Jiang, E., Toh, E., Molina, A., Olson, K., Kayacik, C., Donsbach, A., . . . Terry, M. (2022). Discovering the syntax and strategies of natural language programming with generative language models. In Chi conference on human factors in computing systems (pp. 1–19).
Kery, M. B., & Myers, B. A. (2017). Exploring exploratory programming. In 2017 ieee symposium on visual languages and human-centric computing (vl/hcc) (pp. 25–29).
Ko, A. J., & Myers, B. A. (2004). Designing the whyline: a debugging interface for asking questions about program behavior. In Proceedings of the sigchi conference on human factors in computing systems (pp. 151–158).
Kulesza, T., Amershi, S., Caruana, R., Fisher, D., & Charles, D. (2014). Structured labeling for facilitating concept evolution in machine learning. In Proceedings of the sigchi conference on human factors in computing systems (pp. 3075–3084).
Kurlander, D., Cypher, A., & Halbert, D. C. (1993). Watch what i do: programming by demonstration. MIT press.
Lau, S., Srinivasa Ragavan, S. S., Milne, K., Barik, T., & Sarkar, A. (2021). Tweakit: Supporting end-user programmers who transmogrify code. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1–12).
Li, J., Tang, T., Zhao, W. X., & Wen, J.-R. (2021, 8). Pretrained language model for text generation: A survey. In Z.-H. Zhou (Ed.), Proceedings of the thirtieth international joint conference on artificial intelligence, IJCAI-21 (pp. 4492–4499). International Joint Conferences on Artificial Intelligence Organization. Retrieved from https://doi.org/10.24963/ijcai.2021/612 (Survey Track) doi: 10.24963/ijcai.2021/612
Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., . . . Vinyals, O. (2022b). Competition-level code generation with alphacode. arXiv. Retrieved from https://arxiv.org/abs/2203.07814 doi: 10.48550/ARXIV.2203.07814
Li, Y., Choi, D. H., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., . . . Vinyals, O. (2022a). Competition-level code generation with alphacode. ArXiv, abs/2203.07814.
Lieberman, H. (2001). Your wish is my command: Programming by example. Morgan Kaufmann.
Lieberman, H., & Liu, H. (2006). Feasibility studies for programming in natural language. In End user development (pp. 459–473). Springer.
Liu, S., Chen, Y., Xie, X., Siow, J. K., & Liu, Y. (2021). Retrieval-augmented generation for code summarization via hybrid GNN. In International conference on learning representations. Retrieved from https://openreview.net/forum?id=zv-typ1gPxA
Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., . . . Liu, S. (2021). Codexglue: A machine learning benchmark dataset for code understanding and generation. ArXiv, abs/2102.04664.
Luger, E., & Sellen, A. (2016). "Like having a really bad PA": the gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 chi conference on human factors in computing systems (pp. 5286–5297).
Macvean, A., Church, L., Daughtry, J., & Citro, C. (2016). Api usability at scale. In Ppig (p. 26).
Madi, N. A. (2022). How readable is model-generated code? examining readability and visual inspection of github copilot. arXiv preprint arXiv:2208.14613.
Marasoiu, M., Church, L., & Blackwell, A. (2015). An empirical investigation of code completion usage by professional software developers. In PPIG.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Weinberger (Eds.), Advances in neural information processing systems (Vol. 26). Curran Associates, Inc. Retrieved from https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf
Miller, L. A. (1981). Natural language programming: Styles, strategies, and contrasts. IBM Systems Journal, 20(2), 184–215.
Mou, L., Li, G., Zhang, L., Wang, T., & Jin, Z. (2016). Convolutional neural networks over tree structures for programming language processing. In Aaai.
Mu, J., & Sarkar, A. (2019). Do we need natural language? Exploring restricted language interfaces for complex domains. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–6).
Myers, B. A. (1992). Demonstrational interfaces: A step beyond direct manipulation. Computer, 25(8), 61–73.
Myers, B. A., & Stylos, J. (2016). Improving api usability. Communications of the ACM, 59(6), 62–69.
Nardi, B. A. (1993). A small matter of programming: perspectives on end user computing. MIT press.
Nguyen, A. T., Nguyen, T. T., & Nguyen, T. N. (2015). Divide-and-conquer approach for multi-phase statistical migration for source code (t). 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 585-596.
Pandita, R., Parnin, C., Hermans, F., & Murphy-Hill, E. (2018). No half-measures: A study of manual and tool-assisted end-user programming tasks in excel. In 2018 ieee symposium on visual languages and human-centric computing (vl/hcc) (pp. 95–103).
Panko, R. R. (2008). Reducing overconfidence in spreadsheet development. arXiv preprint arXiv:0804.0941.
Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2021). Asleep at the keyboard? assessing the security of github copilot’s code contributions. arXiv. Retrieved from https://arxiv.org/abs/2108.09293 doi: 10.48550/ARXIV.2108.09293
Piccioni, M., Furia, C. A., & Meyer, B. (2013). An empirical study of api usability. In 2013 acm/ieee international symposium on empirical software engineering and measurement (pp. 5–14).
Potthast, M., Hagen, M., & Stein, B. (2021). The dilemma of the direct answer. In Acm sigir forum (Vol. 54, pp. 1–12).
Raychev, V., Vechev, M. T., & Krause, A. (2015). Predicting program properties from "big code". In S. K. Rajamani & D. Walker (Eds.), Proceedings of the 42nd annual ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL 2015, mumbai, india, january 15-17, 2015 (pp. 111–124). ACM. Retrieved from https://doi.org/10.1145/2676726.2677009 doi: 10.1145/2676726.2677009
Rouchy, P. (2006). Aspects of prolog history: Logic programming and professional dynamics. Blekinge Institute of Technology, Sweden. (English). TeamEthno-Online, (2), 85–100.
Salge, C. A. D. L., & Berente, N. (2016). Pair programming vs. solo programming: What do we know after 15 years of research? In 2016 49th hawaii international conference on system sciences (hicss) (pp. 5398–5406).
Sarkar, A. (2016). Interactive analytical modelling (Tech. Rep. No. UCAM-CL-TR-920). University of Cambridge, Computer Laboratory. Retrieved from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-920.pdf doi: 10.48456/tr-920
Sarkar, A. (2022, March). Is explainable AI a race against model complexity? In Workshop on Transparency and Explanations in Smart Systems (TeXSS), in conjunction with ACM Intelligent User Interfaces (IUI 2022) (pp. 192–199). Retrieved from http://ceur-ws.org/Vol-3124/paper22.pdf
Sarkar, A., & Gordon, A. D. (2018, September). How do people learn to use spreadsheets? (work in progress). In Proceedings of the 29th Annual Conference of the Psychology of Programming Interest Group (PPIG 2018) (pp. 28–35).
Sarkar, A., Jamnik, M., Blackwell, A. F., & Spott, M. (2015). Interactive visual machine learning in spreadsheets. In 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (pp. 159–163).
Sarkar, A., Srinivasa Ragavan, S., Williams, J., & Gordon, A. D. (2022). End-user encounters with lambda abstraction in spreadsheets: Apollo’s bow or Achilles’ heel? In 2022 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).
Shneiderman, B., & Norwood, N. (1993). 1.1 direct manipulation: a step beyond programming. Sparks of innovation in human-computer interaction, 17.
Silver, A. (2018, May). Introducing visual studio intellicode. Microsoft. Retrieved from https://devblogs.microsoft.com/visualstudio/introducing-visual-studio-intellicode/
Srinivasa Ragavan, S., Hou, Z., Wang, Y., Gordon, A. D., Zhang, H., & Zhang, D. (2022). Gridbook: Natural language formulas for the spreadsheet grid. In 27th international conference on intelligent user interfaces (pp. 345–368). New York, NY, USA: Association for Computing Machinery. Retrieved from https://doi.org/10.1145/3490099.3511161 doi: 10.1145/3490099.3511161
Srinivasa Ragavan, S., Kuttal, S. K., Hill, C., Sarma, A., Piorkowski, D., & Burnett, M. (2016). Foraging among an overabundance of similar variants. In Proceedings of the 2016 chi conference on human factors in computing systems (pp. 3509–3521).
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the 27th international conference on neural information processing systems - volume 2 (pp. 3104–3112). Cambridge, MA, USA: MIT Press.
Tanimoto, S. L. (2013). A perspective on the evolution of live programming. In 2013 1st international workshop on live programming (live) (pp. 31–34).
Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In Chi conference on human factors in computing systems extended abstracts (pp. 1–7).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention is all you need. In Proceedings of the 31st international conference on neural information processing systems (pp. 6000–6010). Red Hook, NY, USA: Curran Associates Inc.
Wei, J., Goyal, M., Durrett, G., & Dillig, I. (2020). Lambdanet: Probabilistic type inference using graph neural networks. ArXiv, abs/2005.02161.
Wei, Y., Chandrasekaran, N., Gulwani, S., & Hamadi, Y. (2015, May). Building bing developer assistant (Tech. Rep. No. MSR-TR-2015-36). Retrieved from https://www.microsoft.com/en-us/research/publication/building-bing-developer-assistant/
Weiss, D. (2022, Jun). Blog / tabnine announcements / announcing our next-generation ai models. Tabnine. Retrieved from https://www.tabnine.com/blog/announcing-tabnine-next-generation/
Williams, J., Negreanu, C., Gordon, A. D., & Sarkar, A. (2020). Understanding and inferring units in spreadsheets. In 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (pp. 1–9).
Williams, L. A., & Kessler, R. R. (2000). All i really need to know about pair programming i learned in kindergarten. Communications of the ACM, 43(5), 108–114.
Wing, J. (2011). Research notebook: Computational thinking—what and why. The link magazine, 6, 20–23.
Xu, F. F., Alon, U., Neubig, G., & Hellendoorn, V. J. (2022). A systematic evaluation of large language models of code. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming.
Xu, F. F., Vasilescu, B., & Neubig, G. (2022). In-IDE Code Generation from Natural Language: Promise and Challenges. ACM Transactions on Software Engineering and Methodology (TOSEM), 31(2), 1–47.
Yoon, Y., & Myers, B. A. (2015). Supporting selective undo in a code editor. In 2015 ieee/acm 37th ieee international conference on software engineering (Vol. 1, pp. 223–233).
Zhang, H., Jain, A., Khandelwal, G., Kaushik, C., Ge, S., & Hu, W. (2016). Bing developer assistant: improving developer productivity by recommending sample code. In Proceedings of the 2016 24th acm sigsoft international symposium on foundations of software engineering (pp. 956–961).
Ziegler, A. (2021, Jun). Github copilot research recitation. Microsoft. Retrieved from https://github.blog/2021-06-30-github-copilot-research-recitation/
Ziegler, A., Kalliamvakou, E., Simister, S., Sittampalam, G., Li, A., Rice, A., . . . Aftandilian, E. (2022). Productivity assessment of neural code completion. arXiv preprint arXiv:2205.06537.
Authors:
(1) Advait Sarkar, Microsoft Research, University of Cambridge;
(2) Andrew D. Gordon, Microsoft Research, University of Edinburgh;
(3) Carina Negreanu, Microsoft Research;
(4) Christian Poelitz, Microsoft Research;
(5) Sruti Srinivasa Ragavan, Microsoft Research;
(6) Ben Zorn, Microsoft Research.
This paper is