Table of Links
2. Contexts, Methods, and Tasks
3.1. Quality and 3.2 Productivity
5. Discussion and Future Work
5.1. LLM, Your pAIr Programmer?
5.2. LLM, A Better pAIr Programmer?
5.3. LLM, Students’ pAIr Programmer?
6. Conclusion, Acknowledgments, and References
5 DISCUSSION AND FUTURE WORK
5.1 LLM, Your pAIr Programmer?
Before the advent of LLM-based tools claiming to be “your AI pair programmer” [26], researchers had already been developing AI-powered systems to assist programmers, such as code completion tools (e.g., Tabnine), code refactoring and formal verification systems, and code synthesis and debugging tools. The evaluation focus has mostly been on usability design, cost-efficiency, and productivity [53, 56], not on the feasibility of using these AI-assisted programming tools as pair programming partners.
With recent advancements in generative LLM technologies, commercial AI tools like Copilot, which offer real-time code suggestions and feedback beyond auto-completion, bear a closer resemblance to a pair programming partner [12]. Many studies have evaluated and critiqued Copilot’s ability to generate correct and efficient [21, 58], secure [7, 62], readable [3], and verifiable [88] code. Copilot undoubtedly introduces defects and errors in its suggested code, but humans are far from error-free either. A programmer cannot be and does not need to be perfect to bring benefit to the pair programming experience, but would Copilot qualify as a programming partner?
In answering this question, researchers have started to look into the interaction dynamics between programmers and the claimed AI pair programmer. Some argue against characterizing AI-assisted programming as pair programming. They believe the analogy to human-human pair programming is rather superficial, as what makes human-human pair programming effective (e.g., productive communication) disappears in human-AI pair programming. According to Sarkar et al. [72], “LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges.”
We use the phrase “human-AI pAIr programming” in this paper simply because we adopt the definition of pair programming as a pair working on the same device and the same task, which lets us conveniently compare human and AI as pair programming partners. As reviewed in Sections 3 and 4, Copilot and a human partner share many similar outcomes in pair programming, but the moderators for human-AI pair programming are less examined. We believe this comparison is meaningful in that it helps us derive insights for continually improving LLM-based programming tools.
Note that in this paper, we mostly covered studies using the Copilot extension for VS Code. Tools like ChatGPT may support the communication aspect better than Copilot [82], and there are also Bard, developed by Google [27], and an experimental Copilot Labs extension by GitHub [25], which support more functionalities such as bug fixing, code cleanup, and customizable prompts. These tools may already improve human-AI pair programming interaction in some ways, so future studies could also compare across a variety of LLM-based programming tools.
There is another challenge to describing AI as a pair programmer, following the debate on anthropomorphizing user interfaces [74] and the ongoing discussion as AI demonstrates increasing capabilities to replicate human behaviors [43, 80]. The concern is that anthropomorphized AI could mislead designers and deceive users, impede user agency and responsibility, carry deeper ethical and social risks, and may not be more effective anyway.
However, in the education literature, researchers have long sought to make agents provide naturalistic, human-like interactions with students, using teachable agents [13, 59], pedagogical agents [44, 46, 49], conversational agents [69, 71], etc. Kuttal et al. [41] explored the trade-offs of using a human vs. an AI agent as the pair programming partner. They found that human-human and human-AI pairs achieved similar productivity, code quality, and self-efficacy results, and that students “trusted and showed humility towards agents.” They also found that AI agents successfully facilitated knowledge transfer but failed at providing logical explanations or discussions.
These anthropomorphized agents mostly seem to be effective in improving learning and motivation [32, 73]. Some have explained the effects using social agency theory [49], cognitive load theory [44], and social-cue-related multimedia learning principles [48]. How well these theories apply to LLM-supported AI agents, and what differs between industry and educational contexts, would be interesting to explore. More work is needed to create a shared vocabulary for this field.
Authors:
(1) Qianou Ma (Corresponding author), Carnegie Mellon University, Pittsburgh, USA ([email protected]);
(2) Tongshuang Wu, Carnegie Mellon University, Pittsburgh, USA ([email protected]);
(3) Kenneth Koedinger, Carnegie Mellon University, Pittsburgh, USA ([email protected]).