Abstract and 1. Introduction

2. Contexts, Methods, and Tasks

3. Mixed Outcomes

3.1. Quality and 3.2. Productivity

3.3. Learning and 3.4. Cost

4. Moderators

4.1. Task Types & Complexity

4.2. Compatibility

4.3. Communication

4.4. Collaboration

4.5. Logistics

5. Discussion and Future Work

5.1. LLM, Your pAIr Programmer?

5.2. LLM, A Better pAIr Programmer?

5.3. LLM, Students’ pAIr Programmer?

6. Conclusion, Acknowledgments, and References

4 MODERATORS

In seeking to explain the costs and benefits of human-human pair programming, researchers have identified moderators such as task type & complexity [31], compatibility factors like expertise [6, 67], communication [17, 24, 65], collaboration factors like over-reliance and role-switching [30, 70, 87], and logistical difficulties including scheduling and training [11, 31] (as shown in the bottom rows of Table 1).

These key factors influence the success of human-human pair programming. When they are favorable, pair programming helps programmers catch errors more easily, solve problems more quickly, review code more thoroughly, and produce higher-quality code overall; it also promotes knowledge sharing among team members, which can lead to a more cohesive and effective team. When they are not, challenges such as scheduling and finding partners with compatible working styles usually reduce the cost-efficiency of pair programming, and conflicts or disagreements between partners can slow down the development process [11, 18].

The moderators of human-AI pair programming remain largely unexplored: we do not yet know what makes human-AI pair programming more or less effective. Therefore, in this section, we discuss the key moderators examined in the human-human pair programming literature; individual examples of moderating effects are provided in Table 1.

4.1 Task Types & Complexity

For task type and task complexity, Chaparro et al. [16] found that debugging tasks lead to lower satisfaction and perceived efficacy than comprehension and refactoring tasks. Hannay et al. [31] found that for low-complexity tasks, duration is shorter at the expense of lower-quality results, whereas for high-complexity tasks, quality is higher but considerably greater effort is required. Arisholm et al. [6] found that the moderating effect of complexity also depends on the expertise of the pair, where “benefits of correctness on complex system apply mainly to juniors, whereas the reductions in duration to perform the tasks correctly on the simple system apply mainly to intermediates and seniors.”

4.2 Compatibility

Salleh et al. [70] listed multiple factors for pair compatibility, such as personality, perceived skills, actual skills (expertise), self-esteem, gender, and work ethic. Thomas et al. [81] found that paired students with similar self-confidence levels produce their best work. Hannay et al. [30] found that Big Five personality traits have only modest predictive value for pair programming performance, whereas expertise, task complexity, and country are stronger predictors. There is also some evidence that women benefit more from pair programming than men do [29, 67].

Expertise as a compatibility factor has been extensively studied in the human-human pair programming literature. For example, researchers found that student pairs perform best when their expertise is similar [70], and that students prefer to be paired with similarly skilled partners [16]. However, in industry, Jensen [36] reported that when both members were near the same capability level and strongly opinionated, the collaboration was counterproductive and troublesome.

In the introductory programming context, Lui and Chan [45] found that pairing novices yields a larger improvement in productivity than pairing experts. However, there are concerns about the risk of “the blind leading the blind” when novice pairs have no expert to consult [4]. Researchers also found that less-skilled students learn more and enjoy pair programming more than more-skilled students do [16, 47]. However, when the knowledge gap is too large, students can be less satisfied and the quality benefits may be smaller [60]. Chong and Hurlbutt [17] reported that a novice programmer collaborating with an expert may become disengaged, have lower self-esteem, and be afraid of slowing down or annoying their more-skilled partner [4].

Authors:

(1) Qianou Ma (Corresponding author), Carnegie Mellon University, Pittsburgh, USA;

(2) Tongshuang Wu, Carnegie Mellon University, Pittsburgh, USA;

(3) Kenneth Koedinger, Carnegie Mellon University, Pittsburgh, USA.


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.