Authors:
(1) Amador Duran, I3US Institute, Universidad de Sevilla, Sevilla, Spain and SCORE Lab, Universidad de Sevilla, Sevilla, Spain;
(2) Pablo Fernandez, I3US Institute, Universidad de Sevilla, Sevilla, Spain and SCORE Lab, Universidad de Sevilla, Sevilla, Spain;
(3) Beatriz Bernardez, I3US Institute, Universidad de Sevilla, Sevilla, Spain and SCORE Lab, Universidad de Sevilla, Sevilla, Spain;
(4) Nathaniel Weinman, Computer Science Division, University of California, Berkeley, Berkeley, USA;
(5) Aslıhan Akalın, Computer Science Division, University of California, Berkeley, Berkeley, USA;
(6) Armando Fox, Computer Science Division, University of California, Berkeley, Berkeley, USA.
Abstract
Context. Women have historically been underrepresented in Software Engineering, due in part to an unwelcoming climate pervaded by the widely held gender bias that men outperform women at programming. Pair programming is widely used in industry and has been shown to increase student interest in Software Engineering, particularly among women; but if the same gender biases are also present in pair programming, its potential for attracting women to the field could be thwarted. Objective. We aim to explore the effects of gender bias in pair programming. Specifically, in a remote setting in which students cannot directly observe the gender of their peers, we study whether the perceived productivity, the perceived technical competency of the partner, and the collaboration/interaction behaviors of Software Engineering students differ depending on the perceived gender of their remote partner. To our knowledge, this is the first study specifically focusing on the impact of gender stereotypes and bias within pairs in pair programming. Method. We have developed an online pair-programming platform (twincode) that provides a collaborative editing window and a chat pane, both of which are heavily instrumented. Students in the control group had no information about their partner’s gender, whereas students in the treatment group could see a gendered avatar representing the other participant as a man or as a woman. The gender of the avatar was swapped between programming tasks. We analyzed 45 variables related to the collaborative coding behavior, chat utterances, and questionnaire responses of 46 pairs in the original study at the University of Seville and 23 pairs in the external replication at the University of California, Berkeley. Results. We did not observe any statistically significant effect of the gender bias treatment, nor any interaction between the perceived partner’s gender and the subject’s gender, in any of the 45 response variables measured in the original study. 
In the external replication, we observed statistically significant effects, with moderate to large effect sizes, in four of the 45 dependent variables within the experimental group when comparing how subjects acted when their partners were represented as a man or as a woman. Conclusions. The results of the original study do not show any clear effect of gender bias in remote pair programming among current Software Engineering students. In the external replication, students seem to delete more source code characters when they have a woman partner, and to communicate using more informal utterances, reflections, and yes/no questions when they have a man partner. However, these results should be interpreted with caution, both because of the small number of subjects in the replication and because only the result about informal utterances remains significant after false discovery rate adjustments are applied. In any case, more replications are needed to confirm or refute these results in the same and other populations of Software Engineering students.
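The effect of a false discovery rate (FDR) adjustment on a family of 45 tests can be illustrated with the Benjamini-Hochberg step-up procedure. The abstract does not name the FDR procedure that was used, so Benjamini-Hochberg is an assumption here; the function name `benjamini_hochberg` and the p-values in the usage line are hypothetical and are not the study's data.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans, True where the null hypothesis is rejected
    while controlling the false discovery rate at level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# 45 made-up p-values: four are below 0.05 when taken individually.
ps = [0.001, 0.02, 0.03, 0.04] + [0.5] * 41
print(sum(benjamini_hochberg(ps)))  # 1
```

With these illustrative values, only the smallest p-value survives, because the threshold for the k-th smallest p-value among 45 tests is k/45 × 0.05, which is far stricter than 0.05 for low ranks.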
1 Introduction
Besides being widely used in industry, pair programming is becoming increasingly common in Software Engineering education because of its demonstrated positive influence on grades, class performance, confidence, productivity, and motivation to stay in Software Engineering and Computer Science academic majors [12], especially for women, as reported by [60].
In pair programming, two partners work closely together to solve a programming task, so their ability to engage collaboratively with each other is essential. However, these collaborative interactions can be influenced by implicit gender bias [28], a widely observed phenomenon even in highly structured, professional settings, such as those reported by [30] and [12], which is based on the assumption that women are less technically competent than men [38]. Since research in the social sciences indicates that an individual’s behavior is clearly affected by the behavior of their peers [17], we aim to explore whether and how gender bias affects the pair programming experience among Software Engineering students.
Our study is based on the hypothesis that gender bias will lead to observable differences based on subjects’ perceptions of the gender of their pair programming partners, i.e., subjects will rate men and women differently on similar tasks, and they will also behave and communicate differently depending on whether they perceive their partner as a man or as a woman, even though their partner remains the same on all tasks. Specifically, in a non-co-located (i.e., remote) pair programming setting in which peer gender cannot be directly observed, our goal is to identify the potential effects of gender bias by observing student pairs when the perceived gender of one of the peers changes.
To study our hypothesis, we have applied methodological triangulation [13], using several data collection methods to approach a complex phenomenon such as human behavior from more than one standpoint [9]. In our case, three different data sources have been used: (1) questionnaires, to measure changes in subjects’ perceptions; (2) data collected automatically during the pair programming tasks, to measure behavioral changes; and (3) data produced by several experimenters who analyzed the messages exchanged during the pair programming tasks, to measure changes in communication.
Assuming a remote pair programming setting, which has been shown to yield results similar to those of co-located pair programming, as reported by [53] and [3], our research questions with respect to subjects’ perceptions are the following:
RQ1 Does gender bias affect perceived productivity compared to solo programming? That is, do perceived differences between in-pair and solo productivity depend on the perceived partner’s gender?
RQ2 Does gender bias affect the partner’s perceived technical competency compared to one’s own technical competency? That is, do perceived differences between one’s own and the partner’s technical competency depend on the perceived partner’s gender?
RQ3 Does gender bias affect the partner’s perceived positive and negative aspects? That is, do the perceived positive and negative aspects of the partner depend on the perceived partner’s gender?[1]
RQ4 Does gender bias affect how partners’ skills are compared? That is, when subjects compare the skills of their partners, do the perceived skills depend on the perceived partner’s gender?
With respect to subjects’ behavior during remote pair programming, and assuming that gender bias could cause a subject to be more or less proactive in the programming task, or more or less verbose when chatting, our research question, based on what we can automatically measure, is the following:
RQ5 Does gender bias affect the frequencies or relative frequencies with which each partner produces source code additions, source code deletions, successful validations, failed validations, and chat utterances? That is, do these frequencies depend on the perceived partner’s gender?
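The absolute and relative frequencies in RQ5 could be computed from a per-task event log roughly as follows. This is a minimal sketch under stated assumptions: the log is taken to be a list of (subject, event type) tuples, the event labels and the `event_frequencies` function are hypothetical, and "relative frequency" is assumed to mean a subject's share of the pair's events of each type; twincode's actual log format and metric definitions may differ.

```python
from collections import Counter

# Hypothetical event labels mirroring the event kinds listed in RQ5.
EVENT_TYPES = ("addition", "deletion", "validation_ok", "validation_failed", "utterance")

def event_frequencies(events, subject):
    """Return (absolute, relative) frequencies for `subject` in one pair/task.

    `events` is a list of (subject_id, event_type) tuples logged for the pair.
    absolute[t]: number of type-t events the subject produced.
    relative[t]: the subject's share of the pair's type-t events, or None
                 when neither partner produced any event of that type.
    """
    own = Counter(etype for sid, etype in events if sid == subject)
    pair = Counter(etype for _, etype in events)
    absolute = {t: own[t] for t in EVENT_TYPES}
    relative = {t: (own[t] / pair[t] if pair[t] else None) for t in EVENT_TYPES}
    return absolute, relative

# Illustrative log for a pair with subjects "A" and "B".
log = [("A", "addition"), ("A", "addition"), ("B", "addition"),
       ("B", "deletion"), ("A", "utterance"), ("B", "utterance"),
       ("B", "utterance"), ("B", "utterance")]
absolute, relative = event_frequencies(log, "A")
print(absolute["addition"], relative["utterance"])  # 2 0.25
```

Returning `None` rather than 0 for event types that neither partner produced keeps "no data" distinct from "the subject produced none of the pair's events" when the values are later compared across conditions.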
Regarding subjects’ communication during remote pair programming, we are interested in knowing whether gender bias affects how subjects communicate with their partners, i.e., whether they use a more formal or informal style, and whether they use some types of chat utterances more than others. Our related research questions are the following:
RQ6 Does gender bias affect the relative frequency of formal and informal chat utterances? That is, does the formality of the messages depend on the perceived partner’s gender?
RQ7 Does gender bias affect the frequency or relative frequency of the different types of chat utterances? That is, do the frequencies of the different types of messages depend on the perceived partner’s gender?
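The within-subject comparison behind RQ6 and RQ7 (the same subject seeing their partner represented as a man in one task and as a woman in another) could be computed from coded chat utterances along these lines. The record fields (`subject`, `perceived_partner`, `formality`, `type`), the utterance-type labels, and all function names are hypothetical; control-group records, which have no gendered avatar, are assumed to carry `perceived_partner="none"` and are skipped.

```python
from collections import Counter, defaultdict

CONDITIONS = ("man", "woman")  # avatar shown to the subject (treatment group only)

def informal_ratio(utterances):
    """Proportion of utterances coded as informal, or None if there are none."""
    if not utterances:
        return None
    informal = sum(1 for u in utterances if u["formality"] == "informal")
    return informal / len(utterances)

def paired_informality_differences(utterances):
    """Per subject: informal ratio with a 'man' avatar minus the ratio with a
    'woman' avatar. Subjects lacking utterances in either condition are omitted."""
    by_subject = defaultdict(lambda: {c: [] for c in CONDITIONS})
    for u in utterances:
        if u["perceived_partner"] not in CONDITIONS:
            continue  # control group: no gendered avatar was shown
        by_subject[u["subject"]][u["perceived_partner"]].append(u)
    diffs = {}
    for subject, groups in by_subject.items():
        r_man = informal_ratio(groups["man"])
        r_woman = informal_ratio(groups["woman"])
        if r_man is not None and r_woman is not None:
            diffs[subject] = r_man - r_woman
    return diffs

def type_counts(utterances, subject, condition):
    """Frequency of each utterance type for one subject in one condition (RQ7)."""
    return Counter(
        u["type"]
        for u in utterances
        if u["subject"] == subject and u["perceived_partner"] == condition
    )

coded = [
    {"subject": "s1", "perceived_partner": "man", "formality": "informal", "type": "yes_no_question"},
    {"subject": "s1", "perceived_partner": "man", "formality": "formal", "type": "reflection"},
    {"subject": "s1", "perceived_partner": "woman", "formality": "formal", "type": "reflection"},
    {"subject": "s1", "perceived_partner": "woman", "formality": "formal", "type": "yes_no_question"},
    {"subject": "s2", "perceived_partner": "man", "formality": "informal", "type": "reflection"},
    {"subject": "s3", "perceived_partner": "none", "formality": "informal", "type": "reflection"},
]
print(paired_informality_differences(coded))  # {'s1': 0.5}
```

A positive difference means that the subject wrote proportionally more informal utterances when the partner was shown as a man. A paired test (for example, a Wilcoxon signed-rank test) could then be applied to these differences; which test the study actually used is not stated in this section.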
[1] This research question and its associated variables were added after the presentation of the related registered report at ESEM’2021 [16]. We considered that including an open question could improve the data collection process.