ABSTRACT

INTRODUCTION

SEARCH ENGINES AND WOMEN’S DESCRIPTIVE REPRESENTATION

MEASURING THE EXTENT OF ALGORITHMIC REPRESENTATION

GENERAL DISCUSSION, CONCLUSION AND REFERENCES

MEASURING THE EXTENT OF ALGORITHMIC REPRESENTATION

Study 1: Auditing gender bias in Google image search

To investigate women’s algorithmic representation (H1, H2), we conducted an algorithm audit of Google image searches. Algorithm auditing has become the dominant research method for “diagnosing problematic behavior in algorithmic systems” (Bandy 2021, 1). Such audits often involve the creation of virtual agents used to simulate user interaction with an algorithmic system under controlled conditions and to generate system outputs (Urman et al. 2022, 2024). The audit focuses on Google image searches for several reasons. First, analyses of Google Trends have linked voters’ information search behavior to political events and election outcomes, thus highlighting the relevance of the search engine for political decision-making (Stephens-Davidowitz 2014; Trevisan et al. 2018; Urman and Makhortykh 2023). Second, longitudinal analyses indicate that the average citizen is spending less time reading and more time viewing images (American Academy of Arts & Sciences 2019). Politicians in turn respond to voters’ visual preference by increasingly relying on visual communication for their political image building (Bast 2024; Carpinella and Bauer 2021). Third, a recent large-scale study on the representation of professional occupations in Google text and image searches suggests that images were particularly likely to prime and amplify gender bias and thus “come at a critical social cost” (Guilbeault et al. 2024, 6).

For the first study, we manually simulated user activity in 56 different countries with a local IP address (accessed through a VPN service called Le VPN; https://www.le-vpn.com/) in the first week of August 2023.⁵ The simulation consisted of deploying a virtual agent (that is, a computer script) that was automated to conduct Google image search queries in each geographical location. For consistency, we used the “.com” version of Google because a language-specific version of the search engine may not exist for all cases. We conducted a Google image search query for the country’s lower or single chamber of parliament with the following pattern: [name of legislative body][person] (see Table B1 in the Supplemental Materials). For bicameral systems, the query was repeated for the upper chamber as well. In line with Vlasceanu and Amodio (2022), we used the country’s dominant language to conduct the query and added the term “person” to obtain results related to people rather than buildings. Moreover, the abstract term “person” carries no or only very weak gendered connotations in most languages, whereas more specific terms such as citizen, politician, or member of parliament would require choosing a grammatical gender (for a discussion see Vlasceanu and Amodio 2022). For the United States, the queries thus read: “house of representatives person” and “senate person”. For each conducted query, we collected the first 75 images and extracted the number of persons and their gender by means of computer vision using the commercial Amazon Rekognition platform (https://aws.amazon.com/rekognition/).⁶
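The query pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the `CHAMBERS` lookup and the `build_queries` helper are hypothetical names, the United States entry follows the example given in the text, and "Exampleland" is an invented unicameral placeholder rather than an entry from Table B1.

```python
# Hypothetical lookup of chamber names and the local word for "person".
# Only the United States entry mirrors the example in the text;
# "Exampleland" is an invented unicameral placeholder.
CHAMBERS = {
    "United States": {
        "person": "person",
        "lower": "house of representatives",
        "upper": "senate",
    },
    "Exampleland": {
        "person": "person",
        "lower": "national assembly",
        "upper": None,  # unicameral: no upper chamber query
    },
}


def build_queries(country):
    """Return one query per chamber: '<name of legislative body> <person>'."""
    entry = CHAMBERS[country]
    queries = []
    for chamber in ("lower", "upper"):
        name = entry[chamber]
        if name is not None:
            queries.append(f"{name} {entry['person']}")
    return queries


if __name__ == "__main__":
    for country in CHAMBERS:
        print(country, build_queries(country))
```

Keeping the chamber names and the word for "person" in a single lookup makes it straightforward to add one entry per country in its dominant language.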

This yielded a data set of 6,363 images depicting 58,343 persons. We then merged the data at the country level with women’s actual descriptive representation in these legislative bodies (Inter-parliamentary Union 2024). We used a clean browser and cleared its history after each query to prevent the browser history from influencing search outputs.

⁵ All studies were pre-registered and received ethical clearance from the institutional review board. See section A in the Supplemental Materials for more information.

⁶ Amazon Rekognition predicts the gender of a depicted person as a male vs. female binary based on physical appearance. An algorithmic system itself, Amazon Rekognition’s gender prediction has been shown to work best for (white) cis-gender women and men with true positive rates of 95% and 99% respectively (Scheuerman et al. 2019). Commercial gender detection models notoriously perform worse for women and non-white people but retain accuracy rates of 80% and better (Albiero et al. 2020; Schwemmer et al. 2020). We compared the automatically annotated gender with a (binary) manual classification conducted by the authors (n = 300) and achieved satisfying reliability (Krippendorff’s α = 0.92).

Our main measure of algorithmic representation is the share of women of all depicted persons in each image (e.g., 50% in an image with two women and two men). We additionally repeat all analyses with the absolute number of depicted women and men as well as a dummy variable predicting the presence of at least one man or woman in each image. We first test our expectation that search engine algorithms have a baseline bias that results in women’s absolute underrepresentation in Google image outputs compared to men (H1). We run generalized linear mixed effects models with by-country random intercepts to predict the share, presence and number of women in output images. In line with our expectation, the results show a consistent algorithmic underrepresentation of women on all three measures and in search queries for both lower and upper chambers (see Table 1). Of note, the extent of women’s algorithmic underrepresentation (lower: 29.2%, CI[26.8%-31.7%]; upper: 29.1%, CI[26.8%-31.5%]) lies just a few percentage points above their average global descriptive representation of 26.7%.

Next we turn to our hypothesis that women’s algorithmic representation tracks with the gendered distribution of political roles, as measured by women’s descriptive representation (H2). For this, we assess the correlation between women’s algorithmic and actual descriptive representation. We find significant positive correlations in both chambers (lower: r(56) = 0.37, p = 0.005; upper: r(28) = 0.52, p = 0.004). These associations indicate that the proportion of women in Google images is higher (lower) in countries and chambers with more (fewer) elected women (see Figure 1A).⁷ Contrary to expectation, the evidence does not indicate a clear pattern of relative algorithmic representation. Figure 1B illustrates the differences between women’s actual and algorithmic representation for all queries. We find no significant difference for the majority of countries and chambers (e.g., the U.S. House of Representatives), indicating that search engines accurately mirror women’s inclusion in most legislative institutions. Google’s search algorithm introduces bias toward women’s underrepresentation in 20 cases. For instance, there is a relative underrepresentation of women in the output for the U.S. Senate query of -5.8 percentage points (CI[-11.6%- -0.07%]). However, women are algorithmically overrepresented relative to their descriptive representation in 21 cases.

⁷ These bivariate associations hold even after controlling for country- or query-level predictors (see section C1 in the Supplemental Materials for robustness checks of this finding).
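The three per-image measures from Study 1 (share, number, and presence of women) and the gap between algorithmic and descriptive representation can be illustrated with a short Python sketch. It rests on stated assumptions: the function names are hypothetical, the per-query figure is a plain mean of image-level shares rather than the mixed effects estimate reported in the paper, and images without any detected person are skipped, which the text does not specify.

```python
def image_measures(n_women, n_men):
    """Per-image measures: share, absolute numbers, and presence dummies."""
    total = n_women + n_men
    return {
        # Share of women among all depicted persons; undefined for empty images.
        "share_women": n_women / total if total else None,
        "n_women": n_women,
        "n_men": n_men,
        "any_woman": n_women > 0,
        "any_man": n_men > 0,
    }


def mean_share(images):
    """Average share of women over images that depict at least one person.

    `images` is a list of (n_women, n_men) tuples. Skipping empty images is
    an assumption of this sketch.
    """
    shares = [image_measures(w, m)["share_women"] for w, m in images]
    shares = [s for s in shares if s is not None]
    if not shares:
        raise ValueError("no image depicts a person")
    return sum(shares) / len(shares)


def representation_gap(images, descriptive_share):
    """Algorithmic minus descriptive representation (negative = underrepresented)."""
    return mean_share(images) - descriptive_share


if __name__ == "__main__":
    # Invented counts for one query, not data from the study.
    demo = [(2, 2), (0, 4), (1, 1), (0, 0)]
    print(image_measures(2, 2))
    print(representation_gap(demo, descriptive_share=0.25))
```

A negative gap corresponds to relative algorithmic underrepresentation and a positive gap to overrepresentation. Judging whether a gap differs from zero would additionally require an uncertainty estimate such as the confidence intervals the authors report.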

Study 2: Internal replication and robustness check

Study 2 constitutes an internal replication of the first study with the goal of probing the robustness of the previous audit to (a) the time of data collection and (b) the noise due to randomization of search engine outputs (Haim et al. 2017; Urman et al. 2022). Data collection for study 2 included queries for 20 legislative bodies (nine bicameral and two unicameral countries) and took place in March 2024, almost a year after the first study. We deployed 20 virtual agents, rather than just one, which simultaneously conducted the same queries to account for random noise in each Google search. To account for geographical influences on Google searches, we paralleled this procedure in each of the eleven countries by modeling the location of the virtual agents through the set of IP addresses provided by Google Compute Engine.⁸ For each query we collected and coded up to the first 50 images in the Google search output. This resulted in a data set of 152,098 images depicting 1,324,560 persons.

For the analysis we repeat the generalized linear mixed effects models from the previous study with random intercepts for the country of the query, the location in which the agent performed the query, and the agent id. We first test the baseline bias in women’s representation on Google image searches (H1). Confirming results from the previous study, women are underrepresented on all measures (see Table 1). For example, women account for 22.5% (CI[18.2%-26.9%]) of depicted persons in searches for lower parliamentary chambers and 25.0% (CI[20.2%-29.8%]) in those for upper chambers. Next we assess whether the distribution of women’s representation on Google image search outputs mirrors their actual inclusion in governments across countries (H2). We find that a percentage point increase in women’s descriptive representation results in an increase in their algorithmic representation by half a percentage point (b = 0.50, CI[0.46:0.53], p < 0.001; see Table C2.2 in the Supplemental Materials). Mirroring the bivariate analysis from study 1, we find a positive correlation between women’s algorithmic and descriptive representation across 20 legislative bodies (r(20) = 0.33, p = 0.14), though this bivariate association is not statistically significant.
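The bivariate association between descriptive and algorithmic representation is a Pearson correlation across legislative bodies. A minimal sketch of that computation follows; the helper name `pearson_r` and the demo values are hypothetical and are not the study's data, and the sketch omits the significance test.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented percentages per legislative body (illustration only).
    descriptive = [10.0, 20.0, 30.0, 40.0]
    algorithmic = [15.0, 18.0, 27.0, 30.0]
    print(round(pearson_r(descriptive, algorithmic), 3))
```

With only 20 legislative bodies, a moderate coefficient can fall short of conventional significance thresholds, which is consistent with the non-significant bivariate result reported for study 2.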

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.