4 Methodology
Our methodology is intentionally designed to support implementation in practice, specifically within broader algorithmic auditing frameworks. Broad system-level frameworks, such as “SMACTR” or “SIIM”, provide essential structure for approaching the ambiguous problem space of auditing for harms and bias in industrial systems [9, 47]. Our proposed framework aligns more closely with the idea of “disaggregated evaluations,” which is targeted by the SIIM framework and the “Testing” step of SMACTR, where the focus lies on analyzing AI outputs for harm or bias [7, 9, 47]. We present our framework for a disaggregated evaluation of latent factor recommendation (LFR) algorithms, leveraging the SIIM framework for analyzing recommendation bias in practice [9]. This framework consists of four steps: scope, identify, implement, and monitor and flag. The first step, scope, addresses the problem of determining “what” to analyze; for example, which sensitive attribute should be analyzed in our LFR model’s vector output? The second step, identify, focuses on determining the methodologies best suited to that analysis given the scope and the outputs of the system. The third step, implement, covers conducting the analysis and determining how to manipulate the data to leverage the identified methods. Finally, the fourth step, monitor and flag, answers the vital question of whether significant levels of bias exist within the system.
Our methodology framework provides practitioners with guidance to systematically analyze attribute association bias within the SIIM framework. We do this by focusing specifically on how to scope the attribute to be measured for bias, identifying and implementing our methodologies to determine the existence of bias, and finally flagging significant results and significant changes in bias after mitigation. The following three sections capture these four steps: scope, identify & implement, and flag. We combine identify and implement to pair guidance on when to leverage a method with quantitative instructions for its implementation. We focus on flagging over monitoring since our framework addresses ad hoc testing for significance rather than setting baselines for long-term systematic monitoring. Standardizing the creation of baselines for monitoring is an essential yet challenging task, since determining levels of harm is highly context-specific [36, 37]. Providing practical guidance for setting baselines is out of scope for this paper but is an impactful direction for future research.
4.1 Scope
When approaching bias evaluation, practitioners must first scope what they wish to measure [9]. In the case of attribute association bias, one must define which attribute will be targeted during the evaluation. For our proposed evaluation framework, an attribute can refer to any entity characteristic defined by a binary relationship differentiating two entity groups (e.g., “male” and “female” for binary gender). Scoping the attribute involves defining the two characteristically opposing groups of entity embeddings used for calculating the attribute association bias. In addition to defining the attribute and its respective entity groups, one must determine which groups of entity embeddings should be tested for attribute association bias. For example, if one defines user gender as the attribute, the practitioner would need to determine how to group items as test entities for analysis. If the practitioner instead wishes to observe artist gender bias, they would need to define groups of user entities for their evaluation. Test entity embeddings should be selected based on their perceived risk of attribute association bias; for example, if a group of entities is historically stereotyped, that group’s embeddings are a candidate for attribute association bias testing.
4.2 Identify & Implement
The next step of our framework focuses on identifying and implementing methods for evaluating attribute association bias. First, we introduce and provide implementation instructions for methods that explore attribute association bias. In addition, we provide guidance for identifying which methods to use during the evaluation, helping practitioners find the correct methods for their specific use case. We present four categories of evaluation methods for exploring attribute association bias in latent factor recommendation algorithms: (1) latent space visualization, (2) bias directions, (3) bias evaluation metrics, and (4) classification for explaining bias. We introduce these methods in order of the type of analysis the practitioner wishes to conduct, from initial exploration to targeted measurement for determining mitigation needs. Thus, we support evaluating bias across different phases of analysis while addressing bias and harm within one’s recommendation system. When first exploring the existence of attribute association bias, we suggest implementing latent space visualization and bias direction methods.
These two methods may alert practitioners to the existence of significant levels of attribute association bias. Latent space visualization provides an easily interpretable view of the attribute. It can signal when more analysis is needed, but it should not be used as a quantitative measurement for the existence of bias. The next group of methods, bias directions, provides quantitative means for determining if a significant relationship exists between the entities and the attribute. One can leverage these methods to answer the question: “does attribute association bias exist?”
If a clear attribute relationship has been identified, one may investigate the problem further by measuring the level of bias present. This type of level setting and direct measurement can be done by implementing bias evaluation metrics and classification techniques to test for significant levels of attribute association bias and to create statistical baselines for evaluating whether mitigations are successful. Practitioners can implement these methods to address the question: “how strong is the level of attribute association bias in my system?” When describing implementation details, we refer to the attribute-defining entity sets as 𝐴 and 𝐵. Each entity, 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵, is assigned a binary label representing the attribute, with the label of set 𝐴 being one and that of 𝐵 being zero, or vice versa. These two entity sets can be considered opposing if their labeled attribute is mutually exclusive. Entity sets used to test for attribute association bias are referred to as 𝐸 and 𝑃, where one entity set is hypothesized to show heavier stereotyping towards one of the opposing attribute entity sets.
4.2.1 Latent Space Visualization. Latent space visualization has been shown to be a valuable technique for qualitatively evaluating entity relationships [35]. This method can be an effective first step before implementing quantitative methodologies for bias measurement. In past work, dimension reduction techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE [56]) have been used to visualize latent variables in high-dimensional data. In one such example, Gonen and Goldberg [30] use t-SNE to visualize gender clusters in word embeddings. We adapt this methodology to understand the latent representations of attribute associations in our feature embeddings. However, unlike earlier work, we suggest using PCA instead of t-SNE to visualize feature embeddings when evaluating grouping. t-SNE privileges local, rather than global, structure in high-dimensional data, thereby obfuscating the relationships between attribute clusters in visualizations [56]. Since we recommend dimensionality reduction visualization techniques as a qualitative method to understand the extent of clustering around attributes, t-SNE’s distortion of global geometry is consequential. PCA also allows us to apply the learned mapping to features not used in training, which would not be possible with t-SNE. We recommend using PCA to visualize the first and second principal components, which capture the directions of most variability in feature embeddings [13].
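As a minimal sketch of this step (the paper does not prescribe a library; the array names and synthetic embeddings below are our own placeholders), one could fit a two-component PCA on the attribute-defining embeddings and plot the projection:

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Hypothetical inputs: rows are entity embeddings from a trained LFR model.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 64)) + 0.5  # entities labeled with attribute A
emb_b = rng.normal(size=(200, 64)) - 0.5  # entities labeled with attribute B

# Fit PCA on the attribute-defining entities; the learned mapping can later
# be applied to embeddings that were not used for fitting.
pca = PCA(n_components=2)
pca.fit(np.vstack([emb_a, emb_b]))

proj_a = pca.transform(emb_a)
proj_b = pca.transform(emb_b)

plt.scatter(proj_a[:, 0], proj_a[:, 1], label="attribute A", alpha=0.5)
plt.scatter(proj_b[:, 0], proj_b[:, 1], label="attribute B", alpha=0.5)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```

Visible separation of the two groups along the first components would signal clustering around the attribute and motivate the quantitative methods that follow.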
4.2.2 Bias Directions. Calculating attribute bias directions can serve as another method for exploring the existence of bias in one’s system. These attribute association bias direction vectors represent how the attribute is distinguished as a vector direction between 𝐴 and 𝐵 within the trained latent space. These vectors can be used for: (a) exploring individual entities by identifying users or items whose embeddings have high similarity with a particular attribute association bias vector for further examination; (b) comparing recommendation systems by using the bias vectors to calculate association bias metrics (§4.2.3) for each system; and (c) exploring classification scenarios (§4.2.4). We present three methods for computing attribute association bias direction vectors: centroid difference, SVC vector direction, and PCA vector direction. Unlike related work in NLP association bias research (e.g., [13]), the centroid difference and SVC vector direction calculations do not require practitioners to have distinct representation embedding pairings between entities in 𝐴 and 𝐵, making them suitable for recommendation systems and data where such pairings do not exist.
Centroid Difference. The simplest method for computing an attribute’s association bias vector is to take the difference between the centroid of 𝐴 and the centroid of 𝐵 (also referred to as attribute vector mapping [35]). This method is best used for capturing differences in average attribute behavior. The centroid method is the most readily interpretable of the three due to its simple calculation and thus serves as a good starting point for exploring the attribute space. However, it is essential to note that this method tends to be more conservative in estimating bias due to variance being averaged out in the process. It may not adequately capture significant nuances in behavior within the space, and other direction techniques may be required to reflect more complex attribute bias behavior.
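A minimal sketch of the centroid difference direction, assuming `emb_a` and `emb_b` are NumPy arrays whose rows are the embeddings of sets 𝐴 and 𝐵 (the function and variable names are ours, not from the paper):

```python
import numpy as np

def centroid_difference_direction(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Bias direction as the difference between group centroids,
    normalized to unit length for later cosine comparisons."""
    direction = emb_a.mean(axis=0) - emb_b.mean(axis=0)
    return direction / np.linalg.norm(direction)
```

Normalizing to unit length keeps downstream cosine similarity computations with the direction straightforward.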
SVC Vector Direction. Our second approach computes the association bias vector using parameters from a linear support vector classification (SVC) model trained to predict the attribute. We draw inspiration for this technique from past NLP research that trained SVC models to predict grammatical gender in word embeddings [45, 66]. The entity vectors and labels in sets 𝐴 and 𝐵 are used as training data for the SVC model. The attribute bias direction is created from the model’s final layer of learned coefficients (the separating hyperplane) to capture the subspace representing significant attribute meaning. The selection and assignment of entities to 𝐴 and 𝐵 can substantially impact the computed bias direction; in our case study (§5), we compare bias directions computed on random samples of users versus the most stereotypically gendered ones. This direction methodology is best used to capture more distinct nuances of attribute bias that may be lost when entity vectors are averaged in the centroid difference method.
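One plausible realization uses scikit-learn's LinearSVC, which exposes the separating hyperplane as `coef_`; the specific library and hyperparameters are our assumptions, not prescribed by the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

def svc_bias_direction(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Bias direction from the separating hyperplane of a linear SVC
    trained to predict the attribute label (1 for set A, 0 for set B)."""
    X = np.vstack([emb_a, emb_b])
    y = np.concatenate([np.ones(len(emb_a)), np.zeros(len(emb_b))])
    clf = LinearSVC(C=1.0, max_iter=10_000).fit(X, y)
    w = clf.coef_.ravel()  # hyperplane normal separating the two attribute groups
    return w / np.linalg.norm(w)
```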
PCA Vector Direction. The final method calculates the bias direction using the general methodology introduced by Bolukbasi et al. [13], which conducts principal component analysis (PCA) on parallel attribute pair vectors {(𝑎₁, 𝑏₁), . . . , (𝑎ₙ, 𝑏ₙ)}. The final attribute bias direction is the first principal component, which captures the majority of the variance describing the group of vector pairs. As in the methods above, two groups of opposing vectors must be defined to create the final pairing of vectors. However, unlike the two methods above, implementing PCA for vector direction creation requires distinct attribute pair vectors. This better enables visualization for transparency, but it should only be used if the practitioner is confident in their entity pairings for defining the attribute; therefore, it may not be a good starting point for bias exploration. We illustrate this caveat in our case study by presenting the pitfalls of randomly selecting attribute entity pairs for creating a PCA vector direction.
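A sketch of one common realization of this approach, taking the first principal component of the pair difference vectors; it assumes `pairs_a` and `pairs_b` are row-aligned arrays of distinct attribute pairings (names are ours):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_bias_direction(pairs_a: np.ndarray, pairs_b: np.ndarray) -> np.ndarray:
    """Bias direction as the first principal component of paired difference
    vectors (a_i - b_i), following the general approach of Bolukbasi et al."""
    diffs = pairs_a - pairs_b  # requires distinct, row-aligned attribute pairings
    pca = PCA(n_components=1).fit(diffs)
    direction = pca.components_[0]
    return direction / np.linalg.norm(direction)
```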
4.2.3 Bias Evaluation Metrics. We propose two metric methods for capturing attribute association bias arising from LFR algorithms. These evaluation methods are inspired by two NLP techniques for evaluating latent gender bias in word embeddings: the Word Embedding Association Test (WEAT [15]) and Relational Inner Product Association (RIPA [27]). We chose these methods based on their acceptance within the NLP community as reliable metrics for measuring bias in vector representations of words [21, 61]. In this section, we present adaptations of WEAT and RIPA for use in recommendation settings.
Entity Attribute Association Metrics & Test. This set of metrics, inspired by WEAT [15], can be used to understand how attribute association bias manifests in user-user and user-item comparisons by computing vector similarity between entities of interest and members of the two attribute groups defined previously (𝐴 and 𝐵). These metrics require two sets of users or items to evaluate the attribute association bias (𝐸 and 𝑃), where one entity set is hypothesized to show heavier stereotyping than the other. There are three interrelated metrics: entity attribute association (EAA), group entity attribute association (GEAA), and differential entity attribute association (DEAA).
EAA measures the attribute association bias for a single entity 𝜀 ∈ 𝐸 ∪ 𝑃, calculated as the difference in mean cosine similarity of 𝜀 to the attribute entities in 𝐴 and 𝐵. Positive EAA scores represent a higher association with attribute 𝐴, while negative scores signal a higher association with attribute 𝐵:

$$\mathrm{EAA}(\varepsilon) = \frac{1}{|A|}\sum_{a \in A} \cos(\varepsilon, a) \, - \, \frac{1}{|B|}\sum_{b \in B} \cos(\varepsilon, b)$$
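A sketch of EAA in NumPy, along with one plausible reading of DEAA as a WEAT-style difference of summed EAA scores between the test sets; the DEAA form is our assumption based on WEAT [15], and all names are ours:

```python
import numpy as np

def _cos(x: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Cosine similarity of vector x against each row of matrix Y."""
    return (Y @ x) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(x))

def eaa(entity: np.ndarray, emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Entity attribute association: positive leans toward A, negative toward B."""
    return _cos(entity, emb_a).mean() - _cos(entity, emb_b).mean()

def deaa(E: np.ndarray, P: np.ndarray,
         emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Differential EAA between test sets E and P; our WEAT-style reading,
    not a formula given explicitly in this section."""
    return (sum(eaa(e, emb_a, emb_b) for e in E)
            - sum(eaa(p, emb_a, emb_b) for p in P))
```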
Recommendation Relational Inner Product Association (R-RIPA). We also provide a metric, R-RIPA, that is similar to the prior metrics but parameterized by a user-defined attribute bias direction. This provides more flexibility for the practitioner to use computed attribute association bias vectors based on SVC and PCA, or other user-defined attribute directions in general. Additionally, this metric may be more robust to fluctuations or outliers that can affect metrics heavily reliant on group averages over entities. We base this metric on RIPA [27], which is calculated with a relation vector representing the first principal component of the difference between word pairings in an attribute-defining set. We modify RIPA to require a relation vector 𝜓 that represents a user-defined attribute bias direction between 𝐴 and 𝐵. R-RIPA for an entity set 𝐸 is computed as:

$$\mathrm{R\text{-}RIPA}(E) = \frac{1}{|E|}\sum_{e \in E} \cos(e, \psi)$$
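A minimal sketch of R-RIPA as mean cosine similarity with the bias direction 𝜓, consistent with the cosine-similarity-based significance testing described in §4.3.2 (array names are ours):

```python
import numpy as np

def r_ripa(E: np.ndarray, psi: np.ndarray) -> float:
    """R-RIPA for test entity set E (rows are embeddings) given a bias
    direction psi: mean cosine similarity between each entity and psi."""
    sims = (E @ psi) / (np.linalg.norm(E, axis=1) * np.linalg.norm(psi))
    return float(sims.mean())
```

Because 𝜓 is a free parameter, the same entity set can be scored against centroid, SVC, or PCA directions and the results compared.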
4.2.4 Classification for Explaining Bias. Past NLP research has used classification models to show how heavily word embeddings capture bias and to demonstrate possible downstream effects, particularly along the lines of binary gender [8, 30]. We propose a similar use of classifiers on entity representations to explore attribute association bias in recommendation settings. More specifically, we propose training a classifier on user or item embeddings and their target attribute, meaning the model is trained on entity sets 𝐴 and 𝐵 and their associated attribute labels. This classifier can then be leveraged to explore how attribute bias and stereotypes are captured within the trained latent space, e.g., by comparing predictions for new entities not in 𝐴 and 𝐵. This method is especially advantageous when assessing the potential for amplifying representation harm when item or user embeddings are used in models downstream from the original recommendation system; we demonstrate its utility in our case study (§5).
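A sketch of this classification scenario, here with scikit-learn's logistic regression as a stand-in classifier; the paper does not prescribe a model, and the synthetic embeddings are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(300, 64)) + 0.4  # placeholder embeddings for set A
emb_b = rng.normal(size=(300, 64)) - 0.4  # placeholder embeddings for set B

X = np.vstack([emb_a, emb_b])
y = np.concatenate([np.ones(len(emb_a)), np.zeros(len(emb_b))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))

# Probe entities outside A and B: confident attribute predictions on unseen
# embeddings suggest the latent space encodes the attribute strongly enough
# to propagate into downstream models.
new_entities = rng.normal(size=(10, X.shape[1]))
print(clf.predict_proba(new_entities)[:, 1])
```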
4.3 Flag
When implementing an audit or evaluation of bias, one of the most important steps is determining whether bias exists, requires mitigation, or has fluctuated significantly due to a system change or a recently completed mitigation [9]. We introduce significance testing methods for the quantitative methods described in the previous section (bias directions, bias evaluation metrics, and classification results). Note that we do not suggest specific baselines for flagging required mitigation, since baselines should be determined on a case-by-case basis [9]. This section does not address interpreting visualizations, given their qualitative nature; in general, the practitioner should look for clear delineation between the attribute and test entity sets, and a widening or narrowing of the linear separation would signal changes in the latent visualization.
4.3.1 Bias Directions. Unlike testing significance for metrics, testing the significance of a latent direction requires validating that the direction captures attribute behavior. We aim to determine that the direction is not capturing a random relationship between entity vectors, but rather a distinct attribute-related relationship. We suggest the following comparisons for significance testing:
• Cosine similarities between the bias direction and entities in opposing entity sets 𝐴 and 𝐵: This test determines if the two sets have significantly different relationships with the bias direction. If the cosine similarities for 𝐴 and 𝐵 do not differ significantly, one can assume that the direction is not capturing a significant attribute difference between the two entity groups, or that there is no attribute difference to be captured.
• Cosine similarity between the bias direction and entities in 𝐴 and 𝐵 versus the entities’ cosine similarity with a randomly-sampled vector: This test examines whether the entity sets have a statistically significant relationship with the calculated bias direction compared to a random direction. If the attribute cosine similarities differ significantly from the random cosine similarities, one can assume that the attribute direction vector captures behavior that does not occur randomly.
• Cosine similarity of entity vectors from 𝐴 (or 𝐵) with the bias direction versus that of random vectors with the bias direction: This test builds upon the previous test to determine if the relationship between the entity sets and the bias direction is significant. It verifies that the entities have a significant relationship with the bias direction compared to random entities, further validating that the relationship between the entities and the computed bias direction is not random.
All three tests must show statistical significance for one to determine that the bias direction captures a non-random relationship between the two attribute-defining entity sets and the calculated bias direction. Given the number of statistical tests conducted, one may wish to apply the Bonferroni correction or other techniques to account for multiple significance tests [51].
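The paper leaves the choice of statistical test open; the sketch below runs the three comparisons with a nonparametric Mann-Whitney U test and a Bonferroni-corrected threshold, using synthetic embeddings and a centroid-difference direction as stand-ins (all names and data are ours):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def cos_sims(M: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Cosine similarity of each row of M with vector v."""
    return (M @ v) / (np.linalg.norm(M, axis=1) * np.linalg.norm(v))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 64)) + 0.3  # stand-ins for attribute sets A and B
emb_b = rng.normal(size=(200, 64)) - 0.3
direction = emb_a.mean(axis=0) - emb_b.mean(axis=0)
direction /= np.linalg.norm(direction)

# Test 1: do entities in A and B relate differently to the bias direction?
_, p1 = mannwhitneyu(cos_sims(emb_a, direction), cos_sims(emb_b, direction),
                     alternative="two-sided")

# Test 2: do similarities with the bias direction differ from similarities
# with a random direction?
rand_dir = rng.normal(size=direction.shape)
rand_dir /= np.linalg.norm(rand_dir)
all_emb = np.vstack([emb_a, emb_b])
_, p2 = mannwhitneyu(cos_sims(all_emb, direction), cos_sims(all_emb, rand_dir),
                     alternative="two-sided")

# Test 3: do A's similarities with the direction differ from those of
# random vectors with the same direction?
rand_vecs = rng.normal(size=emb_a.shape)
_, p3 = mannwhitneyu(cos_sims(emb_a, direction), cos_sims(rand_vecs, direction),
                     alternative="two-sided")

# Bonferroni correction: each p-value must clear alpha / (number of tests).
alpha = 0.05 / 3
print([p < alpha for p in (p1, p2, p3)])
```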
4.3.2 Bias Evaluation Metrics. Permutation tests can be used to determine the significance of the attribute association bias evaluation metrics [15, 51]. When evaluating the EAA metrics, one can test for the significance of the group-level metric, GEAA, and the entity-difference metric, DEAA. A significant GEAA means a biased relationship exists between the entity group and the defined attribute. A significant DEAA represents a significant difference in the level of attribute association bias between the two entity test sets. For R-RIPA, permutation tests can be used to determine if the attribute association bias of a specific entity set is significant. One can test for a significant difference in attribute association bias between entity test sets by comparing the two populations’ cosine similarity scores with the bias direction leveraged when calculating R-RIPA. An alternative method is to calculate R-RIPA for smaller samples of the test entity sets and apply the Wilcoxon rank-sum test or a similar test [51].
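A minimal permutation test sketch for comparing per-entity bias scores (e.g., EAA values) between test sets 𝐸 and 𝑃; the two-sided form and the function name are our assumptions:

```python
import numpy as np

def permutation_test(scores_e: np.ndarray, scores_p: np.ndarray,
                     n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in mean per-entity
    bias scores between test sets E and P."""
    rng = np.random.default_rng(seed)
    observed = scores_e.mean() - scores_p.mean()
    pooled = np.concatenate([scores_e, scores_p])
    n_e = len(scores_e)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffling pooled scores permutes the set labels
        diff = pooled[:n_e].mean() - pooled[n_e:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm  # p-value: share of permutations at least as extreme
```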
4.3.3 Classification Results. Analyzing the results of classification scenarios should account for how the scenario was scoped and the goal of the bias evaluation. For example, suppose a practitioner is analyzing the potential for attribute association bias in a downstream model that uses learned item embeddings from a recommendation system. In that case, they should first measure overall accuracy to determine whether the embeddings convey the attribute correctly. If the practitioner is also concerned with unfair levels of attribute association bias across items in specific stereotype groups, one could leverage classification fairness metrics, such as demographic parity or equalized odds, to compare performance across specified groups [39]. We refer to Mehrabi et al. [39] for an overview of classification fairness and bias metrics in such scenarios.
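As a small illustration of comparing classifier behavior across stereotype groups, a demographic parity gap can be computed directly from predictions and group labels; this is a hedged sketch with hypothetical variable names, not a formula given in the paper:

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Difference in positive-prediction rate between stereotype groups 1 and 0."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

# Example: attribute predictions for ten items, five per stereotype group.
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
group = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
print(demographic_parity_gap(y_pred, group))  # 0.8 - 0.2 = 0.6
```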