This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: 4_2q_u1pGSAWlUIS_L9OHfbys-Z27Luz_y9ti_9XP0Q

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models: Heterogeneity Settings

Written by @computational | Published on 2024/8/29

TL;DR
In this study, researchers propose a data quality control pipeline for federated fine-tuning of foundation models.

Authors:

(1) Wanru Zhao, University of Cambridge, Shanghai AI Laboratory (equal contribution);

(2) Yaxin Du, Shanghai Jiao Tong University (equal contribution);

(3) Nicholas D. Lane, University of Cambridge and Flower Labs;

(4) Siheng Chen, Shanghai AI Laboratory and Shanghai Jiao Tong University;

(5) Yanfeng Wang, Shanghai AI Laboratory and Shanghai Jiao Tong University.

B HETEROGENEITY SETTINGS

To model real-world scenarios, we designed two heterogeneous settings: NIID-1 and NIID-2. NIID-1 replicates a typical scenario in federated learning classification tasks (Yurochkin et al., 2019; Wang et al., 2020a;b; Li et al., 2021; Shi et al., 2022), where the distribution of low-quality data among clients follows a Dirichlet distribution with parameter β = 1, while the volume of data processed by each client remains equal. In contrast, NIID-2 addresses a skewed classification task scenario within FL (McMahan et al., 2017; Li et al., 2020), assigning 70% of low-quality data to half of the clients and 90% to the other half, while keeping the training set size equal across all clients. The distributions for these settings are illustrated in Figure 3. Table 2 shows the results of training on low-quality data and of data quality control in the federated NIID-2 setting.
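The two settings can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the number of clients, the per-client sample count, and the total amount of low-quality data are all hypothetical. Two further assumptions are made. For NIID-1, each client's Dirichlet share of the low-quality pool is capped at the client's fixed data volume, and the remainder is filled with high-quality samples. For NIID-2, the 70% and 90% figures are read as the low-quality fraction inside each client's local dataset, which is only one possible reading of the description above.

```python
import random


def dirichlet_sample(num_clients, beta, rng):
    """Draw one sample from a symmetric Dirichlet(beta) via normalized Gamma draws."""
    draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
    total = sum(draws)
    return [d / total for d in draws]


def niid1_counts(num_clients, samples_per_client, total_low_quality,
                 beta=1.0, seed=0):
    """Hypothetical NIID-1 split: returns (low_quality, high_quality) per client.

    Low-quality data is spread over clients by Dirichlet(beta) shares.
    Assumption: a client's low-quality count is capped at its fixed volume,
    and high-quality samples fill the rest so every client has equal size.
    """
    rng = random.Random(seed)
    shares = dirichlet_sample(num_clients, beta, rng)
    counts = []
    for share in shares:
        low = min(samples_per_client, int(share * total_low_quality))
        counts.append((low, samples_per_client - low))
    return counts


def niid2_counts(num_clients, samples_per_client, ratios=(0.7, 0.9)):
    """Hypothetical NIID-2 split: returns (low_quality, high_quality) per client.

    Assumption: the first half of the clients hold ratios[0] low-quality data
    and the second half hold ratios[1], with equal volume per client.
    """
    half = num_clients // 2
    counts = []
    for client in range(num_clients):
        ratio = ratios[0] if client < half else ratios[1]
        low = round(ratio * samples_per_client)
        counts.append((low, samples_per_client - low))
    return counts


if __name__ == "__main__":
    # Illustrative sizes only; none of these numbers come from the paper.
    print("NIID-1:", niid1_counts(num_clients=4, samples_per_client=100,
                                  total_low_quality=200))
    print("NIID-2:", niid2_counts(num_clients=4, samples_per_client=100))
```

Under these assumptions, niid2_counts(4, 100) yields [(70, 30), (70, 30), (90, 10), (90, 10)], while the NIID-1 counts vary with the random seed. In both cases every client ends up with the same total number of samples, matching the equal-volume constraint described above.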

Table 2: Comparison of four data quality control methods on federated NIID-2 settings with three evaluation metrics.

Figure 3: Composition of high-quality and low-quality data under NIID-1 and NIID-2.

This paper is available on arXiv under the CC BY 4.0 DEED license.
