Global warming is accelerating as atmospheric concentrations of carbon dioxide, methane, and nitrous oxide reach their highest recorded levels. Data scientists, data engineers, and cloud experts have all come forward to build a more sustainable environment by following best practices in Machine Learning.
Machine Learning models can have a detrimental effect on the environment: training them consumes substantial computational resources and energy, often running for thousands of hours on specialized hardware accelerators in data centers.
The average global temperature has risen steadily over the last three decades (since 1980), as illustrated in the figure below. All major meteorological agencies report similar trends, which has prompted environmentalists, geologists, and technology experts across domains to come forward and set standards for limiting the temperature rise.
Global average temperature anomaly from 1880 to 2012, compared to the 1951-1980 long-term average. Source: NASA Earth Observatory.
Research into curbing the energy expenditure of ML models has led to "state-of-the-art" approaches that differ from conventional Machine Learning by training in a decentralized fashion. Instead of a centralized setup in which a server handles all ML training tasks, in Federated Learning individual devices train a model on their own local data and send the updated model to the cloud/server, which aggregates the models from the different devices and pushes the updated global model back to them.
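To make the round structure concrete, below is a minimal sketch of federated averaging in Python. The linear model, toy data, and function names are illustrative assumptions for this post, not the API of any particular FL framework.

```python
import numpy as np

def local_update(w, X, y, lr=0.01, epochs=1):
    """On-device step: plain gradient descent on a linear model with
    squared error, touching only this client's local (X, y)."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(w_global, clients):
    """One FL round: every client trains locally, then the server
    averages the returned models, weighted by local dataset size."""
    updates = [(local_update(w_global, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Toy usage: three clients whose raw data never leaves the device;
# only model weights travel to and from the server.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):  # communication rounds
    w = federated_round(w, clients)
```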
With the gradual advancement of Federated Learning (FL), its importance for sustainability has been recognized, particularly where rechargeable devices can harvest energy from the ambient environment, reducing energy costs in both wireless and edge networks.

Federated Learning and the Importance of Client Data

Federated Learning (FL) can be applied in either a cross-silo or a cross-device setting. In a cross-silo scenario, clients are few, are highly available across all rounds, and are likely to have similar data distributions for training, e.g., hospitals. This scenario is typically modelled with Independent and Identically Distributed (IID) data.

The second case is a cross-device setting, which will likely encompass thousands of clients with very different data distributions (non-IID), each participating in only a few rounds, e.g., training next-word prediction models on mobile devices.
FL experiments therefore commonly use two partition schemes: a uniform partition (IID), where each client holds approximately the same proportion of each class from the original dataset, and a heterogeneous partition (non-IID), where each client holds an unbalanced and differing proportion of each class.
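A common way to simulate both schemes is sketched below; the helper names and the Dirichlet heuristic for the non-IID split are conventions assumed here for illustration, not prescribed by the text.

```python
import numpy as np

def uniform_partition(labels, n_clients, rng):
    """IID: shuffle all sample indices and split them evenly, so each
    client sees roughly the same class proportions."""
    idx = rng.permutation(len(labels))
    return np.array_split(idx, n_clients)

def dirichlet_partition(labels, n_clients, alpha, rng):
    """Non-IID: split each class across clients with Dirichlet(alpha)
    proportions; small alpha gives highly skewed, unbalanced clients."""
    parts = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk)
    return [np.array(p) for p in parts]

rng = np.random.default_rng(42)
labels = rng.integers(0, 10, size=10_000)  # stand-in class labels
iid_clients = uniform_partition(labels, 100, rng)
niid_clients = dirichlet_partition(labels, 100, alpha=0.1, rng=rng)
```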
Beyond handling different data distributions, researchers have developed an analytical Carbon Footprint Model for FL, providing a first-of-its-kind method for quantitatively estimating CO2e emissions. The model accounts for emissions from both hardware training and the communication between servers and clients, laying a solid foundation and a roadmap for environmentally friendly federated learning.
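A rough, back-of-the-envelope version of such an estimate might look like the sketch below; the function name and every number are illustrative assumptions, not outputs of the actual model.

```python
def fl_co2e_kg(train_kwh_per_round, comm_kwh_per_round, rounds,
               grid_kg_per_kwh):
    """Estimate CO2e as (training energy + communication energy)
    over all rounds, scaled by the grid's carbon intensity."""
    energy_kwh = rounds * (train_kwh_per_round + comm_kwh_per_round)
    return energy_kwh * grid_kg_per_kwh

# 1000 rounds, 0.05 kWh of on-device training and 0.01 kWh of
# client-server traffic per round, on a 0.4 kgCO2e/kWh grid.
print(fl_co2e_kg(0.05, 0.01, 1000, 0.4))  # -> 24.0 kg CO2e
```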
Moreover, the FL setup enables researchers to conduct carbon sensitivity analyses on real FL hardware under different settings, strategies, and tasks. These studies indicate that CO2e emissions depend on a wide range of hyper-parameters, that emissions from client-server communication can range from 0.4% to more than 95% of total emissions, and that efficient strategies can reduce CO2e emissions by up to 60%.
FL can thus have a lasting impact on total CO2e emissions, further helped by choices such as a sustainable physical location, the deep learning task, the model architecture, the FL aggregation strategy, and hardware efficiency.

Why Sustainability in Federated Learning?

One of the most important considerations in FL is quantifying carbon emissions. As research has already demonstrated that proper design of the FL setup decreases these emissions, tracking the released CO2e serves as a crucial metric for FL deployment.
FL is known to converge in fewer rounds when the number of local epochs is increased. However, this does not guarantee a smaller overall energy consumption, as the sketch below illustrates.
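A toy calculation shows why; all numbers are assumed purely for illustration.

```python
def total_energy_kwh(rounds, local_epochs, epoch_kwh, comm_kwh):
    """Total energy = rounds x (local compute + per-round communication)."""
    return rounds * (local_epochs * epoch_kwh + comm_kwh)

# Suppose 5 local epochs halve the rounds needed versus 1 epoch:
few  = total_energy_kwh(rounds=200, local_epochs=1, epoch_kwh=0.02, comm_kwh=0.01)
many = total_energy_kwh(rounds=100, local_epochs=5, epoch_kwh=0.02, comm_kwh=0.01)
print(few, many)  # 6.0 vs 11.0 kWh: fewer rounds, yet more total energy
```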
The figure below illustrates how Federated Learning can have a lasting positive impact on the environment: efficient algorithms reduce device-to-server communication on the one hand, while advanced hardware offers better processing capability and greater transparency on energy consumption on the other.
Unlike centralized systems, where cooling in datacenters accounts for up to 40% of the total energy consumed, FL training on end devices does not incur this cooling overhead. The datacenter overhead is captured by the Power Usage Effectiveness (PUE) ratio: the total facility energy divided by the energy delivered to the computing equipment.
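As a small worked example of what the PUE captures (numbers assumed for illustration):

```python
def facility_energy_kwh(it_energy_kwh, pue):
    """PUE = total facility energy / IT equipment energy, so the grid
    draw is the IT load multiplied by the PUE."""
    return it_energy_kwh * pue

# 100 kWh of training compute: a datacenter where cooling and other
# overheads are ~40% of the total has a PUE of about 1.67, while
# on-device FL training carries no datacenter cooling overhead.
print(facility_energy_kwh(100, 1.67))  # ~167 kWh from the grid
print(facility_energy_kwh(100, 1.0))   # 100 kWh for the same IT load
```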

Use of Renewable Energy During Training on Devices

There are various initiatives to compensate for carbon emissions, either through carbon offsetting or through the purchase of Renewable Energy Credits (RECs, in the USA) or Tradable Green Certificates (TGCs, in the EU). Carbon offsetting compensates for polluting actions via investments in environment-friendly projects, such as renewable energy or large-scale tree planting (Anderson, 2012).
Devices can also rely on renewable resources to generate their own energy. This can be accomplished primarily in two ways, each of which shapes the strategy by which devices send updates to the central server in an FL setup.
In the first case, as illustrated in the figure below, clients are opportunistic about using their harvested energy during the training process, which causes a degradation in performance. Some of the main characteristics of this process are: