Federated Learning 101
Have you ever noticed how GBoard predicts the next word or emoji for you? Or how about writing half a word in your browser’s address bar to get a list of websites based on your browsing history? Sounds familiar, right?
But what does that have to do with Federated Learning (FL)?
Federated learning (FL) is the technique used to bring all of the above to life. This article aims to give you a sneak peek into federated learning: its definition, current industry trends, the motivation behind it, its benefits and challenges, and much more.
Industry trends and know-how
Before we delve any deeper, it is often motivating to know the market trends and who is using the technology.
Google introduced the term “Federated Learning” in this research paper in 2016. It has since become an active area of research. The idea, as described in the paper, was:
“The data can typically be privacy sensitive, large in quantity, or both, making it impractical to log it to the data center and train there. The alternative is to leave the training data distributed on mobile devices and train a shared model by aggregating locally computed updates. This decentralized approach was termed ‘federated learning.’”
The Google Assistant uses Federated Learning to keep your voice and audio data private while still improving the Assistant. Similarly, Apple uses Federated Learning and Differential Privacy to personalize Siri and other applications such as QuickType (Apple's personalized keyboard) and the 'Found In Apps' feature, which searches your calendar and mail apps for unknown callers or texters.
Many companies have applied for patents on federated learning technologies, including Google, Apple, and IBM. In 2021, Bairong Inc. was granted two national patents for FL.
This article states that the two patents are for “a prediction method and system based on an isolated forest training of Vertical Federation” and “a mobile device credit anti-fraud prediction method and system based on Federated Learning.”
It is a rapidly growing area of interest for big companies and startups. A few are already using it in real life, while others are on their way. Some companies using or researching federated learning are IBM, Microsoft, Oracle, OWKIN, and SHERPA.AI.
Motivation for Federated Learning
If you're interested in machine learning, you've probably come across "centralized machine learning," where all data from each connected device is uploaded to the cloud to train a generic model that is then distributed and applied to all devices.
Distributing one generic model to every compatible device is convenient, but this approach has communication and privacy limitations.
Centralized ML models require data to be shared over a stable connection, which is difficult when many devices connect to the same server or cloud, because of network speed, internet availability, network coverage, battery life, and so on. In addition, users cannot share all of their data due to sensitivity and confidentiality concerns.
A classic example of centralized ML would be the recommendation algorithm for query suggestion, search results on e-commerce websites, or Facebook/Instagram’s top content recommendation list.
Another way is to apply machine learning on the Edge!
First, user data is shared with the server and the model is trained on it. The end-user device then receives the trained model and runs inference on user input locally to make predictions. This reduces latency because the trained model resides on the device.
However, the user must still share their data.
Federated learning solves this by training the model on-device. Only anonymized model updates go to the central server, where they are aggregated into an updated global model that is sent back to the devices. The resulting model makes good predictions based on your usage history, and because it also learns from everyone else's updates, it can make reasonable suggestions even for things you have never searched for before.
FL-trained models can therefore provide strong suggestions without your usage history, based on global patterns. For example, Android's "Select & Copy" feature offers actions for the selected content, such as synonyms, similarly spelled words, links, phone numbers, note-taking, and browsing.
So, what is Federated Learning after all?
Federated Learning (FL), according to McMahan et al. (2017), is a distributed machine learning technique that permits training on a vast corpus of decentralized, non-IID data stored on edge devices such as mobile phones and IoT devices. Instead of sending the data to the server, it brings the model to the data.
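For readers who like a formula, the training objective behind this setup, as written in McMahan et al. (2017), is a weighted sum of per-client losses, where client k holds the example index set P_k of size n_k, n is the total number of examples across clients, and ℓ is the per-example loss:

```latex
\min_{w}\; f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w),
\qquad
F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} \ell(x_i, y_i; w)
```

In words: each device minimizes its own average loss F_k on its local data, and the global objective weights each device by how much data it holds.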
Hypothetically, if you were to scale an application from 100 million to 200 million users today, just running that infrastructure in the cloud could cost around 2–3 million dollars per year. With federated learning, the device becomes the compute engine and trains the model locally, which can both solve the scaling problem and save on cloud costs: you could run a recommendation engine for 100 million users at practically zero cloud cost.
Things to remember before using FL:
- On-device data is more relevant than server-side proxy data.
- On-device data is privacy-sensitive or rich in labels that can be inferred naturally from user interaction.
- Device availability is limited, i.e., only a subset of devices can participate at any given time, because certain conditions must be met to send and receive updates, such as the phone being plugged in and connected to stable Wi-Fi.
- Device-related assumptions to plan for: devices are anonymous, available only for limited windows each day, and on unreliable networks; repeated computation on the same device cannot be guaranteed; and on-device data is not independent and identically distributed (IID), since users differ.
How does Federated Learning work?
It all begins with downloading an app that uses FL, such as GBoard, on your mobile phone. Once the app is installed, a generic ML model is present on your device. This model is trained on your usage data and personalizes itself to make better predictions.
Likewise, many other phones running the same app, such as GBoard, update and personalize their local copies of the model based on their owners' usage. These personalized model update summaries are later sent anonymously to the central server.
The central server then aggregates the update summaries received from all the devices and replaces the previously distributed generic model with a new aggregated (global) model, which is sent back to the devices as an update, and the cycle repeats.
You can keep using the app as usual while receiving the revised global model that makes the app work better.
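To make that cycle concrete, here is a minimal, framework-free sketch of one federated averaging (FedAvg) round in Python with NumPy. The model is reduced to a flat weight vector, `local_train` stands in for whatever on-device training the app performs (a toy least-squares model here), and the client/server split is simulated in a single process:

```python
import numpy as np

def local_train(global_weights, local_data, lr=0.1, epochs=1):
    """Stand-in for on-device training: a few gradient steps on the client's own data.
    A toy least-squares model is used here; a real app would train its actual model."""
    w = global_weights.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w, len(y)                            # updated weights and local sample count

def federated_averaging_round(global_weights, client_datasets):
    """One FedAvg round: every client trains locally, then the server averages the
    returned weights, weighting each client by the amount of data it holds (n_k / n)."""
    local_weights, sizes = [], []
    for data in client_datasets:                # in production this loop runs on devices
        w_k, n_k = local_train(global_weights, data)
        local_weights.append(w_k)
        sizes.append(n_k)
    fracs = np.array(sizes, dtype=float) / sum(sizes)
    return sum(f * w_k for f, w_k in zip(fracs, local_weights))

# Simulated run: three clients with differently sized local datasets.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 20, 80):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=n)))

global_w = np.zeros(2)
for _ in range(20):
    global_w = federated_averaging_round(global_w, clients)
print("learned weights:", global_w)  # should end up close to [2, -1]
```

The key design choice is the weighting step: clients with more local data contribute proportionally more to the global model, which is exactly how the update summaries described above are combined.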
How to Evaluate FL models?
The evaluation process for FL models is broadly the same as for any other ML model. For example, the following factors can be used to evaluate an FL model (a small evaluation sketch follows this list):
- Model Performance: Standard ML metrics such as area under the curve (AUC), F1-score, root mean squared error (RMSE), cross-entropy, precision, recall, prediction error, mean absolute error, Dice coefficient, and perplexity.
- Communication Cost: The number of clients, model size, cost of data interchange between the server and client, data transfer rate, bandwidth, and latency.
- Processing Cost: Zero or meager cloud costs as explained above.
- Convergence Rate and Dropout Ratio: Accuracy plotted against communication rounds, total running time, or the amount of training data consumed can be used to measure convergence, while computation overhead measured against client dropout ratios indicates how robust training is to devices dropping out.
- System Run-time Evaluation: End-to-end execution can be long; a full FL training cycle may take a week or more.
- Data Security Metrics: Assess how well gradients and model updates are kept confidential.
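As a rough illustration of the first two bullets, the sketch below (in Python, with purely hypothetical numbers) aggregates per-client accuracy weighted by local dataset size and estimates the per-round communication cost from an assumed model size and cohort size:

```python
# Hypothetical per-client evaluation results: (client_id, accuracy, local_sample_count)
client_results = [("phone_a", 0.91, 1200), ("phone_b", 0.84, 300), ("phone_c", 0.88, 2500)]

total_samples = sum(n for _, _, n in client_results)
weighted_accuracy = sum(acc * n for _, acc, n in client_results) / total_samples
print(f"weighted accuracy: {weighted_accuracy:.3f}")

# Rough communication cost of one round: every participant downloads and uploads
# the full model once (ignoring compression, dropouts, and protocol overhead).
model_size_mb = 25        # assumed model size
clients_per_round = 100   # assumed cohort size per round
traffic_gb_per_round = 2 * model_size_mb * clients_per_round / 1024
print(f"approx. traffic per round: {traffic_gb_per_round:.1f} GB")
```

The numbers are made up; the point is that FL evaluation typically weights per-client metrics by data size and always keeps an eye on how much traffic each round generates.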
Industrial use cases of Federated Learning
Below are a few industries that could benefit from FL.
- Social Media: Federated learning can provide personalized recommendations based on users' search and viewing histories, such as recommended posts when they log in, friend suggestions, and more.
- FinTech: FL can help the FinTech (finance and technology) sector train models to identify data breaches, detect account takeover (ATO) fraud, assess credit ratings, understand a user's footprint to prevent fraudulent actions, and detect financial crime.
- eCommerce: eCommerce platforms can use federated learning to analyze user behavior and purchase habits (items viewed but not bought, items viewed and bought, and so on) and train a model to create a personalized shopping experience.
- Travel: Federated Learning can predict the taxi travel time to set taxi fares or predict GPS travel mode to assist the user accordingly.
- Healthcare Industry: Hospitals have rigorous privacy policies and are bound by legal or ethical constraints that require data to be kept on-premise. Federated learning is a good fit: it reduces data traffic and permits private learning across different devices and organizations. Combining it with differential privacy can improve privacy further. Some use cases include the "application of adverse event detection in bulk vaccination programs" and "multi-institutional medical image segmentation."
- Banking and Insurance: As in FinTech, a corporation can learn its users' behavior without breaching data privacy clauses, helping to prevent fraudulent or unlawful conduct, for example, credit card fraud detection.
- IoT: It enables on-device ML without sharing the user’s data with a central system. Here is a comprehensive survey on federated learning for the Internet of Things.
Real-world applications of Federated Learning
- On-Device Item Ranking: Suggest top search results on your phone.
- Google Keyboard: Predicts the next word based on previous observations and uses FL for emoji prediction.
- Firefox's Address Bar: Typing half a word shows a ranked list of websites based on your browsing history.
- Google Assistant, Apple’s Siri: Personalizes user experience.
- Clinical Outcome and Disease Prediction: tumor detection, cancer detection, brain function analysis, etc.
- Credit Card Fraud Detection
Benefits of Federated Learning
- Data Security: On-device training datasets eliminate the need to access external data pools. All of the data needed to train the model remains on-device with federated learning. It also reduces the exposure of the data and the attack surface to just one device. Organizations, such as hospitals, can use it for sensitive data computations.
- Hyper-Personalization: Instead of training 1 model for all users, you can host 1 model per user. For example, eCommerce platforms can use FL to make product recommendations.
- Low Latency Real-time Predictions: FL does not require the transmission of local raw data. Instead, both the model and data are present on your device. Your device models are continuously updated using your input history. As a result, better models can be deployed and tested faster with low latency.
- Privacy Awareness: Only the model updates are shared, never the raw data. When combined with Differential Privacy and Secure Multiparty Computation (SMPC), the shared updates become much harder to link to an individual, which helps with GDPR and CCPA compliance (a small sketch of such an update perturbation follows this list).
- Low Cost at Scale: FL can help serve very large user bases (like the 100-million-user example above) at a much lower cost than running everything in the cloud.
- Hardware Efficiency: You don’t need a complicated central server. Smartphones are capable of processing data.
- No Internet Required for Predictions: Your device stores both the data and the model, so running the local model to make predictions does not require an internet connection. The internet is only needed to send and receive updates from the central server.
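To make the privacy-awareness point a bit more concrete, here is a minimal sketch of the kind of update perturbation used in differentially private FL: the client clips its model update to a fixed norm and adds Gaussian noise before the update ever leaves the device. The clip norm and noise multiplier below are illustrative values, not a calibrated privacy budget:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client's model update to a fixed L2 norm and add Gaussian noise before
    it leaves the device. The parameters here are illustrative only; a real deployment
    would calibrate them with proper (epsilon, delta) privacy accounting."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw_update = np.array([0.8, -2.4, 0.3])  # pretend weight delta computed on-device
print(privatize_update(raw_update, rng=np.random.default_rng(7)))
```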
Fundamental Challenges in Federated Learning
- Handling Model Bias: If device selection is not done carefully, model updates become biased towards faster, more powerful devices. High-end smartphones can train the model faster than low-end ones, so more updates reach the central server from high-end phones, and the resulting model is biased towards their users.
- Model Security (Poisoning Attacks): Two types of poisoning can occur: data poisoning and model poisoning. It is hard to detect or prevent malicious clients from contributing harmful or fake training data during an FL round (data poisoning), and they may also tamper with the model itself before sending it back to the central server (model poisoning). Poisoning attacks on automated vehicles are a classic example of ML poisoning: the attacker perturbs the image inputs used to train the model so that the vehicle misclassifies traffic signs. Adding a small square of black tape to a stop sign, for example, can be enough to cause accidents.
- No Large-Scale Simulations: There are few good libraries for testing and tuning FL algorithms at scale before production. Combined with the enormous diversity of devices, this makes it difficult for developers, especially at startups and small tech companies, to optimize their FL systems.
- System Heterogeneity: Edge devices regularly drop off due to connectivity or energy constraints, so any device may be unreliable, and fault tolerance is essential because devices may drop out before a training cycle completes. In addition, devices in a federated network may have vastly different storage, processing, and communication capabilities. Federated learning systems must accommodate low participation, support heterogeneous hardware, and tolerate device failures in the network.
- Statistical Heterogeneity: Mobile devices produce and collect data that is not identically distributed across the network; different users, for example, use different emojis for different purposes. This violates the typical IID assumptions, makes it easy to misjudge the relationship between devices and their data distributions, and complicates modeling, analysis, and evaluation.
- Data Privacy: Federated learning protects user data by sharing model updates (for example, gradient information) rather than raw data. However, reporting model updates to the central server or a third party during training may still disclose sensitive information. Additional privacy techniques, such as Differential Privacy (calibrated noise addition), Homomorphic Encryption (computation on encrypted data), and Secure Multiparty Computation (SMPC), can address this while largely preserving model accuracy; a toy sketch of the masking idea behind secure aggregation follows this list.
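And here is the promised toy sketch of the masking idea behind secure aggregation (in the spirit of the protocols used alongside FL): every pair of clients shares a random mask that one adds and the other subtracts, so the server can compute the sum of the updates without seeing any individual one. Real protocols add key agreement and dropout recovery, which are omitted here:

```python
import numpy as np

rng = np.random.default_rng(42)
updates = [rng.normal(size=4) for _ in range(3)]  # pretend model updates from 3 clients

# Every pair of clients (i, j) shares a random mask: client i adds it to its update
# and client j subtracts it, so all masks cancel out in the server-side sum.
masked = [u.copy() for u in updates]
for i in range(len(updates)):
    for j in range(i + 1, len(updates)):
        mask = rng.normal(size=4)
        masked[i] += mask
        masked[j] -= mask

server_sum = sum(masked)                      # the server only ever sees masked updates
print(np.allclose(server_sum, sum(updates)))  # True: the aggregate is preserved
```

Because the masks cancel only in the aggregate, any single masked update on its own looks like random noise to the server.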
Conclusion
AI is now part of our daily lives through smart keyboards, maps and navigation, self-driving cars, recommendation engines, user activity trackers, and more. However, widespread misuse of user data has drawn our attention to user privacy and to techniques that can help protect it.
Federated learning is one such privacy-preserving technology. As discussed throughout this article, it can protect user privacy and provide a more tailored user experience without the data ever leaving users' devices. FL can also help address scalability issues and reduce cloud costs. It reduces the attack surface to a single device, i.e., when one user's device (say, a phone) is compromised, only that user's device is affected, not every device in the network. Yes, there are ways around these protections, such as the poisoning attacks mentioned above, but reducing the attack surface still reduces the risk of user data disclosure.
Federated learning allows you to take full advantage of machine learning while keeping your data safe. It is a fast-growing research field, with new advancements made every week. For example, you might be worried about sending your gradient summaries back to the central server. Still, when used with privacy techniques like differential privacy, homomorphic encryption, and SMPC, your data is protected.
To summarize, FL, like any other technology, has pros and cons. The purpose of this post was to familiarize you with it. Undoubtedly, we will be using personalized and optimized applications driven by FL technology in the future. But, as previously mentioned, we already use quite a few of them.
Check out NimbleEdge Blogs for more content like this.
What next?
- Look at how basic federated learning algorithms like FedAvg (Federated Averaging) are implemented; we will try them out in our next article.
- You can participate in FL research groups and communities.
- You can explore the NimbleEdge community. For example, join the research paper reading group.
Author: Shaistha Fatima