Exploring Flower: A Federated Learning Framework
In this blog post, I introduce Flower, an innovative federated learning framework developed at the University of Oxford and now thriving as a startup named Flower Labs. Federated learning is crucial because it enables models to be trained directly where data is stored, eliminating the need to transfer sensitive information. This approach is particularly valuable in industries like healthcare and finance, where data privacy is paramount. By keeping data local, it reduces network strain and allows for the use of diverse datasets that are challenging to centralize.
Flower stands out by addressing a significant challenge in federated learning: supporting scalable execution of federated learning on mobile and edge devices. Other frameworks like TensorFlow Federated have enabled experimentation with federated learning, but without support for running federated learning on edge devices.
Flower simplifies the training of machine learning models by providing insights into federated learning and offering metrics for model evaluation. It is open-source on GitHub, allowing users to handle various tasks with their preferred machine learning frameworks. The key design principle of Flower is to sit on top of existing deep learning frameworks, such as PyTorch, and remain compatible with them. The figure below depicts the Flower components that I will explain later in this blog.
Flower abstracts away the low-level complexities of implementing federated learning algorithms and system configurations. Instead, it provides an interface that researchers can use to define, configure, and execute federated learning experiments. Flower includes built-in tools for simulating challenging system conditions in a cloud setup, ensuring FL algorithms can be realistically evaluated. It is also built to scale, supporting research with many connected clients and enabling concurrent training across a large number of devices.
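To give a feel for what such an interface looks like, here is a minimal, framework-agnostic sketch of a federated-learning client. This is not Flower's actual API; the `get_parameters`/`fit`/`evaluate` shape is modeled on the client role described above, and the one-parameter "model" is purely illustrative:

```python
from typing import List, Tuple

class Client:
    """Minimal sketch of an FL client; names are illustrative, not Flower's API."""

    def __init__(self, data: List[Tuple[float, float]]):
        self.data = data    # local (x, y) pairs, never sent to the server
        self.weight = 0.0   # a one-parameter "model": y ≈ weight * x

    def get_parameters(self) -> List[float]:
        return [self.weight]

    def fit(self, parameters: List[float], epochs: int = 1) -> List[float]:
        """Run local SGD starting from the server-provided parameters."""
        self.weight = parameters[0]
        lr = 0.01
        for _ in range(epochs):
            for x, y in self.data:
                grad = 2 * (self.weight * x - y) * x  # d/dw of (wx - y)^2
                self.weight -= lr * grad
        return [self.weight]

    def evaluate(self, parameters: List[float]) -> float:
        """Mean squared error of the given parameters on local data."""
        return sum((parameters[0] * x - y) ** 2 for x, y in self.data) / len(self.data)

client = Client(data=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
updated = client.fit([0.0], epochs=50)
print(client.evaluate(updated))  # error shrinks as weight approaches 2.0
```

A real client would hold a PyTorch or TensorFlow model instead of a single float, but the server-facing contract stays the same: receive parameters, train locally, return updated parameters.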
The main intention of Flower is to provide a framework that would (a) allow researchers to perform such research using a common framework and (b) enable them to run those experiments on a large number of heterogeneous devices.
To recap, Flower is an end-to-end federated learning framework that enables the transition from experimental research conducted in simulation to large-scale system research on real edge devices. Flower supports four use cases:
- Scale experiments to large cohorts: Flower supports both a large client pool size and a high number of clients training concurrently, enabling researchers to effectively understand how their algorithms generalize.
- Experiment on heterogeneous devices: Flower provides tools to simulate and execute federated learning on heterogeneous edge devices, allowing researchers to quantify the impact of system heterogeneity.
- Transition from simulation to real devices: Flower enables the transition from simulated environments to real-world applications. It also supports mixed simulations and real device deployments with varying levels of realism in compute and network conditions.
- Multi-framework workloads: Flower supports diverse machine learning frameworks, allowing updates from clients using different frameworks, such as PyTorch, TensorFlow, or custom device-specific pipelines, to be integrated within the same federated learning workload.
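Mixing frameworks in one workload requires a framework-neutral wire format for model parameters. The sketch below, a simplification using Python's standard `struct` module rather than Flower's actual serialization code, shows the idea: each client converts its native tensors to a flat byte buffer before sending updates:

```python
import struct
from typing import List

def params_to_bytes(params: List[float]) -> bytes:
    """Pack a flat list of float32 parameters into a framework-neutral buffer."""
    return struct.pack(f"<{len(params)}f", *params)

def bytes_to_params(buf: bytes) -> List[float]:
    """Unpack the buffer back into plain Python floats (4 bytes per float32)."""
    return list(struct.unpack(f"<{len(buf) // 4}f", buf))

# A PyTorch client and a TensorFlow client would each convert their native
# tensors to this shared representation before sending updates to the server.
wire = params_to_bytes([0.5, -1.25, 3.0])
print(bytes_to_params(wire))  # [0.5, -1.25, 3.0]
```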
Flower’s core architecture is divided into two levels of computation, global and local:
1. Global Computations are handled on the server side, which includes selecting clients, configuring parameters, aggregating updates, and evaluating models. Each strategy within Flower represents a specific federated learning (FL) algorithm, such as FedAvg.
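The aggregation step of FedAvg is simple enough to show concretely: the server averages client parameters weighted by the number of local training examples each client used. The sketch below operates on flat parameter lists for clarity (a real strategy aggregates per-layer tensors):

```python
from typing import List, Tuple

def fedavg(results: List[Tuple[List[float], int]]) -> List[float]:
    """FedAvg aggregation: average client parameters weighted by the
    number of local training examples each client contributed."""
    total_examples = sum(n for _, n in results)
    num_params = len(results[0][0])
    return [
        sum(params[i] * n for params, n in results) / total_examples
        for i in range(num_params)
    ]

# Two clients: the second trained on 3x more data, so it dominates the average.
aggregated = fedavg([([1.0, 0.0], 10), ([2.0, 4.0], 30)])
print(aggregated)  # [1.75, 3.0]
```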
2. Local Computations are tasks focused on training and evaluating models. Flower supports a variety of machine learning pipelines by offering framework-agnostic methods, whether through its protocol or its high-level client management. This setup ensures flexibility and scalability in FL experiments, as depicted in Figure 2 below.
3. Server Components include the ClientManager, responsible for managing a collection of ClientProxy objects; these proxies represent individual clients connected to the server. The server also runs the FL loop, which coordinates the overall federated learning process and interacts with the strategy to configure rounds, aggregate results, and manage evaluations.
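The ClientManager/ClientProxy relationship can be sketched in a few lines. This is a heavily simplified illustration using the component names from above; Flower's real classes additionally handle connection state, waiting for clients, and sampling criteria:

```python
import random
from typing import Dict, List

class ClientProxy:
    """Server-side stand-in for one connected client. In a real system it
    would also carry the transport details needed to reach that client."""
    def __init__(self, cid: str):
        self.cid = cid

class ClientManager:
    """Tracks connected ClientProxy objects and samples them for a round."""
    def __init__(self):
        self.clients: Dict[str, ClientProxy] = {}

    def register(self, proxy: ClientProxy) -> None:
        self.clients[proxy.cid] = proxy

    def sample(self, num_clients: int, seed: int = 0) -> List[ClientProxy]:
        """Uniformly sample distinct clients for the next FL round."""
        rng = random.Random(seed)
        return rng.sample(list(self.clients.values()), num_clients)

manager = ClientManager()
for i in range(100):
    manager.register(ClientProxy(cid=f"client-{i}"))
round_clients = manager.sample(num_clients=10)
print(len(round_clients))  # 10
```

The FL loop then asks the strategy which clients to configure each round, and the proxies returned by `sample` are the handles it uses to dispatch fit and evaluate instructions.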
Now that we have defined the core architecture of Flower, let’s look at the Virtual Client Engine (VCE). The VCE plays a crucial role in optimizing hardware utilization and enhancing scalability in federated learning experiments. Given each client’s compute and memory requirements, along with FL-specific parameters such as the number of clients per round, the VCE launches Flower clients in a resource-aware manner. It handles scheduling, instantiation, and execution transparently for both users and the Flower server. This capability ensures that hardware resources are fully utilized across setups ranging from desktop machines to single-GPU racks or multi-node GPU clusters, without needing reconfiguration.
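The arithmetic behind resource-aware scheduling is worth making explicit. A rough sketch, with entirely hypothetical hardware numbers (not taken from the Flower paper): the number of clients that can run concurrently is bounded by whichever resource is scarcest relative to a client's declared requirements:

```python
def max_concurrent_clients(total_cpus: int, total_gpu_mem_gb: float,
                           cpus_per_client: int, gpu_mem_per_client_gb: float) -> int:
    """How many simulated clients can run at once, limited by the scarcer resource."""
    by_cpu = total_cpus // cpus_per_client
    by_gpu = int(total_gpu_mem_gb // gpu_mem_per_client_gb)
    return min(by_cpu, by_gpu)

# Hypothetical single-GPU workstation: 16 CPUs, 24 GB of GPU memory.
# Each client asks for 2 CPUs and 4 GB -> GPU memory is the bottleneck.
print(max_concurrent_clients(16, 24.0, 2, 4.0))  # 6
```

A resource-aware engine keeps this many clients in flight at a time and queues the rest, which is why the same experiment can run unchanged on a laptop or a GPU cluster, just with different degrees of parallelism.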
Flower’s implementation focuses on key components essential for federated learning (FL), including a bi-directional gRPC stream for efficient message exchange. Flower’s architecture is centered on the ClientProxy abstraction, which enables seamless interaction with clients for training and evaluation, regardless of communication protocol.
Flower has been evaluated for scalability, heterogeneity, and realism through comprehensive experiments:
1. Scalability: a large-scale experiment on a dataset of 15 million users, focusing on predicting Amazon book ratings from textual reviews. Experimentation involved varying the number of clients per round from 10 to 1000, employing the FedAvg aggregation method on a fixed pool of 1 million clients. Results indicated faster initial convergence with fewer clients per round (up to 500), but slower convergence with 1000 clients, as diverse client data distributions challenge aggregation strategies like FedAvg.
2. Single-Machine Experiments: Flower was evaluated against four other federated learning (FL) frameworks, focusing on training times across different FL setups using the FEMNIST dataset. The experiment kept the total number of rounds and clients constant while varying the number of clients per round and local epochs per round.
Flower’s slightly slower performance in the smallest experiment is attributed to overhead from its VCE. When increasing the number of clients per round (c=35) while maintaining one local epoch, Flower remains competitive thanks to the VCE’s efficient GPU memory allocation. When increasing the workload per client to 100 local epochs with three active clients, Flower significantly outperforms all the others, completing the task in about 80 minutes compared to FedJax’s over 173 minutes.
3. Flower enables FL evaluation on real devices: Flower was deployed on various real-world devices, such as Raspberry Pis and Android smartphones, to assess federated learning (FL) feasibility and system costs.
Fine-Grained Profiling: On Android devices, detailed profiling of FL operations such as local SGD, communication, evaluation, and framework overheads was conducted. The framework overhead includes converting model gradients to gRPC-compatible buffers and vice versa, to enable communication between Java FL clients and a Python FL server.
In evaluating computational heterogeneity across devices, the varying capabilities of devices significantly impact federated learning (FL) performance. For instance, the FedAvg approach assigns each client device a cutoff time (τ) to transmit its model parameters to the server, regardless of local epoch completion. The key advantage of Flower’s on-device training capabilities is that we can accurately measure and assign a realistic processor-specific cutoff time for each client.
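A small worked example makes the cutoff trade-off concrete. The per-epoch timings below are hypothetical, not measurements from the Flower paper: under a uniform cutoff, fast devices complete many more local epochs than slow ones, while a processor-specific cutoff equalizes the completed work:

```python
def epochs_completed(seconds_per_epoch: float, cutoff_s: float) -> int:
    """Local epochs a device finishes before the cutoff τ expires."""
    return int(cutoff_s // seconds_per_epoch)

# Hypothetical devices: a phone needs 12 s per local epoch, a Raspberry Pi 60 s.
uniform_cutoff = 120.0
print(epochs_completed(12.0, uniform_cutoff))   # 10 epochs on the phone
print(epochs_completed(60.0, uniform_cutoff))   # 2 epochs on the Pi

# A processor-specific cutoff (e.g. 5 epochs' worth per device) equalizes work:
print(epochs_completed(12.0, 5 * 12.0))  # 5
print(epochs_completed(60.0, 5 * 60.0))  # 5
```

Measuring real per-epoch times on-device is what makes the second, per-processor assignment possible.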
In evaluating heterogeneity in network speeds, 40 clients were deployed on a cloud platform, simulating varied download and upload speeds based on global network speed data. Training time varied significantly: for example, training completed in 8.9 minutes for high-speed clients (Canada) versus 108 minutes for low-speed clients (Iraq).
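A back-of-the-envelope calculation shows why bandwidth dominates round time for slow clients. The model size and link speeds below are hypothetical, chosen only to illustrate the scaling, not figures from the experiment:

```python
def transfer_time_s(model_mb: float, down_mbps: float, up_mbps: float) -> float:
    """Seconds to download the global model and upload one update.

    Link speeds are in megabits/s, so multiply MB by 8 to get megabits."""
    megabits = model_mb * 8
    return megabits / down_mbps + megabits / up_mbps

# Hypothetical 20 MB model on a fast link (100/50 Mbps) vs a slow one (5/1 Mbps):
print(round(transfer_time_s(20, 100, 50), 1))  # 4.8 s of communication per round
print(round(transfer_time_s(20, 5, 1), 1))     # 192.0 s of communication per round
```

With hundreds of rounds, a ~40x gap in per-round communication time compounds into the kind of wall-clock disparity the experiment observed between high- and low-speed regions.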
In the next blog, I will explain the life cycle of the model and reference the corresponding code functions along the way.
References:
- Flower website https://flower.ai/
- Google TechTalks https://www.youtube.com/watch?v=NaOVX-lp5Fo
- Flower YouTube Channel https://www.youtube.com/@flowerlabs
- Flower white paper https://arxiv.org/pdf/2007.14390