Paper Review Of Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Salem Alqahtani
Jul 3, 2020

Mesos was launched at Berkeley's AMPLab in 2009 and became an Apache project in 2013. It is popular among cluster managers. Mesos is a dynamic scheduling platform that manages cluster resources on behalf of many frameworks in order to drive up resource utilization. Its flexibility makes it easy to run many frameworks on top of it, whether they are different versions of the same framework or completely different frameworks.

One of Mesos's strengths is data locality: tasks can run on the same machine that stores their input data, and Mesos supports near-optimal data locality. Scheduling in Mesos is distributed and uses a two-level technique. Basically, Mesos decides how many resources to offer to each framework, and the framework's scheduler accepts or rejects each offer.
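The accept/reject step can be illustrated with a small sketch. It assumes a hypothetical framework scheduler that only accepts offers on nodes holding its input data; the class names, node names, and resource numbers are invented for illustration and are not the Mesos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A resource offer for one node (hypothetical shape, not the Mesos type)."""
    node: str
    cpus: float
    mem_gb: float


class LocalityAwareScheduler:
    """Hypothetical framework scheduler that prefers nodes storing its input data."""

    def __init__(self, data_nodes, cpus_per_task, mem_per_task):
        self.data_nodes = set(data_nodes)
        self.cpus_per_task = cpus_per_task
        self.mem_per_task = mem_per_task

    def consider(self, offer):
        """Return how many tasks to launch on this offer; 0 means the offer is rejected."""
        if offer.node not in self.data_nodes:
            return 0  # reject: the input data is not on this node
        by_cpu = int(offer.cpus // self.cpus_per_task)
        by_mem = int(offer.mem_gb // self.mem_per_task)
        return min(by_cpu, by_mem)


def offer_round(offers, scheduler):
    """First level: the master presents offers. Second level: the framework decides."""
    return {offer.node: scheduler.consider(offer) for offer in offers}


offers = [Offer("node1", 4, 8), Offer("node2", 4, 8), Offer("node3", 1, 8)]
sched = LocalityAwareScheduler({"node2", "node3"}, cpus_per_task=2, mem_per_task=2)
placements = offer_round(offers, sched)
print(placements)
```

In this sketch node1 is rejected because it holds no input data, and node3 is rejected because its single CPU cannot fit a two-CPU task, so only node2 receives tasks.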

In this dynamic scenario, two types of schedulers are at work: the schedulers inside the frameworks and the global scheduler. Together, the two types are called Two-level Schedulers (TLSs).

This gives cluster administrators the flexibility to provide, for example, multiple instances of the same framework, which can be beneficial for several reasons:
• Performance isolation: multiple workloads run on separate framework instances so that they do not interfere with each other's performance.
• Data isolation: end users may want to use different framework instances for different data sets, for instance for security purposes.
• Version isolation: running multiple versions of the same framework lets end users migrate gradually to a new version while keeping current, tested programs running on the old one.

The paper introducing Mesos [20] shows that multiple frameworks can run simultaneously and execute their workloads faster than when the same frameworks run in a cluster without Mesos, i.e., when each framework has a static partition of the cluster's nodes. The speedup comes from statistical multiplexing: a framework is allowed to grow beyond its initial partition of the cluster when other frameworks are not using all or part of theirs. However, the experiments and results in the paper do not show how similar frameworks that differ only in workload intensity can achieve a performance balance relative to each other.
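A toy calculation makes the statistical multiplexing argument concrete. The cluster size, the 50/50 static split, and the per-interval demands below are made-up numbers for illustration only; they are not results from the paper.

```python
TOTAL_NODES = 100          # assumed cluster size
PARTITION = 50             # assumed static split: 50 nodes per framework

# Assumed demand (in nodes) of two frameworks over four time intervals.
demands = [(80, 20), (20, 80), (50, 50), (10, 30)]


def static_used(demand_a, demand_b):
    """Each framework is capped at its own fixed partition."""
    return min(demand_a, PARTITION) + min(demand_b, PARTITION)


def shared_used(demand_a, demand_b):
    """Frameworks may grow into nodes the other one leaves idle."""
    return min(demand_a + demand_b, TOTAL_NODES)


static_total = sum(static_used(a, b) for a, b in demands)
shared_total = sum(shared_used(a, b) for a, b in demands)
capacity = TOTAL_NODES * len(demands)

print(f"static partitioning: {static_total}/{capacity} node-intervals used")
print(f"shared cluster:      {shared_total}/{capacity} node-intervals used")
```

Under these assumed demands, the static split leaves 30 nodes idle in each of the first two intervals even though one framework could use them, while the shared cluster keeps them busy.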

It is the responsibility of the cluster scheduler to schedule a job's tasks according to the job's requirements and to place those tasks on appropriate cluster resources.

There are three challenges. First, each framework has different scheduling needs, based on its programming model, communication pattern, task dependencies, and data placement. Second, the scheduling system must scale to clusters of tens of thousands of nodes running hundreds of jobs with millions of tasks. Finally, because all the applications in the cluster depend on Mesos, the system must be fault tolerant and highly available.

To deal with the difficulty and complexity of these challenges, Mesos introduces a new abstraction: the resource offer.

Why did Mesos become popular?

First, Mesos can run multiple instances of a framework, or multiple versions of it, in the same cluster. Second, it makes it easy to develop new frameworks and experiment with them immediately.

Mesos achieved isolation, scalability, and fault tolerance.

Mesos places control over scheduling in the hands of the frameworks for two reasons. First, it allows frameworks to implement diverse approaches to various problems in the cluster. Second, it keeps Mesos simple and minimizes the rate of change required of the system, which makes it easier to keep Mesos scalable and robust.

The master implements fine-grained sharing across frameworks using resource offers. The master decides how many resources to offer to each framework according to an organizational policy such as fair sharing or priority.
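One simple way to picture a fair-sharing policy is sketched below. It assumes a single resource type (CPUs), a fixed offer size, and frameworks that accept every offer; the framework names and numbers are hypothetical, and this is not how the Mesos allocator is actually implemented.

```python
def pick_framework_fair(allocated):
    """Pick the framework holding the fewest CPUs; ties go to the alphabetically first name."""
    return min(sorted(allocated), key=lambda name: allocated[name])


def allocate(free_cpus, chunk, allocated):
    """Hand out free CPUs in fixed-size offers, always to the least-served framework.

    Assumes every offer is accepted, which a real framework is free not to do.
    """
    allocated = dict(allocated)  # work on a copy so the caller's dict is untouched
    offered_to = []
    while free_cpus >= chunk:
        name = pick_framework_fair(allocated)
        allocated[name] += chunk
        free_cpus -= chunk
        offered_to.append(name)
    return offered_to, allocated


current = {"hadoop": 4, "mpi": 0, "spark": 2}   # assumed CPUs already held
offered_to, final = allocate(free_cpus=6, chunk=2, allocated=current)
print(offered_to)
print(final)
```

A priority policy would follow the same loop and only change the selection rule, for example by picking the framework with the highest configured priority instead of the smallest allocation.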

Each framework running on Mesos consists of two components: a scheduler that registers with the master to be offered resources, and an executor process that is launched on slave nodes to run the framework's tasks.
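The division of labor between the two components can be sketched as follows. All class and method names here are hypothetical stand-ins rather than the Mesos API, and the sketch assumes each task needs exactly one CPU.

```python
class Executor:
    """Runs a framework's tasks on one node (hypothetical stand-in)."""

    def __init__(self, node):
        self.node = node

    def run(self, tasks):
        return [f"{task} ran on {self.node}" for task in tasks]


class Scheduler:
    """Framework-side scheduler: decides which pending tasks to launch on an offer."""

    def __init__(self, pending):
        self.pending = list(pending)

    def on_offer(self, node, cpus):
        # Assumption: one CPU per task, so an offer of n CPUs fits at most n tasks.
        count = min(int(cpus), len(self.pending))
        launched, self.pending = self.pending[:count], self.pending[count:]
        return launched


class Master:
    """Keeps track of registered schedulers and forwards offers to them."""

    def __init__(self):
        self.schedulers = []

    def register(self, scheduler):
        self.schedulers.append(scheduler)

    def offer(self, node, cpus):
        for scheduler in self.schedulers:
            tasks = scheduler.on_offer(node, cpus)
            if tasks:  # offer accepted: an executor on that node runs the tasks
                return Executor(node).run(tasks)
        return []  # every registered scheduler declined the offer


master = Master()
master.register(Scheduler(["t1", "t2", "t3"]))
first = master.offer("node-a", cpus=2)
second = master.offer("node-b", cpus=4)
third = master.offer("node-c", cpus=4)
print(first, second, third)
```

The scheduler never touches a node directly: it only answers offers, and the executor is the piece that actually runs on the node.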

