Apache Mesos
| Apache Mesos | |
|---|---|
| Developer | Apache Software Foundation |
| Stable release | 1.11.0 / November 24, 2020[1] |
| Written in | C++ |
| Type | Cluster management software |
| License | Apache License 2.0 |
| Website | mesos.apache.org |
| Repository | Mesos Repository |
Apache Mesos is an open-source project to manage computer clusters. It was developed at the University of California, Berkeley.
History
Mesos began as a research project in the UC Berkeley RAD Lab by then-PhD students Benjamin Hindman, Andy Konwinski, and Matei Zaharia, together with professor Ion Stoica. The students started working on the project as part of a course taught by David Culler. It was originally named Nexus but, due to a conflict with another university's project, was renamed Mesos.[2]
Mesos was first presented in 2009 (while still named Nexus) by Andy Konwinski at HotCloud '09, in a talk accompanying the first paper published about the project.[3] In 2011, a more developed version was presented by Zaharia at the USENIX Symposium on Networked Systems Design and Implementation, where he presented the paper on Mesos authored by Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica, and Zaharia himself.[4]
On July 27, 2016, the Apache Software Foundation announced version 1.0.[5] It added the ability to centrally supply Docker, rkt, and appc instances.[6]
On April 5, 2021, a vote was held to move Mesos to the Apache Attic;[7] however, the vote was cancelled two days later due to increased interest.[8]
The project was retired in August 2025[9] and placed into the Apache Attic in October 2025.[10] All development on the project has ceased.
Technology
Mesos uses Linux cgroups to provide isolation for CPU, memory, I/O and file system.[11] Mesos is comparable to Google's Borg scheduler, a platform used internally to manage and distribute Google's services.[12]
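To make the isolation mechanism concrete, here is a minimal sketch of imposing a memory cap through the cgroups filesystem, similar in spirit to what a Mesos agent's memory isolator arranges; the cgroup name and paths are hypothetical and assume a Linux host with cgroups v1 mounted at /sys/fs/cgroup.

```python
import os

# Hypothetical illustration of cgroups-v1-style memory isolation, the kind
# of mechanism a Mesos agent uses to fence tasks off from one another.
# Assumes the memory controller is mounted at /sys/fs/cgroup/memory.
CGROUP_ROOT = "/sys/fs/cgroup/memory"
TASK_CGROUP = os.path.join(CGROUP_ROOT, "mesos_task_example")

def limit_task_memory(pid: int, limit_bytes: int) -> None:
    """Place a process in a new cgroup capped at limit_bytes of memory."""
    os.makedirs(TASK_CGROUP, exist_ok=True)
    # Writing to memory.limit_in_bytes caps the group's memory usage.
    with open(os.path.join(TASK_CGROUP, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    # Adding the PID to cgroup.procs moves the process into the group.
    with open(os.path.join(TASK_CGROUP, "cgroup.procs"), "w") as f:
        f.write(str(pid))

# Example: cap the current process at 256 MB (requires root privileges).
# limit_task_memory(os.getpid(), 256 * 1024 * 1024)
```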
| Apache Aurora | |
|---|---|
| Developer | Apache Software Foundation |
| Final release | 0.22.0 / December 12, 2019[13] |
| Written in | Java, Python |
| Type | Mesos Framework |
| License | Apache License 2.0 |
| Website | aurora.apache.org |
| Repository | Aurora Repository |
Apache Aurora
Apache Aurora is a Mesos framework for both long-running services and cron jobs, originally developed by Twitter starting in 2010 and open-sourced in late 2013.[14] It can scale to tens of thousands of servers and bears many similarities to Borg,[15][16] including its rich domain-specific language (DSL) for configuring services. In February 2020 the project was retired to the Apache Attic.[17] A fork of the project was maintained by former members, hosted on GitHub under the name Aurora Scheduler.[18]
Chronos
Chronos is a distributed, elastic cron-like system capable of expressing dependencies between jobs.[19]
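As an illustration of the dependency feature, the sketch below registers a scheduled job and a dependent child job, assuming Chronos's REST endpoints /scheduler/iso8601 and /scheduler/dependency; the host, job names, and commands are hypothetical.

```python
import requests

# Minimal sketch of registering a scheduled Chronos job and a dependent
# child job over its REST API. Host, job names, and commands are hypothetical.
CHRONOS = "http://chronos.example.com:4400"

# Parent job: runs every 24 hours (ISO 8601 repeating interval).
parent = {
    "name": "etl-extract",
    "command": "/usr/local/bin/extract.sh",
    "schedule": "R/2025-01-01T02:00:00Z/PT24H",
    "owner": "data-team@example.com",
}
requests.post(f"{CHRONOS}/scheduler/iso8601", json=parent).raise_for_status()

# Child job: no schedule of its own; Chronos triggers it whenever all
# jobs listed in "parents" complete successfully.
child = {
    "name": "etl-transform",
    "command": "/usr/local/bin/transform.sh",
    "parents": ["etl-extract"],
    "owner": "data-team@example.com",
}
requests.post(f"{CHRONOS}/scheduler/dependency", json=child).raise_for_status()
```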
Marathon
Marathon is promoted as a platform-as-a-service or container orchestration system that scales to thousands of physical servers. It is fully REST-based and allows canary-style deployments and deployment topologies. It is written in the Scala programming language.[20]
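A minimal sketch of that REST interface follows, posting a hypothetical app definition to Marathon's /v2/apps endpoint; the Marathon URL and all field values are illustrative, not a definitive deployment recipe.

```python
import requests

# Sketch of launching a long-running service through Marathon's REST API.
# The Marathon URL and the app definition below are hypothetical examples.
MARATHON = "http://marathon.example.com:8080"

app = {
    "id": "/web/hello",               # application path in Marathon's namespace
    "cmd": "python3 -m http.server $PORT0",
    "cpus": 0.25,                     # fraction of a CPU per instance
    "mem": 128,                       # MB per instance
    "instances": 3,                   # Marathon keeps this many copies running
}

# POST /v2/apps registers the app; Marathon then schedules it on Mesos
# and restarts instances that fail, preserving the declared instance count.
resp = requests.post(f"{MARATHON}/v2/apps", json=app)
resp.raise_for_status()
print(resp.json().get("deployments"))
```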
Users
The social networking site Twitter began using Mesos and Apache Aurora in 2010, after Hindman gave a presentation to a group of Twitter engineers.[12]
Airbnb said in July 2013 that it uses Mesos to run data processing systems like Apache Hadoop and Apache Spark.[21]
The Internet auction website eBay stated in April 2014 that it used Mesos to run continuous integration on a per-developer basis, accomplished through a custom Mesos plugin that allowed developers to launch their own private Jenkins instances.[22]
In April 2015, it was announced that Apple's Siri service uses its own Mesos framework, called Jarvis.[23]
In August 2015, it was announced that Verizon had selected Mesosphere's DC/OS, which is based on the open-source Apache Mesos, for data center service orchestration.[24]
In November 2015, Yelp announced they had been running production services on Mesos and Marathon for a year and a half.[25]
Commercial support
Software startup Mesosphere, Inc. sells the Datacenter Operating System, a distributed operating system based on Apache Mesos.[26] In September 2015, Microsoft announced a commercial partnership with Mesosphere to build container scheduling and orchestration services for Microsoft Azure.[27] In October 2015, Oracle announced support for Mesos through Oracle Container Cloud Service.[28]
See also
- Dominant resource fairness - the resource-sharing policy used in Mesos.
- List of cluster management software
- Comparison of cluster software
References
[edit]- ^ "ASF Git Repos - mesos.git/tag". Retrieved 27 September 2022.
- ^ Zaharia, Matei (31 August 2010). "HUG Meetup August 2010: Mesos: A Flexible Cluster Resource manager - Part 1". YouTube. Retrieved 13 January 2015.
- ^ "A Common Substrate for Cluster Computing" (PDF).
- ^ Hindman, Benjamin; Konwinski, Andy; Zaharia, Matei; Ghodsi, Ali; Joseph, Anthony; Katz, Randy; Shenker, Scott; Stoica, Ion (2011). "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center" (PDF). NSDI. 11: 22-22. Retrieved 12 January 2015.
- ^ "The Apache Software Foundation Announces Apache Mesos v1.0". Press release. July 27, 2016. Retrieved February 24, 2017.
- ^ "Mesos 1.0 brings a new container runtime and more third party integrations". July 27, 2016.
- ^ "[VOTE] Move Apache Mesos to Attic". lists.apache.org. Archived from the original on 2021-04-06. Retrieved 2021-04-07.
- ^ "Re: [VOTE] Move Apache Mesos to Attic". lists.apache.org. Archived from the original on 2021-04-09. Retrieved 2021-04-09.
- ^ "The Apache Software Foundation Board of Directors Meeting Minutes August 20, 2025".
- ^ "ATTIC-245".
- ^ Bappalige, Sachin P. (2014-09-15). "Open-Source Datacenter Computing with Apache Mesos". OpenSource.com. Red Hat. Retrieved 2016-12-10.
- ^ Metz, Cade. "Return of the Borg: How Twitter Rebuilt Google's Secret Weapon". Wired. Retrieved 12 January 2015.
- ^ "Apache Aurora Blog". Retrieved 16 March 2021.
- ^ "All about Apache Aurora". Twitter. Retrieved 20 May 2015.
- ^ "Large-scale cluster management at Google with Borg" (PDF). Retrieved 20 May 2015.
- ^ "Twitter's Aurora and How It Relates to Google's Borg". 18 February 2015. Retrieved 20 May 2015.
- ^ "Apache Aurora - Apache Attic". attic.apache.org. Retrieved 2021-02-18.
- ^ "Aurora Scheduler". GitHub. Retrieved 2023-04-02.
- ^ "Chronos". GitHub.com. GitHub. Retrieved 30 March 2015.
- ^ "Marathon". Mesosphere.GitHub.io. Mesosphere. 2014. Retrieved 30 March 2015.
- ^ Harris, Derrick. "Airbnb is engineering itself into a data-driven company". gigaom.com. Archived from the original on January 18, 2015. Retrieved 12 January 2015.
- ^ The eBay PAAS Team (4 April 2014). "Delivering eBay's CI Solution with Apache Mesos - Part I". EbayTechBlog.com. eBay. Retrieved 12 January 2015.
- ^ Harris, Derrick (2015-04-23). "Apple Details How It Rebuilt Siri on Mesos". Mesosphere.com. Mesosphere. Archived from the original on 2015-04-29. Retrieved 2015-04-27.
- ^ "Verizon selects Mesosphere DCOS as nationwide platform for data center service orchestration". Verizon. 21 August 2015. Retrieved 21 August 2015.
- ^ "Introducing PaaSTA: An Open, Distributed, Platform as a Service". engineeringblog.yelp.com. Retrieved 2016-07-12.
- ^ "The Mesosphere DCOS". mesosphere.com. Retrieved 13 January 2015.
- ^ Mary Jo Foley (September 29, 2015). "New Azure Container Service to bring together Mesos, Docker and Azure cloud". ZDNet.
- ^ "Oracle Updates Oracle Cloud Infrastructure Services". oracle.com. Retrieved 2018-02-06.
External links
Apache Mesos
History
Origins and Development
Apache Mesos originated as a research project in 2009 at the University of California, Berkeley, developed by Benjamin Hindman, Andy Konwinski, Matei Zaharia, and Ion Stoica, along with collaborators including Ali Ghodsi, Anthony D. Joseph, Randy Katz, and Scott Shenker.[7][4] The project emerged from efforts to address the growing challenges of managing large-scale data centers, where commodity clusters were increasingly underutilized due to the silos created by specialized frameworks.[1] The initial motivation stemmed from the inefficiencies in resource sharing across diverse workloads, such as Hadoop for data processing and MPI for high-performance computing, which often led to low cluster utilization (typically around 10-20% in practice) because each framework monopolized entire nodes.[7] Inspired by operating system kernels that abstract hardware for multiple applications, the team aimed to create a platform for fine-grained resource sharing that improved utilization while preserving data locality and avoiding costly data replication across frameworks.[1] Early prototypes focused on enabling multiple frameworks to coexist on shared clusters without interference, demonstrating up to 2.1-fold improvements in job completion times for Hadoop workloads in evaluations on a 50-node cluster.[7] These prototypes culminated in the seminal 2011 NSDI paper, "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center", which detailed the system's architecture and empirical results from real-world deployments, including integration with Hadoop and MPI.[7]

In 2010, Mesos entered the Apache Incubator as an open-source project, marking its shift from academic research to broader community development.[4] It graduated to become a top-level Apache project in July 2013, reflecting its maturity and adoption by organizations like Twitter for production-scale cluster management.[3]

At its core, Mesos introduced two-level scheduling to decouple resource allocation from task placement: a central Mesos scheduler offers available resources to framework-specific schedulers, which then decide how to utilize them, enabling flexible policies like fair sharing or capacity guarantees.[1] Resource isolation was achieved through OS-level mechanisms, such as Linux cgroups, to ensure tasks from different frameworks do not interfere with each other's performance on shared nodes.[7] These principles laid the foundation for Mesos as a distributed kernel-like layer, prioritizing scalability and adaptability for multi-framework environments.[1]
Key Milestones and Releases
Apache Mesos entered the Apache Incubator in 2010, with its initial development stemming from a research project at the University of California, Berkeley. The project's first incubator release occurred in 2012, marking the beginning of its formal open-source evolution under Apache governance.[8][4] A significant milestone came on July 24, 2013, when Mesos graduated to become a top-level Apache project, recognizing its maturity and growing community adoption for resource management in large-scale clusters.[3]

In September 2014, Mesos 0.20.0 introduced native support for Docker containers, allowing frameworks to launch tasks using Docker images and a subset of Docker options, which broadened its appeal for containerized workloads.[9] The integration of Apache Spark with Mesos around 2013 enabled efficient resource sharing for data processing frameworks, with Spark 0.5.0 explicitly supporting Mesos 0.9 for running analytics workloads on shared clusters.[10]

Mesos 1.0.0, released on July 27, 2016, represented a major maturation point, featuring a new HTTP API for improved interoperability, a unified containerizer supporting multiple runtimes including Docker and AppC, and enhanced high availability for the master process through ZooKeeper integration for leader election and state replication. This version solidified Mesos as a production-ready platform for fault-tolerant distributed systems.[11][4]

| Version | Release Date | Key Features |
|---|---|---|
| 0.20.0 | September 3, 2014 | Native Docker container support[9] |
| 1.0.0 | July 27, 2016 | HTTP API, unified containerizer, ZooKeeper-based HA master[11] |
| 1.4.0 | September 18, 2017 | Enhanced GPU resource isolation and disk isolation for better support of compute-intensive tasks[12][13] |
| 1.9.0 | September 2019 | Improvements to persistent volumes, agent draining, and quota limits for more reliable stateful workloads[14] |
Retirement
In July 2025, the Apache Mesos Project Management Committee (PMC) held a formal binding vote to retire the project, which concluded on July 22, 2025, citing prolonged inactivity and a lack of active maintainers as the primary reasons.[19][17] The decision followed years of declining community contributions, with GitHub commit activity dropping sharply after 2019 and no substantial updates since, as well as an earlier unsuccessful retirement vote in April 2021 that was cancelled after two days due to renewed interest.[17][20] The retirement reflected broader industry shifts toward Kubernetes for container orchestration, which had gained dominance in managing distributed systems.[20]

A key factor in Mesos' decline was the strategic pivot by its primary backer, Mesosphere (rebranded as D2iQ in 2019), which shifted focus to Kubernetes-based solutions like Konvoy starting in 2019, effectively ending support for Mesos-centric products such as DC/OS by 2021.[21][22] This commercial redirection reduced funding and development resources for the open-source project, exacerbating the maintainer shortage.[23]

The retirement process continued with Apache Board approval on August 20, 2025, moving Mesos to the Apache Attic for archival purposes.[19][17] Project resources, including mailing lists, the JIRA issue tracker, and the Git repository, were subsequently made read-only to preserve historical data while preventing further changes.[24] The official announcement of the retirement was issued on October 17, 2025.[25]

Mesos had no new feature releases after version 1.11.0, issued on November 24, 2020, though minor security patches were applied sporadically until early 2021.[17][18] In the immediate aftermath, the Mesos website was redirected to its Apache Attic page, providing read-only access to documentation and archives.[6] The retirement notice encouraged users to consider community forks, such as Clusterd, an active continuation of Mesos maintained on GitHub since early 2025, for ongoing needs in resource isolation and cluster management.[26][27]
Architecture
Core Components
Apache Mesos is built around a distributed architecture comprising master nodes, agent nodes, and framework-specific components that enable efficient resource sharing across clusters. The system employs a two-level scheduling model where the Mesos master allocates resources to frameworks, which then manage their own task scheduling. This design allows multiple diverse frameworks to coexist on the same physical infrastructure while providing fine-grained resource isolation.[7][5]

The master node serves as the central coordinator in the Mesos cluster, responsible for managing agent daemons, tracking the overall state of resources, and offering available resources to registered frameworks based on configurable allocation policies such as fair sharing or strict priority. Masters support high availability through a replicated setup with leader election, ensuring fault tolerance by allowing backup masters to take over seamlessly if the active leader fails. This replication is orchestrated via Apache ZooKeeper, which handles leader election, configuration management, and state synchronization across multiple masters, agents, and schedulers.[5][28]

Agent nodes (previously known as slave nodes) operate on each machine in the cluster, reporting available resources (such as CPUs, memory, disk, and ports) to the master and enforcing resource isolation for tasks launched on that node. Agents execute tasks through framework-provided executors and utilize pluggable isolators to manage and limit resource usage, including CPU shares, memory limits, disk volumes, network ports, and GPU allocation, primarily leveraging Linux control groups (cgroups) and namespaces for isolation on supported platforms. This modular isolation mechanism allows operators to customize enforcement for specific environments without altering the core Mesos codebase.[5][29]

Framework-specific schedulers register with the master to receive resource offers and decide how to allocate those resources to tasks, enabling frameworks to implement their own scheduling logic independently of Mesos. Once resources are allocated, executors (also framework-defined) run on agent nodes to launch and monitor individual tasks, handling the actual execution and reporting status back through the agent to the scheduler. These components decouple resource allocation from task execution, allowing Mesos to support a wide variety of workloads efficiently.[5]

Mesos provides HTTP APIs for programmatic interaction with the cluster, including operator endpoints for managing masters and agents, as well as monitoring endpoints to query tasks, resources, and cluster state; these APIs form the basis for developing distributed applications and integrating with external tools. A web-based dashboard is accessible via the master's HTTP interface, offering a visual overview of cluster utilization, active tasks, and resource distribution to aid in monitoring and debugging.[30][2]

Mesos demonstrates cross-platform compatibility, running on Linux (64-bit), macOS (64-bit), and Windows (experimental support for agents only, requiring Windows 10 Creators Update or Windows Server 2016 and later). This support is facilitated by the pluggable isolators and containerizers, which adapt to platform-specific mechanisms for resource isolation, such as POSIX compliance on Unix-like systems and experimental features on Windows.[2][31][32]
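As a concrete illustration of the monitoring endpoints, this sketch polls the leading master's /state endpoint and summarizes agents and frameworks; the master address is a hypothetical placeholder.

```python
import requests

# Sketch of querying a Mesos master's monitoring endpoint for cluster state.
# The master address is a hypothetical placeholder; /state is served by the
# leading master's HTTP interface.
MASTER = "http://mesos-master.example.com:5050"

state = requests.get(f"{MASTER}/state").json()

# Summarize agents and their advertised resources (the JSON historically
# lists agents under the "slaves" key).
for agent in state.get("slaves", []):
    res = agent.get("resources", {})
    print(f"{agent['hostname']}: {res.get('cpus')} cpus, {res.get('mem')} MB mem")

# Count active frameworks registered with the master.
print("active frameworks:", len(state.get("frameworks", [])))
```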
Resource Management and Scheduling
Apache Mesos abstracts cluster resources such as CPU, memory, disk, and ports as commoditized units that can be offered to frameworks in a fine-grained manner.[1] These resources are represented using three types: scalars for floating-point values like 1.5 CPUs or 8192 MB of memory (with three decimal places of precision), ranges for continuous intervals such as port numbers (e.g., [21000-24000]), and sets for discrete items like custom resource identifiers.[33] Predefined scalar resources include cpus, mem (in MB), disk (in MB), and gpus (whole numbers only), while ports use ranges; frameworks receive these abstractions via JSON or key-value pairs to enable efficient allocation across diverse workloads.[33]
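For illustration, the sketch below shows both the key-value string form accepted by an agent's --resources flag and an equivalent structured representation of the same resources; all values are hypothetical.

```python
# Sketch of Mesos resource declarations (hypothetical example values).
# Key-value string form, as accepted by an agent's --resources flag:
resources_flag = "cpus:8;mem:16384;disk:409600;ports:[21000-24000]"

# Equivalent structured form: scalars for cpus/mem/disk, a range for ports.
resources = [
    {"name": "cpus", "type": "SCALAR", "scalar": {"value": 8.0}},
    {"name": "mem", "type": "SCALAR", "scalar": {"value": 16384.0}},   # MB
    {"name": "disk", "type": "SCALAR", "scalar": {"value": 409600.0}}, # MB
    {"name": "ports", "type": "RANGES",
     "ranges": {"range": [{"begin": 21000, "end": 24000}]}},
]
```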
Mesos employs a two-level scheduling model to facilitate multi-tenant cluster operation: the Mesos master allocates resources to registered frameworks, and each framework's scheduler decides which allocations to accept based on its specific needs.[1] In this model, the master periodically detects unused resources on agents and issues resource offers (bundles of available units, such as 4 CPUs and 4 GB of memory) to subscribed frameworks.[34] The offer cycle operates continuously: a framework subscribes via the SUBSCRIBE call, receives offers on the resulting event stream, and can accept one using an ACCEPT call with the offer ID, applying filters to reject insufficient or unsuitable resources (e.g., based on location or attributes) and specifying operations to launch tasks either as individual processes or grouped containers.[34] Accepted offers trigger task launches, where tasks execute via executors on agents, supporting data-locality optimizations like delay scheduling that achieve up to 95% locality with minimal wait times.[1]
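A rough sketch of one turn of this cycle against the v1 scheduler HTTP API follows; the master address, framework ID, offer ID, and agent ID are hypothetical placeholders, and the RecordIO event-stream handling (plus the Mesos-Stream-Id header) that a real scheduler obtains from SUBSCRIBE is elided.

```python
import requests

# Rough sketch of accepting a resource offer via the v1 scheduler HTTP API.
# Master address, framework ID, offer ID, and agent ID are hypothetical; a
# real scheduler obtains them from the SUBSCRIBE event stream and must also
# send the Mesos-Stream-Id header from the SUBSCRIBE response (elided here).
MASTER = "http://mesos-master.example.com:5050"
FRAMEWORK_ID = {"value": "hypothetical-framework-id"}

accept_call = {
    "framework_id": FRAMEWORK_ID,
    "type": "ACCEPT",
    "accept": {
        "offer_ids": [{"value": "hypothetical-offer-id"}],
        "operations": [{
            "type": "LAUNCH",
            "launch": {"task_infos": [{
                "name": "demo-task",
                "task_id": {"value": "demo-task-1"},
                "agent_id": {"value": "hypothetical-agent-id"},
                "command": {"value": "echo hello"},
                "resources": [
                    {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.5}},
                    {"name": "mem", "type": "SCALAR", "scalar": {"value": 64.0}},
                ],
            }]},
        }],
        # Decline further offers for 5 seconds if resources go unused.
        "filters": {"refuse_seconds": 5.0},
    },
}

resp = requests.post(f"{MASTER}/api/v1/scheduler", json=accept_call)
resp.raise_for_status()  # 202 Accepted indicates the call was enqueued
```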
To ensure secure multi-tenancy, Mesos implements resource isolation through Linux-specific mechanisms, including control groups (cgroups) for limiting CPU and memory usage, namespaces for process and network isolation, and seccomp filters to enforce security policies by restricting system calls.[29] These isolators are modular, allowing operators to enable or customize them, and integrate with container runtimes such as Docker for image-based launches or AppC for composable isolation layers.[29] This setup prevents interference between tasks from different frameworks while maintaining lightweight overhead.
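As a configuration illustration, the sketch below starts an agent with an explicit isolator list; the ZooKeeper URL and work directory are hypothetical, and the flag values follow the agent's documented --isolation syntax.

```python
import subprocess

# Sketch of starting a Mesos agent with explicit isolators enabled.
# The ZooKeeper master URL and work directory are hypothetical placeholders.
subprocess.run([
    "mesos-agent",
    "--master=zk://zk.example.com:2181/mesos",
    "--work_dir=/var/lib/mesos",
    # Comma-separated isolator list: cgroups limits for CPU and memory,
    # plus Linux filesystem isolation for container root filesystems.
    "--isolation=cgroups/cpu,cgroups/mem,filesystem/linux",
], check=True)
```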
Mesos demonstrates strong scalability, supporting clusters with over 10,000 nodes through its distributed architecture and low-latency operations, such as task launches under 1 second even at 50,000 emulated nodes.[2] Fault tolerance is achieved via periodic agent reregistration with the master (every 10 seconds by default) and automatic task relaunch upon recovery, complemented by ZooKeeper for replicated master election with 4-8 second failover times.[1] Agents handle disconnections gracefully by buffering updates and resynchronizing state, ensuring minimal disruption in large-scale environments.[1]
For guaranteed allocation in multi-tenant settings, Mesos supports resource reservations tied to roles, which represent groups of frameworks or users.[35] Static reservations, configured at agent startup via flags like --resources='cpus(role):8;mem(role):4096', dedicate resources to specific roles and require restarts to modify.[36] Dynamic reservations, introduced in version 0.23.0, allow runtime adjustments through framework operations (e.g., Offer::Operation::Reserve) or operator HTTP endpoints, enabling partial unreservations without interrupting active tasks.[36] Hierarchical roles, such as eng/backend, facilitate delegation and refinement of reservations, while role-based quotas enforce upper limits on total allocatable resources per role to prevent overcommitment.[36] Fair sharing among roles uses weighted Dominant Resource Fairness (wDRF), where weights (default 1) determine proportional allocation, configurable via the /weights endpoint.[35]
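To make the allocation rule concrete, here is a small sketch, with made-up cluster totals, role weights, and usage, of how weighted DRF computes each role's dominant share; the role with the lowest weighted dominant share is offered resources next.

```python
# Sketch of weighted Dominant Resource Fairness (wDRF) with made-up numbers.
# A role's dominant share is its largest fractional use of any resource,
# divided by the role's weight; the next offer goes to the lowest share.

CLUSTER = {"cpus": 100.0, "mem": 1000.0}  # total cluster resources (hypothetical)

ROLES = {
    # role: (weight, current allocation)
    "analytics": (2.0, {"cpus": 30.0, "mem": 100.0}),
    "services":  (1.0, {"cpus": 10.0, "mem": 300.0}),
}

def weighted_dominant_share(weight, allocation):
    """Largest per-resource usage fraction, scaled down by the role's weight."""
    dominant = max(allocation[r] / CLUSTER[r] for r in CLUSTER)
    return dominant / weight

shares = {role: weighted_dominant_share(w, alloc)
          for role, (w, alloc) in ROLES.items()}
# analytics: max(0.30, 0.10) / 2.0 = 0.15; services: max(0.10, 0.30) / 1.0 = 0.30
next_role = min(shares, key=shares.get)
print(shares, "-> offer goes to:", next_role)  # analytics
```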
