Databricks
from Wikipedia

Databricks, Inc. is a San Francisco-based software company.[4] It was founded in 2013 by the original creators of Apache Spark.[1][5] It offers a cloud-based platform for data analytics and artificial intelligence.[6]


Databricks promotes the concept of a 'data lakehouse', which combines elements of data warehouses and data lakes to enable management and analysis of both structured and unstructured data for business analytics and AI applications.[7] The company also develops Delta Lake, an open-source project to improve the reliability of data lakes for data science use cases.[8]

History


2013-2021

Databricks booth (2023)

Databricks grew out of the AMPLab project at University of California, Berkeley that was involved in making Apache Spark, an open-source distributed computing framework built atop Scala.[9] The company was founded by Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, Patrick Wendell, and Reynold Xin.[10]

In November 2017, the company's platform was announced as a first-party service on Microsoft Azure through the Azure Databricks integration.[11]

In February 2021, together with Google Cloud, Databricks provided integration with the Google Kubernetes Engine and Google's BigQuery platform.[12] At that time, the company said more than 5,000 organizations used its products.[13]

Fortune ranked Databricks as one of the "Best Large Workplaces for Millennials" in 2021.[14]

2022-Present


In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML’s Generative AI technology to enable customers to better understand and use their own proprietary data.[15]

The firm was valued at $62 billion in December 2024,[16] following a funding round that was one of the largest in history and equal in size to the largest single AI investment ever made.[17]

In early March 2025, Databricks announced it would invest $1 billion in San Francisco's downtown.[18]

In March 2025, Databricks entered a five-year partnership with Anthropic, incorporating Anthropic's AI products into the Databricks Data Intelligence Platform in a deal valued at $100 million.[19][20] Ali Ghodsi remains CEO of Databricks.[19] The company has partnered with Tech Mahindra, Microsoft, and Optus to build a Unified Data Platform (UDP) for cloud migration.[21]

Acquisitions


In June 2020, Databricks bought Redash, an open-source tool for data visualization and building interactive dashboards.[22] In 2021, it bought the German no-code company 8080 Labs, whose product, bamboolib, allowed data exploration without any coding.[23] In May 2023, Databricks bought data security group Okera, extending Databricks' data governance capabilities.[24] In June 2023, it bought the open-source generative AI startup MosaicML for $1.4 billion.[25][26] In October 2023, Databricks bought data replication startup Arcion for $100 million.[27] In 2024, Databricks bought Tabular, a data-management system used by open source AI, for over $1 billion.[28]

In March 2023, in response to the popularity of OpenAI's ChatGPT, the company introduced an open-source language model, named Dolly after Dolly the sheep, that allowed developers to create chatbots. Dolly uses fewer parameters than ChatGPT to produce similar results, but Databricks had not released formal benchmark tests to show whether its bot actually matched the performance of ChatGPT.[29][30][31]

Databricks reported $1.6 billion in revenue for the 2023 fiscal year, representing a significant increase from the previous year.[32]

In 2025, Databricks acquired a serverless database startup, Neon,[33] for around $1 billion.[34]

Funding


In September 2013, Databricks announced it had raised $13.9 million from Andreessen Horowitz and said it aimed to offer an alternative to Google's MapReduce system.[35][36] Microsoft was a noted investor in Databricks in 2019, participating in the company's Series E at an unspecified amount.[37][38] By February 2021, the company had raised $1.9 billion in funding, including a $1 billion Series G led by Franklin Templeton at a $28 billion post-money valuation. Other investors include Amazon Web Services, CapitalG (a growth equity firm under Alphabet Inc.) and Salesforce Ventures.[13] In August 2021, Databricks finished its eighth round of funding by raising $1.6 billion and valuing the company at $38 billion.[39] In December 2024, Databricks announced a $10 billion financing at a valuation of $62 billion.[16] In August 2025, Databricks announced a $1 billion Series K funding round, raising its valuation to over $100 billion.[40]

Funding rounds

| Series | Date | Amount (million $) | Lead investors |
|---|---|---|---|
| A | 2013 | 13.9[35] | Andreessen Horowitz |
| B | 2014 | 33[41] | New Enterprise Associates |
| C | 2016 | 60[42] | New Enterprise Associates |
| D | 2017 | 140[43] | Andreessen Horowitz |
| E | Feb. 2019 | 250[44] | Andreessen Horowitz |
| F | Oct. 2019 | 400[45] | Andreessen Horowitz |
| G | Jan. 2021 | 1,000[46] | Franklin Templeton Investments |
| H | Aug. 2021 | 1,600[47] | Morgan Stanley |
| I | Sep. 2023 | 500[48] | Capital One Ventures, Nvidia |
| J | Dec. 2024 | 10,000[49] | Thrive Capital |
| K | Aug. 2025 | 1,000[40] | Thrive Capital, Insight Partners |

Products


Databricks develops a cloud data platform referred to as a 'lakehouse', combining features of data warehouses and data lakes.[50] The platform is built on the open-source Apache Spark framework, enabling analytical queries on semi-structured data without requiring a traditional database schema.[51] In October 2022, Lakehouse received FedRAMP authorized status for use with the U.S. federal government and contractors.[52]

The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning.[53][54]

In June 2020, Databricks launched Delta Engine, a fast query engine for Delta Lake,[55] compatible with Apache Spark and MLflow.[56]

In November 2020, Databricks introduced Databricks SQL (previously called SQL Analytics) for running business intelligence and analytics reporting on top of data lakes. Analysts can query data sets with standard SQL or use connectors to integrate with business intelligence tools like Holistics, Tableau, Qlik, SigmaComputing, Looker, and ThoughtSpot.[57]

Databricks offers a platform for other workloads, including machine learning, data storage and processing, streaming analytics, and business intelligence.[58]

In early 2024, Databricks released the Mosaic set of tools for customizing, fine-tuning and building AI systems. It includes AI Vector Search for building RAG models; AI Model Serving, a service for deploying, governing, querying and monitoring models fine-tuned or pre-deployed by Databricks; and AI Pretraining, a platform for enterprises to create their own LLMs.[59]
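
The vector search used in RAG systems rests on similarity search over embeddings: documents and queries are mapped to vectors, and the closest document vectors are retrieved as context. The following is a minimal sketch of that idea in plain Python, not the Databricks service or its API. The document IDs, the three-dimensional toy embeddings, and the helper names `cosine_similarity` and `top_k` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
index = {
    "doc_spark": [0.9, 0.1, 0.0],
    "doc_sql": [0.1, 0.9, 0.0],
    "doc_ml": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.2, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A production system would replace the linear scan with an approximate nearest-neighbor index, since comparing the query against every stored vector does not scale to large collections.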

In March 2024, Databricks released its DBRX foundation model under the Databricks Open Model License.[60] It has a mixture-of-experts architecture and is built on the MegaBlocks open-source project.[61] DBRX cost $10 million to create. At the time of launch, it was the fastest open-source LLM,[citation needed] based on commonly used industry benchmarks. It beat other models like Llama 2 at solving logic puzzles and answering general knowledge questions, among other tasks. Although it has 132 billion parameters, it uses only 36 billion, on average, to generate outputs.[62] DBRX also serves as a foundation for companies to build or customize their own AI models. Companies can also use proprietary data to generate higher-quality outputs for specific use cases.[63]
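
A mixture-of-experts model uses only part of its parameters per output because a gating function routes each input to a few experts and skips the rest. The sketch below illustrates generic top-k routing in plain Python. It is not DBRX's implementation: the eight toy experts, the gate scores, and the choice of k = 2 are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}


def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the selected experts' outputs; the others never run."""
    weights = route_top_k(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical experts: expert i simply multiplies its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical gate scores; in a real model a learned layer produces these.
gate_scores = [0.1, 2.0, 0.3, 1.5, -1.0, 0.0, 0.2, -0.5]

weights = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
print(sorted(weights), output)
```

Here only two of the eight experts are evaluated, so the compute per input scales with the active experts while the total parameter count scales with all of them.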

In addition to building the Databricks platform, the company has co-organized massive open online courses about Spark[64] and a conference for the Spark community called the Data + AI Summit,[65] formerly known as Spark Summit.[66]

Collaborations


In December 2024, Databricks, along with Wiz and Workday, announced that it would offer its products on AWS through the new "Buy with AWS" button.[67]

In June 2025, Databricks announced a partnership with Google Cloud to integrate its Data Intelligence Platform with Google Cloud services.[68]


from Grokipedia
Databricks, Inc. is an American software company headquartered in San Francisco, California, founded in 2013 by seven UC Berkeley researchers (Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji) who are the original creators of the open-source Apache Spark project. The company provides the Databricks Data Intelligence Platform, a unified, cloud-based analytics solution that integrates data and AI capabilities on an open lakehouse architecture, combining the reliability of data warehouses with the flexibility of data lakes. This platform leverages foundational open-source technologies developed by its founders, including Delta Lake for reliable data lakes, MLflow for machine learning lifecycle management, and Unity Catalog for governance. The platform extends to business intelligence analysis and generative AI, enabling comprehensive data and AI workflows. Since its inception, Databricks has grown rapidly, launching its cloud platform and expanding to serve over 20,000 organizations worldwide, including more than 60% of Fortune 500 companies, such as Block and Shell. Many of these organizations have migrated legacy Hadoop-based systems to Databricks for enhanced scalability, cost savings, and advanced AI capabilities.
Notable examples include AT&T, which migrated on-premises Hadoop workloads to Azure Databricks, ingesting over 10 PB of data daily, achieving a 300% ROI over five years, accelerating data science cycles by 3x, and retiring 40% of prior infrastructure; Freshworks, which migrated over 500 TB of data and 40+ sources from self-managed Cloudera Hadoop in seven months, reducing maintenance costs by 75%, accelerating data processing by 4-5x, and increasing data team productivity by over 60%; CVS Health, which transitioned from on-premises Hadoop to Azure Databricks to scale personalization efforts, overcoming initial limitations and improving medication adherence by 1.6%; Johnson & Johnson, which migrated from legacy Hadoop to Databricks on Azure, reducing data engineering costs by 45-50% and decreasing data delivery times from 24 hours to under 10 minutes for supply chain optimization; and Devon Energy, which adopted Azure Databricks to unify analytics for oil exploration, replacing legacy Hadoop and ETL systems to significantly accelerate processing times. The company's mission is to democratize and AI, enabling organizations to simplify complex workflows and accelerate AI-driven insights through features like discovery and automated AI model deployment. As of December 2025, Databricks achieved a $4.8 billion annual revenue run-rate, with AI-specific revenue exceeding $1 billion, reflecting over 55% year-over-year growth and a net retention rate above 140%. In a milestone, Databricks raised over $4 billion in its Series L round in December 2025 at a $134 billion valuation, representing approximately 212% growth from its $43 billion valuation in September 2023, driven by Lakehouse/AI platform growth and over 50% revenue gains to fuel AI innovations such as Agent Bricks for agentic AI applications and Lakebase for AI-optimized databases, while supporting global expansion and acquisitions. 
With over 5,000 and a focus on open standards, Databricks continues to lead in the data and AI ecosystem, powering enterprise-grade solutions across industries like , healthcare, and .

History

Founding and Early Development (2013-2021)

Databricks was founded in 2013 by the original creators of Apache Spark from the University of California, Berkeley's AMPLab, including Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji. The company emerged from efforts to commercialize Spark, an open-source unified analytics engine for large-scale data processing, with an initial emphasis on building a cloud-based platform to simplify data workflows. This unified analytics platform enabled collaborative environments for data teams to process and analyze massive datasets without managing underlying infrastructure, while contributing back to the open-source community through enhancements to Spark and related projects.

In its early years, Databricks introduced key open-source tools to address challenges in data reliability and operations. Delta Lake, launched in October 2017 as a proprietary storage layer and open-sourced in April 2019, provided ACID transactions, scalable metadata handling, and unified batch and streaming processing to make data lakes more reliable and performant. Similarly, MLflow was introduced in June 2018 as an open-source platform to manage the end-to-end machine learning lifecycle, including experiment tracking, package management, and model deployment, helping teams standardize workflows across diverse environments.

Databricks expanded its cloud integrations to broaden accessibility, partnering with Microsoft in November 2017 to launch Azure Databricks, a fully managed service integrating Spark-based analytics directly into the Azure ecosystem for enterprise-scale data processing. This was followed by a partnership with Google Cloud in February 2021, enabling customers to run Databricks workloads on Google Kubernetes Engine and integrate with services like BigQuery for data lakehouse architectures. By 2021, the platform served more than 5,000 organizations worldwide, reflecting rapid adoption among enterprises tackling complex data challenges. That same year, Databricks was ranked #59 on Fortune's Best Large Workplaces for Millennials list, based on employee feedback highlighting its inclusive culture and innovative environment.

Expansion and Innovation (2022-Present)

In 2022, Databricks accelerated its growth by deepening its focus on AI integration and enterprise-scale solutions, building on its foundational technology to address emerging demands in generative AI and unified analytics. The company achieved significant valuation milestones, reaching $43 billion in September 2023 following a Series I funding round that raised over $500 million, led by with participation from and . This valuation reflected Databricks' expanding role in the AI ecosystem, as enterprises increasingly adopted its platform for data-driven AI applications. By December 2024, a $10 billion Series J funding round—primarily non-dilutive financing for employee liquidity and strategic investments—elevated the company's valuation to $62 billion, underscoring investor confidence in its AI momentum amid a booming market for tools. A pivotal innovation came in November 2023 with the launch of the Data Intelligence Platform, which unified , AI capabilities, and into a single lakehouse-based , enabling organizations to build and deploy AI agents securely over enterprise . This platform incorporated advanced generative AI features, such as semantic understanding of assets, to streamline workflows from to model serving. In March 2024, Databricks released DBRX, an open-source developed using its Mosaic AI tools, which set new benchmarks for efficiency in mixture-of-experts while outperforming models like Llama 2 in key evaluations. These advancements were bolstered by strategic partnerships, including a March 2025 multi-year collaboration with to integrate Claude models natively into the platform, allowing over 10,000 customers to develop AI agents with enhanced reasoning and safety features directly on their . 
Databricks' expansion extended to substantial investments in infrastructure and talent, exemplified by a $1 billion commitment in March 2025 to bolster San Francisco's economy through expanded headquarters at One Sansome Street and multi-year hosting of its Data + AI Summit, projected to draw up to 50,000 attendees by 2030. Revenue growth highlighted this trajectory, with $1.6 billion in revenue for 2024 (ended January 31, 2024) and reaching an annual run-rate of $3 billion by December 2024, driven by over 50% year-over-year expansion in AI and adoption. By September 2025, the company surpassed a $4 billion annual recurring revenue run-rate, with more than $1 billion attributed to AI products, while targeting net revenue retention above 140% and serving over 650 customers spending more than $1 million annually. In September 2025, a $1 billion Series K round further propelled its valuation beyond $100 billion, funding AI , acquisitions, and global scaling to meet surging enterprise demand.

Business Developments

Funding and Valuation

Databricks has secured substantial financing since its , amassing over $22 billion in total capital through equity rounds and debt facilities by late 2025. This funding has supported the company's expansion in and AI technologies, with investments reflecting strong investor confidence in its lakehouse architecture and AI-driven growth. The company's funding history includes several landmark equity rounds, detailed in the following table:
| Date | Round | Amount Raised | Post-Money Valuation | Key Investors |
|---|---|---|---|---|
| September 2013 | Series A | $14 million | Not disclosed | |
| October 2019 | Series F | $400 million | $6.2 billion | Tiger Global |
| February 2021 | Series G | $1 billion | $28 billion | Franklin Templeton |
| September 2023 | Series I | $500 million | $43 billion | |
| December 2024 | Series J | $10 billion | $62 billion | |
| September 2025 | Series K | $1 billion | Over $100 billion | GIC |
| December 2025 | Series L | Over $4 billion | $134 billion | Andreessen Horowitz, Thrive Capital, GIC, Insight Partners, Fidelity Management & Research Company |
These rounds represent pivotal milestones, with early funding enabling platform development and later investments accelerating AI integrations. Prominent investors across these rounds include , which led multiple early and late-stage investments; , a key participant in recent mega-rounds; , contributing strategic AI expertise starting in 2023; , which joined in the 2019 Series E round; and , anchoring the 2023 Series I. These backers have provided not only capital but also ecosystem synergies, such as cloud integrations and hardware optimizations. Databricks' valuations have escalated dramatically, from under $1 billion in early rounds to $134 billion in December 2025, representing approximately 212% growth from the $43 billion valuation in September 2023. This growth has been driven primarily by surging adoption of its data lakehouse paradigm and AI capabilities amid the global , along with over 50% year-over-year revenue gains. This growth trajectory aligns with the company's achievement of a $4.8 billion annualized run-rate by December 2025, underscoring the commercial impact of its AI revenue exceeding $1 billion. Databricks' primary source of revenue is from its consumption-based SaaS subscriptions, where customers pay based on usage of compute, storage, and data processing on the platform. In addition to equity financing, Databricks obtained a $5.25 billion credit facility in January 2025, comprising a $2.75 billion and a $2.5 billion line, to scale operations and pursue AI talent acquisition. This non-dilutive debt, led by with participation from , Citi, , and , marked one of the largest such arrangements for a tech firm and complemented its equity raises for flexible capital deployment.
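
The growth and credit-facility figures quoted above can be checked with a few lines of arithmetic. This is a small sketch using only the numbers stated in the text ($43 billion, $134 billion, $2.75 billion, $2.5 billion); the function name `percent_growth` is a hypothetical helper.

```python
def percent_growth(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


# Valuations in billions of dollars, as quoted in the text.
growth = percent_growth(43, 134)
multiple = 134 / 43

# The two credit-facility components, in billions of dollars.
facility_total = 2.75 + 2.5

print(f"growth: {growth:.1f}%")        # about 211.6%, i.e. roughly 212%
print(f"multiple: {multiple:.2f}x")    # about 3.12x
print(f"facility: ${facility_total:.2f} billion")
```

Note that a 212% increase means the valuation roughly tripled: the new value is about 3.12 times the old one, not 2.12 times.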

Acquisitions

Databricks has pursued an aggressive acquisition strategy to enhance its data and AI platform, focusing on technologies that integrate into its lakehouse architecture. Since 2020, the company has completed several key acquisitions, targeting areas such as data visualization, data governance, real-time pipelines, generative AI, open table formats, and serverless databases. These moves have bolstered Databricks' capabilities in scalable AI development, often with an emphasis on open-source integrations.

In June 2020, Databricks acquired Redash, an open-source data visualization tool, for an undisclosed amount. Redash provides advanced visualization and dashboarding features, enabling data teams to query, visualize, and share insights from diverse data sources. The acquisition aimed to strengthen Databricks' analytics offerings by embedding these capabilities directly into its platform, reducing reliance on external tools and improving collaboration for data scientists and analysts.

Databricks expanded its low-code/no-code capabilities in October 2021 with the acquisition of 8080 Labs, a German startup behind the bamboolib tool, for an undisclosed sum. Bamboolib offers a user-friendly interface for data exploration and transformation using Python's pandas library, targeting non-technical users or "citizen data scientists." This move sought to democratize data exploration within the lakehouse platform, allowing broader organizational access to AI and ML workflows without deep coding expertise.

In May 2023, Databricks acquired Okera, a data governance platform, for an undisclosed amount. Okera specializes in fine-grained access controls and policy enforcement, using AI to manage permissions across large-scale data environments. The integration enhanced Databricks' Unity Catalog with AI-centric governance features, ensuring secure data access while complying with regulatory requirements in enterprise settings.

Databricks made a significant push into generative AI in June 2023 by acquiring MosaicML for $1.3 billion. MosaicML develops tools for efficient training and deployment of large language models (LLMs), reducing costs from millions to thousands of dollars per model. This acquisition integrated MosaicML's platforms into Databricks, enabling customers to build and fine-tune foundation models directly on their data lakehouse, accelerating enterprise AI adoption.

To improve real-time data ingestion, Databricks acquired Arcion in October 2023 for $100 million. Arcion provides log-based change data capture (CDC) technology for high-throughput, low-latency data pipelines from databases to data platforms. The deal introduced native, scalable CDC tools to Databricks, simplifying data movement for AI applications and reducing operational costs compared to traditional ETL methods.

In June 2024, Databricks acquired Tabular, a company founded by the original creators of Apache Iceberg, for more than $1 billion (reports estimate between $1 billion and $2 billion). Tabular offers managed services for open table formats like Apache Iceberg, focusing on interoperability and performance in data lakes. This acquisition reinforced Databricks' commitment to open standards, enhancing Delta Lake compatibility and positioning the platform as a leader in unified data management for AI workloads.

Databricks continued its expansion in February 2025 with the acquisition of BladeBridge, an AI-powered migration provider, for an undisclosed amount. BladeBridge specializes in code assessment and automated conversion tools to facilitate migrations from legacy data warehouses to modern platforms like Databricks SQL. The acquisition aims to streamline enterprise migrations, reducing time and complexity for customers transitioning to the lakehouse architecture.

In May 2025, Databricks acquired Neon, a serverless Postgres database provider, for approximately $1 billion. Neon's architecture separates compute and storage for elastic scaling, supporting developer-friendly Postgres in cloud environments. The move aimed to embed serverless relational capabilities into the lakehouse, facilitating AI agent development and real-time querying for production AI systems.

Databricks further advanced its AI agent capabilities in August 2025 by acquiring Tecton, a real-time machine learning feature platform, for approximately $900 million. Tecton enables the management and serving of features for ML models at scale, providing low-latency data for personalized AI applications. This integration enhances Databricks' support for real-time AI agents by combining Tecton's feature store with the lakehouse for faster model deployment and inference.

In October 2025, Databricks acquired Mooncake Labs, a startup developing cloud-native OLTP database technologies, for an undisclosed amount. Mooncake Labs focuses on Postgres-based solutions optimized for AI workloads, contributing to Databricks' Lakebase initiative for integrated transactional and analytical processing. The acquisition accelerates the development of agentic AI systems requiring high-performance, scalable databases within the lakehouse ecosystem.

These acquisitions, with a total disclosed spend exceeding $5 billion by November 2025, have significantly accelerated Databricks' AI portfolio by filling critical gaps in areas spanning real-time, model, and database capabilities. By prioritizing open-source and AI-native technologies, Databricks has created a more unified ecosystem, enabling enterprises to operationalize data and AI at scale while maintaining flexibility across hybrid environments.

Products and Technology

Core Platform Components

The Databricks platform is built on a unified foundation that leverages open-source technologies to enable scalable and . At its core is the integration of , which serves as the primary compute engine for distributed across batch, streaming, and interactive workloads. Spark's DataFrame and SQL engine allow users to perform complex transformations and queries on large datasets using familiar languages like Python, Scala, , and SQL. This integration optimizes Spark for cloud environments, providing fault-tolerant execution and in-memory processing to handle petabyte-scale data efficiently. A key storage innovation is Delta Lake, an open-source layer developed by Databricks that adds transaction capabilities to data lakes built on files. Delta Lake introduces a that ensures data reliability, supports enforcement, and enables features like for querying historical table versions and scalable metadata handling. These capabilities facilitate reliable extract, transform, and load (ETL) pipelines, merging batch and without duplication or loss, and have been adopted widely since its open-sourcing in 2019. Databricks Runtime provides the optimized execution environment that bundles , Delta Lake, and additional enhancements for performance and security. It includes pre-configured libraries, automatic scaling, and security features such as and credential passthrough, ensuring seamless operation across multi-cloud deployments. Released in versions with long-term support (LTS), such as 17.3 LTS incorporating Spark 4.0, the runtime simplifies cluster management while delivering up to 3x faster query performance through optimizations like adaptive query execution. Databricks SQL, launched in November 2020, is a serverless query service designed for (BI) and ad-hoc analytics. 
It leverages the engine for high-concurrency SQL queries on lakehouse data, supporting integrations with tools like Tableau and Power BI, and offers predictive optimization to reduce costs by up to 50% compared to traditional warehouses. This service enables analysts to explore Delta tables directly without managing infrastructure, focusing on insights from structured and semi-structured data. In June 2025, Databricks introduced Lakebase, a fully managed, serverless PostgreSQL-compatible OLTP integrated into the Data Intelligence Platform. Lakebase supports real-time transactional workloads for data applications and AI agents, with features like database branching, instant scaling, and seamless connectivity to lakehouse storage, enabling developers to build AI-optimized applications without managing infrastructure. The platform embodies the lakehouse architecture, which serves as its core and integrates the scalability and cost-efficiency of data lakes with the reliability and performance of data warehouses. This hybrid paradigm, promoted by Databricks since 2019, is based on open-source technologies such as Apache Spark and Delta Lake, extending to data engineering, business intelligence (BI) analysis, machine learning, and generative AI. By using open formats like Delta Lake on , the lakehouse eliminates data silos, supports unified batch and streaming analytics, and provides guarantees without proprietary lock-in. This approach enables a single system for ETL, BI, and , reducing while scaling to exabytes. Unified governance is achieved through Unity Catalog, a centralized metastore that manages metadata, access controls, and lineage across multi-cloud environments. It supports fine-grained permissions on assets, models, and volumes in formats like Delta Lake and , with features for auditing, data sharing via Delta Sharing, and AI-driven discovery. 
Unity Catalog ensures compliance and collaboration by providing a three-level namespace (catalog, schema, and table or other object) under a metastore that spans workspaces, preventing governance fragmentation in distributed setups.
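
In practice, Unity Catalog objects are commonly addressed by a dotted name of the form `catalog.schema.object`. The sketch below shows one way to parse and validate such a name in plain Python. It is illustrative only and not part of any Databricks library: the class `QualifiedName` and the example name `main.sales.sales_gold` are hypothetical, and the parser deliberately ignores backtick-quoted identifiers that contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedName:
    """A three-part object name: catalog, schema, and object (e.g. a table)."""

    catalog: str
    schema: str
    name: str

    @classmethod
    def parse(cls, text):
        # Simplification: split on every dot, so quoted identifiers
        # containing dots are not supported in this sketch.
        parts = text.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.name, got {text!r}")
        return cls(*parts)

    def __str__(self):
        return f"{self.catalog}.{self.schema}.{self.name}"


table = QualifiedName.parse("main.sales.sales_gold")
print(table.catalog, table.schema, table.name)
```

Requiring all three parts makes references unambiguous across workspaces, at the cost of rejecting the shorter names that rely on a default catalog or schema.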

AI and Analytics Tools

Databricks provides a range of specialized AI and analytics tools designed to streamline machine learning workflows, generative AI development, and advanced data analysis within its unified platform. These tools emphasize end-to-end management of AI models, from experimentation to deployment, while supporting scalable operations on large datasets. By integrating with the underlying Apache Spark and Delta Lake foundation, they enable efficient handling of big data for AI applications. MLflow is an open-source platform developed by Databricks for managing the complete lifecycle, encompassing experiment tracking, model packaging, reproduction, and serving. It allows users to log parameters, metrics, and artifacts during training, facilitating collaboration and across teams. On Databricks, MLflow is fully managed, supporting both traditional ML and generative AI workflows, including evaluation of large language models (LLMs) and agents. Koalas, introduced by Databricks, offers a Python API that enables scalable pandas operations on , allowing data scientists to apply familiar DataFrame manipulations to without significant code changes. Originally released as an open-source project, Koalas has evolved into the Pandas API on Spark, integrated into PySpark since Apache Spark 3.2, bridging the gap between single-node pandas workflows and . This tool supports operations like grouping, joining, and statistical computations on massive datasets, enhancing productivity for and tasks. Mosaic AI is a comprehensive suite for building and deploying production-quality generative AI systems. It includes tools for fine-tuning LLMs with custom , performing efficient vector search for retrieval-augmented , and governing AI models through a centralized platform. It empowers organizations to create compound AI agents with built-in , , and scalability, addressing enterprise needs for customized generative applications. 
Mosaic AI Vector Search is a fully managed vector database service in Databricks that enables fast, scalable similarity search on vector embeddings stored in Delta tables. Users create a vector search index from a Delta table containing embeddings and metadata, then query it for applications like retrieval-augmented generation (RAG), semantic search, and recommendation systems. It supports serverless endpoints, SQL querying via the vector_search() function, and optimization for performance and retrieval quality. In June 2025, Databricks launched Agent Bricks, a toolkit for developing and optimizing domain-specific AI agents grounded in enterprise . Agent Bricks automates agent evaluation, tuning, and deployment for use cases like , assistance, and multi-agent systems, with updates in November 2025 enhancing cross-industry accelerators and integration. It simplifies building high-quality, scalable agents using natural language descriptions, integrated with Mosaic AI and Unity Catalog. A key output of Mosaic AI is the DBRX model, an open-weight released on March 27, 2024, under the Databricks Open Model License. DBRX employs a mixture-of-experts with 132 billion parameters, activating only 36 billion during inference for high efficiency, and excels in reasoning, coding, and long-context tasks, outperforming models like Llama 2 70B on benchmarks such as HumanEval and MMLU. Trained on a diverse dataset excluding certain proprietary sources, it supports fine-tuning for enterprise use cases while promoting open-source in efficient AI. Databricks Assistant, launched in July 2023, serves as an AI copilot integrated into notebooks, SQL editors, dashboards, and workflows, enabling interactions for querying data, generating code, and troubleshooting. It provides context-aware suggestions, such as writing Python or SQL snippets, explaining query results, or automating routine data tasks, thereby accelerating productivity for users at all skill levels. 
Powered by foundation models such as those from Mosaic AI, it grounds its responses in workspace-specific data and metadata.

For advanced analytics, Databricks offers AutoML, which automates the process of building machine learning models by selecting algorithms, tuning hyperparameters, and generating deployable pipelines for classification, regression, and forecasting tasks. Complementing this, the Databricks Feature Store acts as a centralized repository for storing, discovering, and reusing ML features across projects, integrating with Unity Catalog for governance and supporting both batch and real-time serving. These tools reduce manual effort in model development while remaining suitable for production environments.

Data analytics in Databricks SQL is performed by opening the SQL Query Editor, attaching it to a SQL warehouse (creating one if necessary), and executing queries on Delta tables using standard SQL syntax. For example, users can run a query such as SELECT region, SUM(amount) AS total_sales FROM sales_gold GROUP BY region ORDER BY total_sales DESC; to aggregate sales by region and list the regions from highest to lowest total. Visualizations can be added to query results, and the output can be shared as an interactive dashboard for business intelligence purposes.
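The aggregation query above can be tried locally before running it on a SQL warehouse. The sketch below uses Python's built-in sqlite3 module as a stand-in for Databricks SQL, so it involves no Delta tables or Databricks APIs; the table contents are made-up sample rows, and only the query text is taken from the example above.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_gold (region TEXT, amount REAL)")

# Hypothetical sample rows; real data would live in a Delta table.
conn.executemany(
    "INSERT INTO sales_gold (region, amount) VALUES (?, ?)",
    [
        ("EMEA", 120.0),
        ("AMER", 300.0),
        ("EMEA", 80.0),
        ("APAC", 150.0),
        ("AMER", 50.0),
    ],
)

query = """
    SELECT region, SUM(amount) AS total_sales
    FROM sales_gold
    GROUP BY region
    ORDER BY total_sales DESC
"""

# Each result row is a (region, total_sales) tuple, largest total first.
rows = conn.execute(query).fetchall()
for region, total_sales in rows:
    print(region, total_sales)

conn.close()
```

With these sample rows the regions come back ordered AMER, EMEA, APAC. The same statement should run unchanged in the Databricks SQL editor, since it uses only standard GROUP BY and ORDER BY clauses.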

Training and Certification

Databricks Academy, accessible at academy.databricks.com, offers free self-paced courses for Data Engineer certifications. The "Data Engineering with Databricks" course serves as a core resource for the Associate certification, covering topics including ETL, Lakeflow, Unity Catalog, Auto Loader, and Jobs through videos, readings, and hands-on labs. The "Advanced Data Engineering with Databricks" course supports preparation for the Professional certification.

Partnerships and Integrations

Databricks has maintained native support for Amazon Web Services (AWS) since its inception in 2013, leveraging the platform's scalability for its unified analytics offerings and serving thousands of joint customers. The company deepened its collaboration with Microsoft starting in 2017, when it announced Azure Databricks as a first-party service, integrating Apache Spark-based analytics into Azure's ecosystem. Databricks expanded to Google Cloud Platform (GCP) in 2021, launching a jointly developed service that incorporates GCP-native tools for data science and related workloads.

In December 2024, Databricks and AWS highlighted their ongoing partnership at AWS re:Invent, emphasizing advancements in lakehouse architecture to advance AI innovation for enterprise users. In June 2025, Databricks formed a strategic AI partnership with Google Cloud to natively integrate Gemini models into its Data Intelligence Platform, facilitating secure AI applications over enterprise data via Vertex AI capabilities.

Databricks has also forged AI-focused alliances, including a multi-year deal with Anthropic in March 2025 to bring Claude models to its platform, allowing over 10,000 customers to build and deploy AI agents on private data. The company has strengthened ties with NVIDIA, adding native support for GPU acceleration in June 2024 and further integrating AI technologies in December 2024 to optimize data processing, model training, and generative AI development on the Databricks platform.

The Databricks ecosystem features numerous integrations with third-party tools to support end-to-end workflows, including business intelligence platforms such as Tableau for visualization, data warehousing solutions via Delta Sharing for interoperability, and transformation tools such as dbt for analytics engineering. This ecosystem, comprising over 6,000 partners, enables data connectivity across diverse stacks.
Databricks' multi-cloud strategy across AWS, Azure, and GCP has driven widespread adoption, supporting a customer base of over 20,000 organizations as of September 2025 and allowing enterprises to avoid vendor lock-in while leveraging specialized cloud strengths for data and AI initiatives.

Industry Recognition

In May 2025, Gartner published its Magic Quadrant for Data Science and Machine Learning Platforms (transitioning to AI Platforms for Data Science and Machine Learning), positioning Databricks as a Leader with the highest ranking in Ability to Execute and the furthest in Completeness of Vision. Other Leaders included Microsoft, IBM, AWS, DataRobot, Dataiku, and Altair. As of early 2026, no 2026 edition had been published.

Operations

Leadership and Governance

Ali Ghodsi, a co-founder of Databricks, has served as its CEO since 2016, guiding the company's strategic direction with a strong emphasis on advancing AI initiatives and data intelligence platforms. Under his leadership, Databricks has prioritized the integration of AI into enterprise data workflows, drawing on his background in distributed systems from UC Berkeley.

Key executives include Ion Stoica, a co-founder and current Executive Chairman, who maintains close ties to UC Berkeley as a professor of electrical engineering and computer sciences, influencing the company's research-driven approach to AI and open-source projects. Reynold Xin, another co-founder and Chief Architect, oversees technical architecture and contributes significantly to Apache Spark development. Matei Zaharia serves as CTO and co-founder, focusing on technological innovation, particularly in AI and analytics tools.

The board of directors includes Ben Horowitz, co-founder and general partner at Andreessen Horowitz, who provides expertise in scaling technology ventures, alongside independent members such as Elena Donio and Jonathan Chadwick, whose backgrounds include technology governance. This composition blends investor insight with specialized technology knowledge.

As a private company, Databricks maintains a governance structure centered on ethical AI practices through its AI Governance Framework, which addresses risks across the AI lifecycle, including data privacy and security. The company commits to open-source contributions, notably via ongoing enhancements to projects such as Apache Spark, while complying with data privacy standards such as SOC 2 Type II and GDPR to protect customer data. Following significant funding rounds after 2023, the company expanded its C-suite, adding specialized roles to bolster global operations and AI product development.

Global Presence and Workforce

Databricks is headquartered in San Francisco, California, where it announced a new headquarters at One Sansome Street in March 2025, along with a commitment to invest more than $1 billion in the city's operations over the next three years to support local job creation and economic growth. The company continues to operate from its original Spear Street location during renovations.

The company maintains a global footprint, with major office hubs in cities including Amsterdam (Netherlands), London (United Kingdom), Paris (France), Bangalore (India), Singapore, Sydney (Australia), Tokyo (Japan), and São Paulo (Brazil), among others. As of 2025, Databricks operates in 23 countries spanning five continents, enabling it to serve a diverse international customer base.

As of 2025, Databricks employs approximately 8,000 people worldwide, reflecting more than 50% growth in its workforce since 2023 amid hiring in engineering, sales, and customer-facing roles to meet demand for its platform. This expansion includes plans to add 3,000 new positions in 2025, with a focus on diverse talent.

Databricks describes its company culture as centered on innovation, inclusion, and employee empowerment. It was named one of Glassdoor's Best Places to Work in 2025 and appeared on a Fortune Best Workplaces list for the second consecutive year. Employees highlight transparency, collaborative environments, and opportunities for professional growth, with 90% reporting positive experiences. Post-pandemic, the company has adopted a flexible hybrid work model, allowing most roles to blend remote and in-office arrangements.
Operationally, Databricks leverages data centers hosted by its primary cloud partners—Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform—to deliver scalable infrastructure without owning physical facilities, ensuring high availability across regions. The company also advances sustainability through initiatives that promote energy-efficient cloud usage and support customers in tracking carbon footprints via its platform, aligning with broader industry efforts toward net-zero emissions.
