Databricks
Databricks, Inc. is a San Francisco-based software company.[4] It was founded in 2013 by the original creators of Apache Spark.[1][5] It offers a cloud-based platform for data analytics and artificial intelligence.[6]
Databricks promotes the concept of a 'data lakehouse', which combines elements of data warehouses and data lakes to enable management and analysis of both structured and unstructured data for business analytics and AI applications.[7] The company also develops Delta Lake, an open-source project to improve the reliability of data lakes for data science use cases.[8]
History
2013-2021
Databricks grew out of the AMPLab project at the University of California, Berkeley, which was involved in making Apache Spark, an open-source distributed computing framework built atop Scala.[9] The company was founded by Ali Ghodsi, Andy Konwinski, Arsalan Tavakoli-Shiraji, Ion Stoica, Matei Zaharia, Patrick Wendell, and Reynold Xin.[10]
In November 2017, the company was announced as a first-party service on Microsoft Azure through the Azure Databricks integration.[11]
In February 2021, together with Google Cloud, Databricks provided integration with the Google Kubernetes Engine and Google's BigQuery platform.[12] At that time, the company said more than 5,000 organizations used its products.[13]
Fortune ranked Databricks as one of the "Best Large Workplaces for Millennials" in 2021.[14]
2022-Present
In November 2023, Databricks unveiled the Databricks Data Intelligence Platform, a new offering that combines the unification benefits of the lakehouse with MosaicML's generative AI technology to enable customers to better understand and use their own proprietary data.[15]
The firm was valued at $62 billion in December 2024,[16] following a funding round that was among the largest in history and equal in size to the largest single AI investment made up to that point.[17]
In early March 2025, Databricks announced it would invest $1 billion in San Francisco's downtown.[18]
In March 2025, Databricks entered a five-year partnership with Anthropic, incorporating Anthropic's AI products into the Databricks Data Intelligence Platform in a deal valued at $100 million.[19][20] Ali Ghodsi remains CEO of Databricks.[19] The company has partnered with Tech Mahindra, Microsoft, and Optus to build a Unified Data Platform (UDP) for cloud migration.[21]
Acquisitions
In June 2020, Databricks bought Redash, an open-source tool for data visualization and building interactive dashboards.[22] In 2021, it bought the German no-code company 8080 Labs, whose product, bamboolib, allowed data exploration without any coding.[23] In May 2023, Databricks bought the data security group Okera, extending Databricks' data governance capabilities.[24] In June 2023, it bought the open-source generative AI startup MosaicML for $1.3 billion.[25][26] In October 2023, Databricks bought data replication startup Arcion for $100 million.[27] In 2024, Databricks bought Tabular, a data-management system used by open source AI, for over $1 billion.[28]
In March 2023, in response to the popularity of OpenAI's ChatGPT, the company introduced an open-source language model, named Dolly after Dolly the sheep, that allowed developers to create chatbots. Dolly uses fewer parameters than ChatGPT to produce similar results, but Databricks did not release formal benchmark tests showing whether it actually matched ChatGPT's performance.[29][30][31]
Databricks reported $1.6 billion in revenue for the 2023 fiscal year, representing a significant increase from the previous year.[32]
In 2025, Databricks acquired a serverless database startup, Neon,[33] for around $1 billion.[34]
Funding
In September 2013, Databricks announced it raised $13.9 million from Andreessen Horowitz and said it aimed to offer an alternative to Google's MapReduce system.[35][36] Microsoft was a noted investor in Databricks in 2019, participating in the company's Series E at an unspecified amount.[37][38] The company has raised $1.9 billion in funding, including a $1 billion Series G led by Franklin Templeton at a $28 billion post-money valuation in February 2021. Other investors include Amazon Web Services, CapitalG (a growth equity firm under Alphabet Inc.) and Salesforce Ventures.[13] In August 2021, Databricks finished its eighth round of funding by raising $1.6 billion and valuing the company at $38 billion.[39] In December 2024, Databricks announced a $10 billion financing at a valuation of $62 billion.[16] In August 2025, Databricks announced a $1 billion Series K funding round, raising its valuation to over $100 billion.[40]
| Series | Date | Amount (million $) | Lead investors |
|---|---|---|---|
| A | 2013 | 13.9[35] | Andreessen Horowitz |
| B | 2014 | 33[41] | New Enterprise Associates |
| C | 2016 | 60[42] | New Enterprise Associates |
| D | 2017 | 140[43] | Andreessen Horowitz |
| E | Feb. 2019 | 250[44] | Andreessen Horowitz |
| F | Oct. 2019 | 400[45] | Andreessen Horowitz |
| G | Jan. 2021 | 1,000[46] | Franklin Templeton Investments |
| H | Aug. 2021 | 1,600[47] | Morgan Stanley |
| I | Sep. 2023 | 500[48] | Capital One Ventures, Nvidia |
| J | Dec. 2024 | 10,000[49] | Thrive Capital |
| K | Aug. 2025 | 1,000[40] | Thrive Capital, Insight Partners |
Products
Databricks develops a cloud data platform referred to as a 'lakehouse', combining features of data warehouses and data lakes.[50] The platform is built on the open-source Apache Spark framework, enabling analytical queries on semi-structured data without requiring a traditional database schema.[51] In October 2022, Lakehouse received FedRAMP authorized status for use with the U.S. federal government and contractors.[52]
The company has also created Delta Lake, MLflow and Koalas, open source projects that span data engineering, data science and machine learning.[53][54]
In June 2020, Databricks launched Delta Engine, a fast query engine for Delta Lake,[55] compatible with Apache Spark and MLflow.[56]
In November 2020, Databricks introduced Databricks SQL (previously called SQL Analytics) for running business intelligence and analytics reporting on top of data lakes. Analysts can query data sets with standard SQL or use connectors to integrate with business intelligence tools like Holistics, Tableau, Qlik, Sigma Computing, Looker, and ThoughtSpot.[57]
Databricks offers a platform for other workloads, including machine learning, data storage and processing, streaming analytics, and business intelligence.[58]
In early 2024, Databricks released the Mosaic set of tools for customizing, fine-tuning and building AI systems. It includes AI Vector Search for building RAG models; AI Model Serving, a service for deploying, governing, querying and monitoring models fine-tuned or pre-deployed by Databricks; and AI Pretraining, a platform for enterprises to create their own LLMs.[59]
In March 2024, Databricks released its DBRX foundation model under the Databricks Open Model License.[60] It has a mixture-of-experts architecture and is built on the MegaBlocks open-source project.[61] DBRX cost $10 million to create. At the time of launch, it was the fastest open-source LLM,[citation needed] based on commonly used industry benchmarks. It beat other models such as Llama 2 at solving logic puzzles and answering general knowledge questions, among other tasks. Although it has 132 billion parameters, it uses only 36 billion, on average, to generate outputs.[62] DBRX also serves as a foundation for companies to build or customize their own AI models. Companies can also use proprietary data to generate higher-quality outputs for specific use cases.[63]
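The gap between total and active parameters in a mixture-of-experts model comes from routing: each input is sent to only a few expert sub-networks, and only those experts' parameters take part in the computation. The following is a minimal sketch of top-k routing in plain Python. It is not DBRX's code, and the function names, the scalar "experts", and the parameter counts are all hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


def active_fraction(shared_params, params_per_expert, num_experts, k):
    """Share of all parameters that are used for a single input."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total


# Hypothetical toy setup: four "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 1.0, -1.0]  # made-up router scores for one input

print(route_top_k(scores, k=2))            # experts 1 and 2 are selected
print(moe_forward(1.0, experts, scores, k=2))
print(active_fraction(shared_params=2, params_per_expert=1, num_experts=8, k=2))
```

With these made-up numbers, 4 of 10 parameter units are active per input. A real model routes every token at every mixture-of-experts layer, which is why active-parameter figures are usually quoted as averages.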
In addition to building the Databricks platform, the company has co-organized massive open online courses about Spark[64] and a conference for the Spark community called the Data + AI Summit,[65] formerly known as Spark Summit.[66]
Collaborations
In December 2024, Databricks, along with Wiz and Workday, decided to offer its products on AWS through the new "Buy with AWS" button.[67]
In June 2025, Databricks announced a partnership with Google Cloud to integrate its Data Intelligence Platform with Google Cloud services.[68]
References
- ^ a b Krystal Hu; Kenrick Cai; Echo Wang (December 13, 2024). "Exclusive: Databricks nears record $9.5 billion VC raise, eyes extra $4.5 billion debt". Reuters. Retrieved December 13, 2024.
- ^ Jin, Berber (December 17, 2024). "Exclusive: Databricks Is Finalizing a $10 Billion Funding Haul". The Wall Street Journal. Archived from the original on December 17, 2024.
- ^ Novet, Jordan (January 22, 2025). "Meta backs Databricks as the data analytics startup inches toward IPO". CNBC. Archived from the original on January 22, 2025.
- ^ CNBC.com staff (June 16, 2020). "36. Databricks". CNBC. Archived from the original on December 24, 2022. Retrieved April 8, 2021.
- ^ Saul, Derek (September 14, 2023). "Top IPO Prospect Databricks Scores $43 Billion Valuation Thanks To $500 Million Funding Round Including AI Titan Nvidia". Forbes. Archived from the original on September 4, 2024. Retrieved March 26, 2024.
- ^ Sullivan, Mark (March 19, 2024). "How Databricks is helping customers develop their own customized AI models". Fast Company. Retrieved March 19, 2024.
- ^ Clark, Lindsay (November 16, 2023). "Databricks' lakehouse becomes foundation under fresh layer of AI dreams". The Register. Archived from the original on September 4, 2024. Retrieved November 16, 2023.
- ^ "Databricks launches Delta Lake, an open source data lake reliability project". VentureBeat. April 24, 2019. Archived from the original on March 24, 2022. Retrieved April 6, 2021.
- ^ "Databricks, SiFive, and Anyscale founders explain how they all built their red-hot startups out of a legendary UC Berkeley lab". Business Insider. September 8, 2021. Retrieved May 18, 2025.
- ^ "Founders". Databricks. March 3, 2023. Retrieved May 18, 2025.
- ^ "Microsoft makes Databricks a first-party service on Azure". TechCrunch. November 15, 2017. Archived from the original on September 4, 2024. Retrieved April 6, 2021.
- ^ "Databricks brings its lakehouse to Google Cloud". TechCrunch. February 17, 2021. Archived from the original on September 4, 2024. Retrieved February 18, 2021.
- ^ a b Konrad, Alex (February 2, 2021). "Databricks Raises $1 Billion At $28 Billion Valuation, With The Cloud's Elite All Buying In". Forbes. Archived from the original on February 1, 2021. Retrieved July 29, 2021.
- ^ "100 Best Large Workplaces for Millennials". Fortune. June 16, 2021. Archived from the original on March 24, 2022. Retrieved July 16, 2021.
- ^ Cai, Kenrick (November 16, 2023). "Databricks' New AI Product Adds A ChatGPT-Like Interface To Its Software". Forbes. Archived from the original on September 4, 2024. Retrieved November 16, 2023.
- ^ a b Griffith, Erin (December 17, 2024). "Databricks Is Raising $10 Billion, in One of the Largest Venture Capital Deals". The New York Times. Archived from the original on December 18, 2024. Retrieved December 19, 2024.
- ^ "Why AI company Databricks just scored one of the biggest funding rounds in history". Fast Company. December 17, 2024.
- ^ Waxmann, Laura (March 5, 2025), "San Francisco tech company Databricks to invest $1 billion in city", San Francisco Chronicle, San Francisco Chronicle, retrieved March 30, 2025
- ^ a b "Databricks and Anthropic partner to help companies build AI agents", The Hindu, March 27, 2025, retrieved March 30, 2025
- ^ Lin, Belle (March 26, 2025), Anthropic, Databricks Team Up in Scramble for AI Revenue, The Wall Street Journal, retrieved March 30, 2025
- ^ Butler, Georgia (March 14, 2025). "Australia's Optus creates unified data platform for migration to the cloud". www.datacenterdynamics.com. Retrieved October 31, 2025.
- ^ "Databricks acquires Redash, a visualizations service for data scientists". TechCrunch. June 24, 2020. Retrieved April 6, 2021.
- ^ Eric Rosenbaum (October 6, 2021). "$38 billion software start-up Databricks makes acquisition to leave code behind". CNBC. Archived from the original on October 6, 2021. Retrieved February 20, 2022.
- ^ Palazzolo, Stephanie (May 3, 2023). "Exclusive: $38 billion data and AI darling Databricks acquires security startup Okera". Business Insider. Archived from the original on May 3, 2023.
- ^ Datta, Tiyashi; Hu, Krystal (June 26, 2023). "Databricks strikes $1.3 billion deal for generative AI startup MosaicML". Reuters. Archived from the original on June 26, 2023. Retrieved June 27, 2023.
- ^ Council, Stephen (June 26, 2023). "SF tech firm Databricks to buy 2-year-old startup for $21 million per employee". SFGATE. Archived from the original on June 26, 2023. Retrieved June 27, 2023.
- ^ "After $43B valuation, Databricks acquires data replication startup Arcion for $100M". TechCrunch. October 23, 2023. Retrieved October 23, 2023.
- ^ Galloni, Allessandra, ed. (June 5, 2024). "Databricks to buy data management firm Tabular for over $1 bln". Reuters.
- ^ Hu, Krystal; Nellis, Stephen (March 24, 2023). "Databricks pushes open-source chatbot as cheaper ChatGPT alternative". Reuters. Archived from the original on March 25, 2023.
- ^ Loften, Angus (March 24, 2023). "Databricks Launches 'Dolly,' Another ChatGPT Rival". The Wall Street Journal. Archived from the original on March 24, 2023.
- ^ Goldman, Sharon (March 24, 2023). "Databricks debuts ChatGPT-like Dolly, a clone any enterprise can own". VentureBeat. Archived from the original on April 11, 2023.
- ^ Miller, Ron; Wilhelm, Alex (March 7, 2024). "Databricks keeps marching forward with $1.6B in revenue". TechCrunch. Archived from the original on March 12, 2024. Retrieved March 8, 2024.
- ^ "Databricks Agrees to Acquire Neon to Deliver Serverless Postgres for Developers + AI Agents". Databricks. May 13, 2025. Retrieved May 16, 2025.
- ^ Novet, Jordan (May 14, 2025). "Databricks is buying database startup Neon for about $1 billion". CNBC. Retrieved May 16, 2025.
- ^ a b Harris, Derrick (September 25, 2013). "Databricks raises $14M from Andreessen Horowitz, wants to take on MapReduce with Spark". Archived from the original on January 15, 2022. Retrieved September 28, 2014.
- ^ Lorica, Ben (September 25, 2013). "Databricks aims to build next-generation analytic tools for Big Data". O'Reilly Media. Archived from the original on July 4, 2014. Retrieved September 28, 2014.
- ^ "Databricks raises $250M at a $2.75B valuation for its analytics platform". TechCrunch. February 5, 2019. Archived from the original on September 4, 2024. Retrieved April 8, 2021.
- ^ Novet, Jordan (February 5, 2019). "Microsoft used to scare start-ups but is now an 'outstandingly good partner,' says Silicon Valley investor Ben Horowitz". CNBC. Archived from the original on February 5, 2019. Retrieved April 6, 2021.
- ^ Mellor, Chris (September 1, 2021). "Databricks raises data lake of cash at monstrous $38bn valuation". Blocks & Files. Archived from the original on September 1, 2021. Retrieved September 4, 2021.
- ^ a b "Databricks Surpasses $4B Revenue Run-Rate, Exceeding $1B AI Revenue Run-Rate". Databricks. September 5, 2025. Retrieved September 10, 2025.
- ^ Miller, Ron (June 30, 2014). "Databricks Snags $33M In Series B And Debuts Cloud Platform For Processing Big Data". TechCrunch. Archived from the original on July 1, 2014. Retrieved September 28, 2014.
- ^ Shieber, Jonathan (December 15, 2016). "Databricks raises $60 million to be big data's next great leap forward". TechCrunch. Archived from the original on December 15, 2016. Retrieved December 16, 2016.
- ^ "Databricks Secures $140 Million to Accelerate Analytics and Artificial Intelligence in the Enterprise". Databricks. August 22, 2017. Archived from the original on January 13, 2022. Retrieved May 16, 2019.
- ^ "Databricks' $250 Million Funding Supports Explosive Growth and Global Demand for Unified Analytics; Brings Valuation to $2.75 Billion". Databricks. February 5, 2019. Archived from the original on January 15, 2022. Retrieved February 5, 2019.
- ^ "Databricks announces $400M round on $6.2B valuation as analytics platform continues to grow". TechCrunch. October 22, 2019. Archived from the original on September 4, 2024. Retrieved October 24, 2019.
- ^ "Databricks raises $1B at $28B valuation as it reaches $425M ARR". TechCrunch. February 2021. Archived from the original on November 3, 2021. Retrieved February 14, 2021.
- ^ "Databricks raises $1.6B at $38B valuation as it blasts past $600M ARR". TechCrunch. Archived from the original on December 30, 2021. Retrieved July 1, 2021.
- ^ Nishant, Niket; Hu, Krystal (September 14, 2023). "Databricks raises over $500 mln at $43 bln valuation". Reuters. Retrieved September 20, 2023.
- ^ Tan, Huileng (December 18, 2024). "Databricks is raising a gigantic funding round". Business Insider.
- ^ Armbrust, Michael; Ghodsi, Ali; Xin, Reynold; Zaharia, Matei (January 2021). "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics" (PDF). Conference on Innovative Data Systems Research. Archived (PDF) from the original on December 22, 2020. Retrieved July 29, 2021.
- ^ "With massive $1B infusion, Databricks takes aim at IPO and rival Snowflake". SiliconANGLE. February 1, 2021. Archived from the original on April 6, 2023. Retrieved April 8, 2021.
- ^ Simone, Stephanie (October 17, 2022). "Databricks achieves FedRAMP Authorized status". KMWorld. Information Today. Archived from the original on October 20, 2022. Retrieved October 20, 2022.
- ^ "The Two Sigma Ventures Open Source Index". Two Sigma Ventures. Archived from the original on November 29, 2022. Retrieved April 8, 2021.
- ^ "MLOps Tools - Ranking. OSS Insight". OSS Insight. Archived from the original on September 4, 2024. Retrieved April 3, 2024.
- ^ "Databricks Cranks Delta Lake Performance, Nabs Redash for SQL Viz". Datanami. June 24, 2020. Archived from the original on July 9, 2020. Retrieved April 8, 2021.
- ^ "Databricks launches Delta Lake, an open source data lake reliability project". VentureBeat. April 24, 2019. Archived from the original on March 24, 2022. Retrieved April 8, 2021.
- ^ "Databricks launches SQL Analytics". TechCrunch. November 12, 2020. Archived from the original on September 4, 2024. Retrieved April 8, 2021.
- ^ Brust, Andrew. "Databricks, champion of data "lakehouse" model, closes $1B series G funding round". ZDNet. Archived from the original on February 1, 2021. Retrieved April 8, 2021.
- ^ "Riding the data-powered AI wave: Inside Databricks' unified stack solution". Databricks. March 14, 2024. Archived from the original on September 4, 2024. Retrieved April 5, 2024.
- ^ "Databricks Open Model License". Databricks. March 27, 2024. Retrieved August 6, 2025.
- ^ "Databricks open-sources its own large language model, DBRX". Databricks. March 27, 2024. Archived from the original on April 5, 2024. Retrieved April 5, 2024.
- ^ "Inside the Creation of the World's Most Powerful Open Source AI Model". Databricks. March 27, 2024. Archived from the original on September 4, 2024. Retrieved April 5, 2024.
- ^ "Databricks' new open-source AI model could offer enterprises a leaner alternative to OpenAI's GPT-3.5". Databricks. March 27, 2024. Archived from the original on September 4, 2024. Retrieved April 5, 2024.
- ^ "Databricks to run two massive online courses on Apache Spark". Databricks. December 2, 2014. Archived from the original on January 13, 2022. Retrieved December 16, 2016.
- ^ "Data + AI Summit". Databricks. Archived from the original on April 23, 2022. Retrieved April 8, 2021.
- ^ "Highlights from DATA+AI Summit 2021". Towards Data Science. June 27, 2021.
- ^ Novet, Jordan (December 4, 2024). "Amazon rolls out Buy with AWS button to let software vendors more easily sell to its cloud customers". CNBC. Retrieved December 8, 2024.
- ^ "Databricks Announces Strategic AI Partnership with Google Cloud to Bring Data Intelligence Platform to Customers Worldwide". Databricks. June 11, 2025. Retrieved June 13, 2025.
Databricks
History
Founding and Early Development (2013-2021)
Databricks was founded in 2013 in San Francisco by the original creators of Apache Spark from the University of California, Berkeley's AMPLab, including Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji.[1][3] The company emerged from efforts to commercialize Spark, an open-source unified analytics engine for large-scale data processing, with an initial emphasis on building a cloud-based platform to simplify data engineering, analytics, and machine learning workflows.[15][16] This unified analytics platform, centered on Apache Spark, enabled collaborative environments for data teams to process and analyze massive datasets without managing underlying infrastructure, while contributing back to the open-source community through enhancements to Spark and related projects.[15]
In its early years, Databricks introduced key open-source tools to address challenges in data reliability and machine learning operations. Delta Lake, launched in October 2017 as a proprietary storage layer and open-sourced in April 2019, provided ACID transactions, scalable metadata handling, and unified batch and streaming data processing to make data lakes more reliable and performant for analytics workloads.[17][18] Similarly, MLflow was introduced in June 2018 as an open-source platform to manage the end-to-end machine learning lifecycle, including experiment tracking, package management, and model deployment, helping teams standardize workflows across diverse environments.[19]
Databricks expanded its cloud integrations to broaden accessibility, partnering with Microsoft in November 2017 to launch Azure Databricks, a fully managed service integrating Spark-based analytics directly into the Azure ecosystem for enterprise-scale data processing.[20] This was followed by a partnership with Google Cloud in February 2021, enabling customers to run Databricks workloads on Google Kubernetes Engine and integrate with services like BigQuery for seamless data lakehouse architectures.[21] By 2021, the platform served more than 5,000 organizations worldwide, reflecting rapid adoption among enterprises tackling complex data challenges.[22] That same year, Databricks was ranked #59 on Fortune's Best Large Workplaces for Millennials list, based on employee feedback highlighting its inclusive culture and innovative environment.[23]
Expansion and Innovation (2022-Present)
In 2022, Databricks accelerated its growth by deepening its focus on AI integration and enterprise-scale data solutions, building on its foundational Apache Spark technology to address emerging demands in generative AI and unified analytics. The company achieved significant valuation milestones, reaching $43 billion in September 2023 following a Series I funding round that raised over $500 million, led by T. Rowe Price with participation from Nvidia and Capital One. This valuation reflected Databricks' expanding role in the AI ecosystem, as enterprises increasingly adopted its platform for data-driven AI applications. By December 2024, a $10 billion Series J funding round (primarily non-dilutive financing for employee liquidity and strategic investments) elevated the company's valuation to $62 billion, underscoring investor confidence in its AI momentum amid a booming market for data intelligence tools.
A pivotal innovation came in November 2023 with the launch of the Data Intelligence Platform, which unified data management, AI capabilities, and governance into a single lakehouse-based architecture, enabling organizations to build and deploy AI agents securely over enterprise data. This platform incorporated advanced generative AI features, such as semantic understanding of data assets, to streamline workflows from data ingestion to model serving. In March 2024, Databricks released DBRX, an open-source large language model developed using its Mosaic AI tools, which set new benchmarks for efficiency in mixture-of-experts architectures while outperforming models like Llama 2 in key evaluations. These advancements were bolstered by strategic partnerships, including a March 2025 multi-year collaboration with Anthropic to integrate Claude models natively into the platform, allowing over 10,000 customers to develop AI agents with enhanced reasoning and safety features directly on their data.
Databricks' expansion extended to substantial investments in infrastructure and talent, exemplified by a $1 billion commitment in March 2025 to bolster San Francisco's economy through expanded headquarters at One Sansome Street and multi-year hosting of its Data + AI Summit, projected to draw up to 50,000 attendees by 2030. Revenue growth highlighted this trajectory, with $1.6 billion in revenue for fiscal year 2024 (ended January 31, 2024) and an annual run-rate of $3 billion reached by December 2024, driven by over 50% year-over-year expansion in AI and analytics adoption.[24][25] By September 2025, the company surpassed a $4 billion annual recurring revenue run-rate, with more than $1 billion attributed to AI products, while targeting net revenue retention above 140% and serving over 650 customers spending more than $1 million annually. In September 2025, a $1 billion Series K round further propelled its valuation beyond $100 billion, funding AI research, acquisitions, and global scaling to meet surging enterprise demand.[5]
Business Developments
Funding and Valuation
Databricks has secured substantial financing since its inception, amassing over $22 billion in total capital through equity rounds and debt facilities by late 2025.[26] This funding has supported the company's expansion in data analytics and AI technologies, with investments reflecting strong investor confidence in its lakehouse architecture and AI-driven growth.[13] The company's funding history includes several landmark equity rounds, detailed in the following table:
| Date | Round | Amount Raised | Post-Money Valuation | Key Investors |
|---|---|---|---|---|
| September 2013 | Series A | $14 million | Not disclosed | Andreessen Horowitz |
| October 2019 | Series F | $400 million | $6.2 billion | Andreessen Horowitz, Tiger Global |
| February 2021 | Series G | $1 billion | $28 billion | Franklin Templeton, Amazon Web Services |
| September 2023 | Series I | $500 million | $43 billion | T. Rowe Price, NVIDIA |
| December 2024 | Series J | $10 billion | $62 billion | Thrive Capital, Andreessen Horowitz, NVIDIA |
| September 2025 | Series K | $1 billion | Over $100 billion | Thrive Capital, GIC |
| December 2025 | Series L | Over $4 billion | $134 billion | Andreessen Horowitz, Thrive Capital, GIC, Insight Partners, Fidelity Management & Research Company |
Acquisitions
Databricks has pursued an aggressive acquisition strategy to enhance its data and AI platform, focusing on technologies that integrate seamlessly into its lakehouse architecture. Since 2020, the company has completed several key acquisitions, targeting areas such as data visualization, governance, real-time pipelines, generative AI, open table formats, and serverless databases. These moves have bolstered Databricks' capabilities in analytics, security, and scalable AI development, often with an emphasis on open-source integrations.
In June 2020, Databricks acquired Redash, an open-source business intelligence tool, for an undisclosed amount. Redash provides advanced visualization and dashboarding features, enabling data teams to query, visualize, and share insights from diverse data sources. The acquisition aimed to strengthen Databricks' analytics offerings by embedding these capabilities directly into its platform, reducing reliance on external tools and improving collaboration for data scientists and analysts.[35]
Databricks expanded its low-code/no-code capabilities in October 2021 with the acquisition of 8080 Labs, a German startup behind the bamboolib tool, for an undisclosed sum. Bamboolib offers a user-friendly interface for data exploration and transformation using Python's Pandas library, targeting non-technical users or "citizen data scientists." This move sought to democratize data science within the lakehouse platform, allowing broader organizational access to AI and ML workflows without deep coding expertise.[36]
In May 2023, Databricks acquired Okera, a data governance platform, for an undisclosed amount. Okera specializes in fine-grained access controls and policy enforcement, using AI to manage permissions across large-scale data environments. The integration enhanced Databricks' Unity Catalog with AI-centric governance features, ensuring secure data sharing while complying with regulatory requirements in enterprise settings.[37]
Databricks made a significant push into generative AI in June 2023 by acquiring MosaicML for $1.3 billion in a mostly stock deal. MosaicML develops tools for efficient training and deployment of large language models (LLMs), reducing costs from millions to thousands of dollars per model. This acquisition integrated Mosaic's Composer and Inference platforms into Databricks, enabling customers to build and fine-tune foundation models directly on their data lakehouse, accelerating enterprise AI adoption.[38]
To improve real-time data ingestion, Databricks acquired Arcion in October 2023 for $100 million. Arcion provides log-based change data capture (CDC) technology for high-throughput, low-latency data pipelines from databases to analytics platforms. The deal introduced native, scalable CDC tools to Databricks, simplifying data movement for AI applications and reducing operational costs compared to traditional ETL methods.[39]
In June 2024, Databricks acquired Tabular, a data management company founded by the original creators of Apache Iceberg, for more than $1 billion (reports estimate between $1 billion and $2 billion). Tabular offers managed services for open table formats like Iceberg, focusing on interoperability and performance in data lakes. This acquisition reinforced Databricks' commitment to open standards, enhancing Delta Lake compatibility and positioning the platform as a leader in unified data management for AI workloads.[40]
Databricks continued its expansion in February 2025 with the acquisition of BladeBridge, an AI-powered data warehouse migration provider, for an undisclosed amount. BladeBridge specializes in code assessment and automated conversion tools to facilitate migrations from legacy data warehouses to modern platforms like Databricks SQL. The acquisition aims to streamline enterprise migrations, reducing time and complexity for customers transitioning to the lakehouse architecture.[41]
In May 2025, Databricks acquired Neon, a serverless Postgres database provider, for approximately $1 billion. Neon's architecture separates compute and storage for elastic scaling, supporting developer-friendly Postgres in cloud environments. The move aimed to embed serverless relational capabilities into the lakehouse, facilitating AI agent development and real-time querying for production AI systems.[42]
Databricks further advanced its AI agent capabilities in August 2025 by acquiring Tecton, a real-time machine learning feature platform, for approximately $900 million. Tecton enables the management and serving of features for ML models at scale, providing low-latency data for personalized AI applications. This integration enhances Databricks' support for real-time AI agents by combining Tecton's feature store with the lakehouse for faster model deployment and inference.[43][44]
In October 2025, Databricks acquired Mooncake Labs, a startup developing cloud-native OLTP database technologies, for an undisclosed amount. Mooncake Labs focuses on Postgres-based solutions optimized for AI workloads, contributing to Databricks' Lakebase initiative for integrated transactional and analytical processing. The acquisition accelerates the development of agentic AI systems requiring high-performance, scalable databases within the lakehouse ecosystem.[45]
These acquisitions, with a total disclosed spend exceeding $5 billion by November 2025, have significantly accelerated Databricks' AI portfolio by filling critical gaps in governance, real-time processing, model training, and database scalability. By prioritizing open-source and AI-native technologies, Databricks has created a more unified ecosystem, enabling enterprises to operationalize data and AI at scale while maintaining flexibility across hybrid environments.
Products and Technology
Core Platform Components
The Databricks platform is built on a unified analytics foundation that leverages open-source technologies to enable scalable data processing and management. At its core is the integration of Apache Spark, which serves as the primary compute engine for distributed data processing across batch, streaming, and interactive workloads. Spark's DataFrame API and SQL engine allow users to perform complex transformations and queries on large datasets using familiar languages like Python, Scala, R, and SQL. This integration optimizes Spark for cloud environments, providing fault-tolerant execution and in-memory processing to handle petabyte-scale data efficiently.
A key storage innovation is Delta Lake, an open-source layer developed by Databricks that adds ACID transaction capabilities to data lakes built on Parquet files. Delta Lake introduces a transaction log that ensures data reliability, supports schema enforcement, and enables features like time travel for querying historical table versions and scalable metadata handling. These capabilities facilitate reliable extract, transform, and load (ETL) pipelines, merging batch and streaming data without duplication or loss, and have been widely adopted since Delta Lake was open-sourced in 2019.[46]
Databricks Runtime provides the optimized execution environment that bundles Apache Spark, Delta Lake, and additional enhancements for performance and security. It includes pre-configured libraries, automatic scaling, and security features such as table access control and credential passthrough, ensuring seamless operation across multi-cloud deployments. Released in versions with long-term support (LTS), such as 17.3 LTS incorporating Spark 4.0, the runtime simplifies cluster management while delivering up to 3x faster query performance through optimizations like adaptive query execution.[47]
Databricks SQL, launched in November 2020, is a serverless query service designed for business intelligence (BI) and ad-hoc analytics. It leverages the Photon engine for high-concurrency SQL queries on lakehouse data, supporting integrations with tools like Tableau and Power BI, and offers predictive optimization to reduce costs by up to 50% compared to traditional warehouses. This service enables analysts to explore Delta tables directly without managing infrastructure, focusing on insights from structured and semi-structured data.[48][49]
In June 2025, Databricks introduced Lakebase, a fully managed, serverless PostgreSQL-compatible OLTP database engine integrated into the Data Intelligence Platform. Lakebase supports real-time transactional workloads for data applications and AI agents, with features like database branching, instant scaling, and seamless connectivity to lakehouse storage, enabling developers to build AI-optimized applications without managing infrastructure.[50][51]
The platform embodies the lakehouse architecture, which serves as its core and integrates the scalability and cost-efficiency of data lakes with the reliability and performance of data warehouses. This hybrid paradigm, promoted by Databricks since 2019, is based on open-source technologies such as Apache Spark and Delta Lake, extending to data engineering, business intelligence (BI) analysis, machine learning, and generative AI. By using open formats like Delta Lake on object storage, the lakehouse eliminates data silos, supports unified batch and streaming analytics, and provides ACID guarantees without proprietary lock-in. This approach enables a single system for ETL, BI, and machine learning, reducing total cost of ownership while scaling to exabytes.[52][3]
Unified governance is achieved through Unity Catalog, a centralized metastore that manages metadata, access controls, and lineage across multi-cloud environments. It supports fine-grained permissions on data assets, models, and volumes in formats like Delta Lake and Apache Iceberg, with features for auditing, data sharing via Delta Sharing, and AI-driven discovery. Unity Catalog ensures compliance and collaboration by providing a three-level namespace (catalog, schema, and table) under a shared metastore that spans workspaces, preventing governance fragmentation in distributed setups.[53]
AI and Analytics Tools
Databricks provides a range of specialized AI and analytics tools designed to streamline machine learning workflows, generative AI development, and advanced data analysis within its unified platform. These tools emphasize end-to-end management of AI models, from experimentation to deployment, while supporting scalable operations on large datasets. By integrating with the underlying Apache Spark and Delta Lake foundation, they enable efficient handling of big data for AI applications.
MLflow is an open-source platform developed by Databricks for managing the complete machine learning lifecycle, encompassing experiment tracking, model packaging, reproduction, and serving. It allows users to log parameters, metrics, and artifacts during training, facilitating collaboration and reproducibility across teams. On Databricks, MLflow is fully managed, supporting both traditional ML and generative AI workflows, including evaluation of large language models (LLMs) and agents.[54]
Koalas, introduced by Databricks, offers a Python API that enables scalable pandas operations on Apache Spark, allowing data scientists to apply familiar DataFrame manipulations to big data without significant code changes. Originally released as an open-source project, Koalas has evolved into the Pandas API on Spark, integrated into PySpark since Apache Spark 3.2, bridging the gap between single-node pandas workflows and distributed computing. This tool supports operations like grouping, joining, and statistical computations on massive datasets, enhancing productivity for analytics and feature engineering tasks.[55]
Mosaic AI is a comprehensive suite for building and deploying production-quality generative AI systems. It includes tools for fine-tuning LLMs with custom data, performing efficient vector search for retrieval-augmented generation, and governing AI models through a centralized platform. It empowers organizations to create compound AI agents with built-in observability, security, and scalability, addressing enterprise needs for customized generative applications.[56]
Mosaic AI Vector Search is a fully managed vector database service in Databricks that enables fast, scalable similarity search on vector embeddings stored in Delta tables. Users create a vector search index from a Delta table containing embeddings and metadata, then query it for applications like retrieval-augmented generation (RAG), semantic search, and recommendation systems. It supports serverless endpoints, SQL querying via the vector_search() function, and optimization for performance and retrieval quality.[57]
In June 2025, Databricks launched Agent Bricks, a toolkit for developing and optimizing domain-specific AI agents grounded in enterprise data. Agent Bricks automates agent evaluation, tuning, and deployment for use cases like information extraction, knowledge assistance, and multi-agent systems, with updates in November 2025 enhancing cross-industry accelerators and governance integration. It simplifies building high-quality, scalable agents using natural language descriptions, integrated with Mosaic AI and Unity Catalog.[58][59][60]
A key output of Mosaic AI is the DBRX model, an open-weight foundation model released on March 27, 2024, under the Databricks Open Model License. DBRX employs a mixture-of-experts architecture with 132 billion parameters, activating only 36 billion during inference for high efficiency, and excels in reasoning, coding, and long-context tasks, outperforming models like Llama 2 70B on benchmarks such as HumanEval and MMLU. Trained on a diverse dataset excluding certain proprietary sources, it supports fine-tuning for enterprise use cases while promoting open-source innovation in efficient AI.[61][62][63]
Databricks Assistant, launched in July 2023, serves as an AI copilot integrated into notebooks, SQL editors, dashboards, and workflows, enabling natural language interactions for querying data, generating code, and troubleshooting. It provides context-aware suggestions, such as writing Python or SQL snippets, explaining query results, or automating routine data tasks, thereby accelerating productivity for users at all skill levels. Powered by foundation models like those from Mosaic AI, it ensures responses are grounded in workspace-specific data and metadata.[64][65][66]
For advanced analytics, Databricks offers AutoML, which automates the process of building machine learning models by selecting algorithms, tuning hyperparameters, and generating deployable pipelines for classification, regression, and forecasting tasks. Complementing this, the Databricks Feature Store acts as a centralized repository for storing, discovering, and reusing ML features across projects, integrating with Unity Catalog for governance and supporting both batch and real-time serving. These tools reduce manual effort in model development while maintaining scalability for production environments.[67]
Data analytics in Databricks SQL is performed by accessing the SQL Query Editor or creating a SQL warehouse, then executing queries on Delta tables using standard SQL syntax. For example, users can run a query such as SELECT region, SUM(amount) AS total_sales FROM sales_gold GROUP BY region ORDER BY total_sales DESC; to aggregate and analyze sales data. Visualizations can be added to query results, and the output can be shared as an interactive dashboard for business intelligence purposes.[68][69]
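The aggregation performed by the example query above can be illustrated with a minimal Python sketch. It is not Databricks code: Python's built-in sqlite3 module stands in for a SQL warehouse, the in-memory sales_gold table stands in for a Delta table, and the sample rows and the helper name total_sales_by_region are hypothetical, introduced only for illustration.

```python
import sqlite3

# Hypothetical sample rows (region, amount). A real sales_gold table
# would be a Delta table queried through a Databricks SQL warehouse.
SAMPLE_ROWS = [
    ("EMEA", 120.0),
    ("AMER", 300.0),
    ("EMEA", 80.0),
    ("APAC", 150.0),
    ("AMER", 50.0),
]

# The query text is the one quoted in the article.
QUERY = """
SELECT region, SUM(amount) AS total_sales
FROM sales_gold
GROUP BY region
ORDER BY total_sales DESC;
"""


def total_sales_by_region(rows):
    """Load rows into an in-memory table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales_gold (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales_gold VALUES (?, ?)", rows)
        # Returns a list of (region, total_sales) tuples, largest total first.
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total in total_sales_by_region(SAMPLE_ROWS):
        print(f"{region}: {total:.2f}")
```

With these sample rows, each region appears once in the result with its summed amount, ordered from the highest total to the lowest. On Databricks the same statement would instead be submitted through the SQL Query Editor or a SQL warehouse connection.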

