Expanso Lands $7.5M Seed Investment Led by General Catalyst and Hetz Ventures to Revolutionize Distributed Data Processing
We are happy to announce $7.5 million in seed funding led by General Catalyst and Hetz Ventures, along with Array Ventures, to help the open-source project Bacalhau become the universal compute platform.
While the project got started in 2022, we founded our company in February of 2023 to address a straightforward but overlooked challenge:
“Actually making use of enterprise data.”
Distributed big data processing can be complex and challenging. One of the biggest challenges is the time and cost of transferring data from distributed nodes to a centralized data lake, which makes it difficult to respond to new data inflows in real time. Further, many platforms, while powerful, require converting existing code to new frameworks just to access the data, let alone get insights. And distributed big data processing systems are often a rich target for security problems, such as leaks of personally identifiable information (PII), regulatory concerns, and data breaches.
The open-source software Bacalhau, developed and backed by Expanso, is built on the principle of "Compute Over Data," which means that it brings the processing jobs to where the data is, rather than moving the data to the cloud first. This has a number of advantages, including:
Reduced costs: Moving large amounts of data to and from the cloud is expensive.
Enhanced speed: Bacalhau processes data locally, removing cloud transfer latency and boosting performance for data-heavy applications.
Increased security: Not moving the data reduces the risk of data breaches and other security incidents.
Further, with Bacalhau, users can streamline their existing workflows without extensive rewriting by running arbitrary Docker containers and WebAssembly (WASM) images as tasks. The software can run on-premises or in any cloud, including Amazon Web Services (AWS), Microsoft Azure, Google Cloud, Oracle Cloud, and many more.
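As a rough illustration of the "Compute Over Data" principle described above, the following Python sketch compares shipping raw records to a central location with shipping only a small per-node summary. It is not Bacalhau code and does not use any Bacalhau API; the node names, record fields, and helper functions are all hypothetical and exist only for this example.

```python
import json

def make_records(n, error_every):
    """Build n synthetic log records (illustrative data, not real measurements)."""
    return [
        {"status": 500 if i % error_every == 0 else 200, "bytes": 100 + i}
        for i in range(n)
    ]

# Hypothetical raw data held on three separate nodes.
NODE_DATA = {
    "node-a": make_records(100, 10),
    "node-b": make_records(200, 20),
    "node-c": make_records(50, 5),
}

def summarize_locally(records):
    """Runs where the data lives and returns only a small aggregate."""
    return {
        "requests": len(records),
        "errors": sum(1 for r in records if r["status"] >= 500),
        "bytes": sum(r["bytes"] for r in records),
    }

def combine(summaries):
    """Runs centrally and merges the per-node aggregates."""
    total = {"requests": 0, "errors": 0, "bytes": 0}
    for summary in summaries:
        for key in total:
            total[key] += summary[key]
    return total

def payload_size(obj):
    """Approximate transfer size as the length of the JSON encoding."""
    return len(json.dumps(obj).encode("utf-8"))

if __name__ == "__main__":
    summaries = [summarize_locally(records) for records in NODE_DATA.values()]
    raw_size = sum(payload_size(records) for records in NODE_DATA.values())
    summary_size = sum(payload_size(s) for s in summaries)
    print("combined result:", combine(summaries))
    print("bytes moved if raw data is centralized:", raw_size)
    print("bytes moved if only summaries are sent:", summary_size)
```

In this toy setup the central step only ever sees three small dictionaries, so the amount of data crossing the network no longer grows with the number of raw records on each node.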
“Infrastructure built to meet data where it is, even if distributed around the world, is long overdue. What Expanso is building with Bacalhau is intended to revolutionize the way big data is processed and global compute jobs are executed, while unlocking an entirely new class of applications.”
"Expanso brings compute to the data, enabling businesses to operate securely at their operational pace and maximize the utility of valuable data. In less than a year, Dave and his team of exceptional technologists and entrepreneurs have achieved significant milestones, with the platform now in use with various sectors, including some of the world's largest defense organizations. We are proud to support Expanso as they work to enhance the impact of distributed data for businesses worldwide," said Quentin Clark, Managing Director of General Catalyst.
Developers can use the tools they already know and enjoy using, like Python, R, and DuckDB, with almost no changes. Nearly anything that can be containerized can run on the network. "A missing part of the modern data stack is the ability to process data where it is being created rather than have to centralize everything first," said Jordan Tigani, CEO and co-founder of MotherDuck. "Bacalhau fills in that missing link, allowing large numbers of remote workers to use DuckDB to filter, summarize, and transform data at the edge before communicating results to MotherDuck in the cloud."
Bacalhau offers a free demo network that has been live for nearly six months. Since launching, the network has handled more than 1.5 million jobs for design partners like the University of Maryland, BOINC, New Atlantis Foundation, and many more.
Bacalhau is available today as open-source software. To download, visit the website here. The public GitHub repo can be found here.