ETH Global Paris 2023: Developing CLI Tooling to Run Compute over Data on Geospatial Data Operations
Guest post by Dhruv Malik and Charlie Durand, Extra Labs
Report on a hackathon working with Bacalhau to build a decentralized Google Maps.
It was a thrilling experience to participate in the ETH Global Paris hackathon, where we hacked on our project for two straight days. We also got to meet representatives of great projects, including Protocol Labs.
Our team consisted of two members:
Dhruv Malik (developer and web3 researcher at Extra Labs)
Charlie Durand (founder of Extra Labs, tokenomics expert, and product manager).
We wanted to build a simple developer tool that allows anyone to generate maps from our algorithms by providing inputs such as map coordinates, with the entire pipeline running on the Bacalhau framework.
At this point, we had been working together for a few months, building the foundations of Extra Labs, a company focused on developer tools and frameworks for hosting large-scale, near-real-time 3D maps of the world on decentralized physical infrastructure networks (DePIN).
Problem statement
Charlie has been working on map-reliant apps for many years and has run into challenges when trying to deploy 3D maps. She observed that existing 3D map solutions fall short in several dimensions, namely level of detail, recency, and accessibility. The 2D map ecosystem, meanwhile, suffers from a near-monopoly held by Google Maps. Google's massive investments since the early 2000s have brought a great deal of products to life, but they have also led to huge price increases: in 2018 alone, prices for map-related Google APIs increased by an average of 1,400%.
As 3D maps become the next frontier for solving pressing challenges such as sustainable town planning, protecting heritage monuments, and mapping during natural disasters, we need a public-good platform that helps stakeholders make sense of diverse data sources (satellite images, sensors such as LiDAR, and ordinary photos) and apply data transformations, making high-quality maps available to all at a reasonable cost.
To accomplish this, we require an efficient, scalable, and inexpensive compute framework. Prior to this hackathon, we built a pipeline that connects different open-source algorithms for 3D surface reconstruction (which can be found here) and benchmarked some decentralized computing services. We discovered Compute-over-Data and Bacalhau, which met our needs very well. During ETHGlobal Paris, we explored a demonstration project that would enable us to deploy our pipeline on Bacalhau.
Project Gadus: CLI for running geospatial compute jobs
We named our project Gadus, after the Latin genus name of the cod, the fish that bacalhau refers to. The inspiration for our project was PLEX, developed by LabDAO for running BioML tools on privately hosted Bacalhau instances.
Similarly, our CLI tool lets users orchestrate 3D surface reconstruction jobs on Bacalhau and store the results on decentralized storage, simplifying the intermediate steps in the process: defining the job spec, asking the requester node to run the given containerized geospatial transformation, reliably tracking job status, and finally storing the result locally or on IPFS with a verifier.
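As an illustration, here is a minimal sketch (not our production code) of how such a wrapper can submit a containerized reconstruction job through the Bacalhau CLI and download the result. The image name and input CID are placeholders, and the flags shown are assumptions that may differ between Bacalhau versions.

```python
import subprocess


def run_reconstruction(image: str, input_cid: str) -> str:
    """Submit a containerized reconstruction job to Bacalhau and return its job ID.

    Assumes a locally installed `bacalhau` CLI already pointed at a requester node.
    The image name, CID, and flags are illustrative only.
    """
    result = subprocess.run(
        [
            "bacalhau", "docker", "run",
            "--id-only",                  # print only the job ID on success
            "-i", f"ipfs://{input_cid}",  # mount the input point cloud from IPFS
            image,                        # e.g. a 3D surface reconstruction container
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def fetch_result(job_id: str) -> None:
    """Download the job's outputs into a local results directory."""
    subprocess.run(["bacalhau", "get", job_id], check=True)


if __name__ == "__main__":
    job_id = run_reconstruction("extralabs/surface-reconstruction:latest", "bafy...")
    fetch_result(job_id)
```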
We chose Lilypad as the framework on top of Bacalhau to create a marketplace of compute resource providers that host clients' compute jobs; the resource providers are then compensated on-chain.
Alongside that, we also tried developing:
A wallet client that allows users to create a data wallet (using DFNS)
Role-based access for clients (based on on-chain authentication using Worldcoin).
A way to define compute jobs and manage payments via an on-chain contract adapter.
Here is the workflow we envision for Gadus (first release planned in the coming weeks); a hedged sketch of the job parameters follows the list:
The user creates their wallet, which sets up an external account for them and drops in some lil-ETH tokens (the payment token for the Lilypad testnet deployment).
For the data engineering part, we store the input point cloud data model, together with the pipeline template information, on the storage provider.
Next, the user defines the Docker image in which the job will be executed. To prevent Sybil and other attacks, access to the compute infrastructure will be restricted to specific compute options.
These parameters are passed via the CLI or a hosted serverless application, which relays them to the requester node. The user can then approve the result, which is stored in their personal space on decentralized storage via web3.storage.
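To make the workflow concrete, here is a hedged sketch of the kind of job-parameter structure the CLI (or the serverless relay) could pass along to the requester node. Every name below (GadusJobRequest, the bbox field, the reconstruct command, the example image and wallet) is a hypothetical illustration, not the final Gadus interface.

```python
from dataclasses import dataclass


@dataclass
class GadusJobRequest:
    """Hypothetical job request that the CLI or serverless app relays onward.

    All field names are illustrative only.
    """
    docker_image: str         # container holding the reconstruction algorithm
    input_cid: str            # CID of the stored point cloud / pipeline template
    bbox: tuple               # (min_lon, min_lat, max_lon, max_lat) of the target area
    cpu: str = "4"            # requested compute resources (restricted options)
    memory: str = "16GB"
    wallet_address: str = ""  # account holding lil-ETH for on-chain payment
    output_storage: str = "web3.storage"  # where the approved result is persisted


def to_submission_args(req: GadusJobRequest) -> list:
    """Flatten the request into argv-style arguments for a Docker-based job."""
    return [
        req.docker_image,
        "--",
        "reconstruct",
        "--bbox", ",".join(str(c) for c in req.bbox),
    ]


example = GadusJobRequest(
    docker_image="extralabs/surface-reconstruction:latest",
    input_cid="bafy...",
    bbox=(2.29, 48.85, 2.31, 48.87),  # roughly central Paris
    wallet_address="0xYourWallet",
)
print(to_submission_args(example))
```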
We were able to demonstrate a minimal workflow showing the capabilities of the tool. Many thanks to the Bacalhau and Protocol Labs team members who were present during the hackathon and helped us troubleshoot issues. The detailed examples in the documentation were also a great help for newcomers getting started with end-to-end compute pipelines over various categories of data.
Challenges and the way forward
We appreciate the significant progress that Bacalhau has made since its inception, such as vertical scaling with support for various compute runtimes and inputs from different storage platforms. After the proposed v1 launch of Gadus, we will work on supporting real-time processing. However, Bacalhau has yet to realize its significant potential for horizontal scaling: reducing latency, increasing throughput by running compute jobs in parallel across multiple compute nodes, and scaling resources to the needs of client jobs.
In the end, we believe Bacalhau and the Compute-over-Data ecosystem represent a promising layer for supporting large-scale, near-real-time 3D maps. We'll be happy to keep working with the people involved to solve the existing challenges and benefit the DePIN ecosystem.
In the coming weeks, we will launch the Gadus MVP, along with a technical blog series on system design for Compute-over-Data platforms, so follow us on Medium, LinkedIn, and GitHub. We also just published a long article detailing the potential benefits of using a Compute-over-Data framework for geospatial data analysis.
Finally, I would like to express my appreciation for the help provided by the Bacalhau/Lilypad dev-rel team (Ally and Luke), as well as the members of the Bacalhau Slack community. Kudos to this active community for helping users overcome production challenges with Bacalhau.
If you have any questions or feedback, we'd love to hear it; you can reach out on our Discord here.
Cheers and keep BUIDLing!