Your Fast Track to Bacalhau: Local Development via Docker-in-Docker
Docker-in-Docker With Bacalhau For Fast Tests And Low Barrier Entry
Bacalhau is an open-source platform designed for “Compute-Over-Data”, allowing you to run jobs where your data resides. This avoids the costly process of moving massive amounts of data to other centralized locations. While Bacalhau is easy to use, getting started and experimenting with a full deployment can involve:
Setting up cloud resources.
Configuring networks.
Managing multiple components.
This process may feel complicated if you just want to try new things on the fly.
This article explains how that setup works and how engineers at Expanso have made it more straightforward for situations where speed is key.
Let’s dive into it!
The Bacalhau Basics
Bacalhau is a framework that manages distributed computing networks that can operate across diverse infrastructures. It prioritizes keeping data in place, bringing computational power to data. It does so with the following nodes:
Orchestrator nodes: An orchestrator node is a component responsible for coordinating the execution of jobs across the network. Its primary role is to handle job scheduling, deciding which compute nodes are best suited to run a particular task, based on specified constraints.
Compute nodes: A compute node is a worker machine that executes the actual computational tasks assigned by an orchestrator. These nodes run the jobs, process the data, and generate results. Like orchestrator nodes, compute nodes are also instances of the single Bacalhau binary, but configured to operate in compute mode.
Hybrid nodes: These nodes serve both roles at once. They are often used for local development or small setups.
So, Bacalhau consists of an orchestrator that schedules jobs and multiple compute nodes that execute those jobs. These jobs run inside containers—via Docker or WASM—on the compute nodes. Setting this up traditionally can involve:
Provisioning Virtual Machines (VMs) or instances for the orchestrator and compute nodes.
Setting up S3-compatible storage for data input/output.
Installing Docker on all compute nodes.
Configuring networking and credentials for all components to communicate.
Let’s be honest: these are barriers you want to eliminate when you are experimenting rapidly. At Expanso, we know it, and our engineers have got you covered!
The Local Solution: Docker Compose To The Rescue
The engineers started from one question: “How do we eliminate these barriers?”
Bacalhau runs containerized workloads, but a single container only gives you a single node. The engineers wanted to mimic the whole traditional setup so that it could be replicated on a single machine without:
Provisioning VMs.
Getting credentials and setting up S3 storage.
Installing Docker on multiple compute nodes.
The solution is straightforward: they used Docker Compose, a tool for defining and running multi-container applications. Engineers at Expanso developed a Docker Compose file that defines all the services and their connections. This way, with docker compose up, you bring the entire Bacalhau cluster to life on your machine in a few seconds.
The Inception Moment: Docker-in-Docker (DinD) Comes Into The Game
Here's where it gets interesting. In the Docker Compose setup, the compute nodes are themselves already running as Docker containers. So, here’s the challenge: how does a Docker container (the compute node) launch another Docker container (the actual job)?
This is where Docker-in-Docker (DinD) comes in. Docker provides special docker:dind images. When a container is run using a DinD image, it runs its own independent Docker daemon inside that container.
By basing your Bacalhau compute node containers on a DinD image, you allow them to pull other Docker images and run containers within their own isolated environment.
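To see the nesting on its own, outside of Bacalhau, here is a minimal sketch that starts one DinD container and runs a second container inside it. The container name dind-demo, the run helper, and the DRY_RUN switch are assumptions made for this example; they are not part of the Expanso setup.

```shell
# Sketch: run a container inside a container with the official docker:dind image.
# "dind-demo", run(), and DRY_RUN are illustrative names, not part of Bacalhau.

# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dind_demo() {
  # Start the outer container. It runs its own Docker daemon,
  # which is why the docker:dind image needs --privileged.
  run docker run --privileged -d --name dind-demo docker:dind || return 1

  # Wait (up to about 30 seconds) until the inner daemon answers.
  attempts=0
  until run docker exec dind-demo docker info >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    [ "$attempts" -ge 30 ] && return 1
    sleep 1
  done

  # Ask the inner daemon to pull and run a nested container.
  run docker exec dind-demo docker run --rm hello-world

  # Remove the outer container together with its anonymous volume.
  run docker rm -f -v dind-demo
}
```

Calling DRY_RUN=1 dind_demo only prints the commands, which is handy for checking what would happen; calling dind_demo runs them against your local Docker.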
The workflow looks like this:
You run docker compose up.
Docker Compose starts the orchestrator, MinIO (an open-source, S3-compatible object storage server), and the compute node containers (which are based on DinD images).
You submit a job to the local Bacalhau orchestrator.
The orchestrator assigns the job to one of the compute node containers.
The compute node container pulls the required Docker image for the job and runs the job inside a new container, nested within itself.
That’s it!
This mimics how a Bacalhau distributed deployment operates, but it is contained within your local Docker environment on your machine without:
Cloud accounts.
Infrastructure management.
Overheads.
Fear of messing things up. You can destroy your containers and recreate them at any time.
Why This Matters: The Benefits
This DinD approach offers you advantages like:
Low entry barrier: No need for cloud accounts or complex infrastructure setup. Just Docker Desktop (or Docker Engine) and the compose YAML file.
Realistic simulation: Faithfully replicates the multi-component architecture and the container-based job execution of a real Bacalhau deployment.
Safe sandbox: You can experiment, break things, and test configurations. If something goes wrong, you can simply run docker compose down and docker compose up to start a fresh instance.
Rapid iteration: You can quickly test changes to Bacalhau configurations, job specifications, or even Bacalhau itself, if you're contributing to the project.
Offline capability: You can develop and test without an internet connection (once the initial Docker images are pulled).
Typical situations where the DinD approach with Bacalhau comes in handy are when:
You need to create a fast Proof Of Concept (POC) for a customer.
You have to try or add a new feature on the fly.
You just want to give Bacalhau a fast try.
How To Implement It
After discussing the theory, let’s see how to get started with DinD and Bacalhau.
Prerequisites
To follow the steps below, your system must meet these prerequisites:
Docker engine: You need to have Docker installed and running on your system.
Docker Compose: You need Docker Compose itself:
Docker Desktop: If you are using Docker Desktop, Docker Compose is typically included as part of the installation. To verify you have it installed, type docker compose version.
Server/Manual install: If you installed Docker Engine manually, you might need to install Docker Compose separately.
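If you prefer to check all of this in one go, the sketch below wraps the checks in a small function. The function name check_prereqs and its messages are made up for this example; the docker commands it calls are the standard ones.

```shell
# Sketch: verify the prerequisites before cloning anything.
# check_prereqs is a hypothetical helper name.
check_prereqs() {
  # Is the docker CLI installed?
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found" >&2
    return 1
  fi
  # "docker info" fails when the daemon is not running or not reachable.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not reachable" >&2
    return 1
  fi
  # Compose v2 is invoked as "docker compose".
  if ! docker compose version >/dev/null 2>&1; then
    echo "Docker Compose not found" >&2
    return 1
  fi
  echo "prerequisites OK"
}
```

Run check_prereqs in your terminal; any message other than "prerequisites OK" tells you which piece is missing.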
Step 1: Clone The Repository
Clone the repository:
git clone https://github.com/bacalhau-project/bacalhau-network-setups
Note that the documentation for using it is in the docker-compose/ folder.
This is the structure of the cloned repository:
bacalhau-network-setups/
├──docker-compose/
│ ├──expanso-cloud/
│ ├──multi-region/
│ ├──single-region/
│
└──README.MD
Step 2: Choose Your Setup
You can choose from different setups:
Single region.
Multi region.
Expanso Cloud.
Suppose you want a single region. From the main folder, you have to move to the single-region/ folder:
cd docker-compose/single-region
Very well. You are now ready to launch Docker Compose!
Step 3: Launch Docker Compose
From the single-region/ folder, launch Docker Compose:
docker compose up -d
After a few seconds, the process completes.
Good. You now have a working single-region Bacalhau instance.
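If you script this step, a short polling loop can tell you when the cluster is actually usable. The sketch below assumes you run it from the single-region/ folder and that the client service (the one used in Step 4) can already run bacalhau job list. The function name wait_for_cluster and the default of 30 attempts are assumptions made for this example.

```shell
# Sketch: wait until the client container can talk to the orchestrator.
# wait_for_cluster is a hypothetical helper; run it from single-region/.
wait_for_cluster() {
  max_attempts="${1:-30}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -T disables TTY allocation, so this also works in non-interactive scripts.
    if docker compose exec -T client bacalhau job list >/dev/null 2>&1; then
      echo "cluster ready after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "cluster not ready after $max_attempts attempts" >&2
  return 1
}
```

You can also run docker compose ps at any time to see which services are up.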
Step 4: Run Your Jobs
Connect to the client container:
docker compose exec client sh
List the jobs:
bacalhau job list
On a fresh cluster, the expected result is an empty table showing only the header row:
CREATED ID JOB TYPE STATE
Now you can run your jobs:
bacalhau job run ...
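As one possible first job, the sketch below writes a small batch job spec to a file and submits it from inside the client shell. The file path, the job name hello-dind, and the job contents are assumptions made for this example. The field names follow Bacalhau's declarative job format as I understand it, so check them against the documentation for the version you are running.

```shell
# Sketch: write a minimal batch job spec and submit it from the client shell.
# The path /tmp/hello-job.yaml and the job contents are illustrative assumptions.
cat > /tmp/hello-job.yaml <<'EOF'
Name: hello-dind
Type: batch
Count: 1
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: ubuntu:latest
        Entrypoint:
          - /bin/sh
        Parameters:
          - -c
          - echo "hello from a nested container"
EOF

# Submit only when the bacalhau CLI is available, as it is in the client container.
if command -v bacalhau >/dev/null 2>&1; then
  bacalhau job run /tmp/hello-job.yaml
fi
```

After submitting, bacalhau job list should show the new job, and bacalhau job describe followed by the job ID should give you its details.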
Good! You have launched your jobs with a Bacalhau instance all on your machine.
Step 5: Conclude And Cleanup
When all the jobs are done, after closing the instance, you can clean everything up with:
docker compose down -v --remove-orphans
This will:
Stop all containers.
Remove all containers.
Remove all volumes.
Remove any orphaned containers.
Remove all networks.
Your environment is now clean, and you can bring up a fresh instance whenever you need one.
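Since tearing down and starting over is the whole point of the sandbox, you may want both steps in a single helper. The sketch below is one way to do it. The function name reset_cluster and the COMPOSE_DIR variable are assumptions made for this example; the default path assumes you are in the root of the cloned repository.

```shell
# Sketch: tear the local cluster down and bring up a fresh one.
# reset_cluster and COMPOSE_DIR are hypothetical names for this example.
reset_cluster() {
  compose_dir="${COMPOSE_DIR:-docker-compose/single-region}"
  (
    # Work in a subshell so the caller's working directory is untouched.
    cd "$compose_dir" || exit 1
    # Only start a new instance if the teardown succeeded.
    docker compose down -v --remove-orphans &&
      docker compose up -d
  )
}
```

For example, COMPOSE_DIR=docker-compose/multi-region reset_cluster would reset the multi-region setup instead.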
Conclusion
By leveraging the Docker-in-Docker technique, you can create self-contained Bacalhau environments locally. This approach:
Lowers the barrier to entry, providing an efficient environment for you to learn, develop, and test Bacalhau.
Allows you to test new features on the fly, without worrying about infrastructure management. This saves time and overhead.
If you want to know more about the architecture and the setup, read this README file.
What's Next?
To start using Bacalhau, install Bacalhau and give it a shot.
If you don’t have a node network available and would still like to try Bacalhau, you can use Expanso Cloud. You can also set up a cluster on your own (with setup guides for AWS, GCP, Azure, and more 🙂).
Get Involved!
We welcome your involvement in Bacalhau. There are many ways to contribute, and we’d love to hear from you. Reach out at any of the following locations:
Commercial Support
While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. Read more about the difference between open-source Bacalhau and commercially supported Bacalhau in the FAQ. If you want to use the pre-built binaries and receive commercial support, contact us or get your license on Expanso Cloud!