In our latest release, Bacalhau nodes running in a Docker container now automatically detect GPUs and make them available to executing jobs.
You might expect us to have already had this feature. In fact, we thought we did! Bacalhau running outside of a container can already detect GPUs, and we assumed that passing Docker's --gpus=all
flag when running the Bacalhau image would work seamlessly.
A member of our community pointed out to us that their GPU wasn’t automatically found when running in a container. We promptly investigated – and found that GPU support within Docker containers was a tricky challenge that we’d underestimated.
Running Bacalhau in a Docker container increases security
As we explain in our docs, starting a new Bacalhau node is as simple as:
docker run --gpus=all \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/bacalhau-project/bacalhau:latest serve
Running Bacalhau inside a Docker container offers several advantages over running it outside of a container:
Increased isolation and security: by encapsulating your Bacalhau node within a container, you can isolate it from the underlying host system and minimize the risk of conflicts or vulnerabilities. This ensures that your data processing workflows remain secure and protected.
Easier and non-permanent installation: with Docker, you can quickly set up and try out a Bacalhau node with only one command. This makes it an ideal choice for those who want to experience the power of Bacalhau without setting up any additional components or opening up network services on their host machine.
Whether you're a researcher, data scientist, or developer, Docker provides a convenient and efficient way to explore the capabilities of Bacalhau.
GPUs are now automatically available to the network
Previously, as our community member pointed out to us, harnessing the power of GPU hardware while running Bacalhau within Docker was a challenge. It required providing a shim tool to work around the lack of management tools within the Bacalhau container.
Docker containers don’t naturally have access to host GPUs: a driver or runtime is needed to expose them inside the container. Whilst it’s easy to ask Docker to do that when we set up our node (which is the meaning of the --gpus=all
flag above), the image also needs to have the runtime tools installed. Bacalhau previously used the nvidia-container-cli
tool to detect GPUs, and that tool isn’t installed in our Docker image.
There is some magic that happens when one runs a GPU-capable Docker image. If an image has been prepared with driver tools and is then run as GPU-capable by Docker, the nvidia-smi
binary will magically appear in the container. So Bacalhau now uses this binary to detect GPUs, which works both inside and outside of a container.
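As a sketch of this approach, here is how detection based on nvidia-smi’s CSV output might look. The query flags are real nvidia-smi options (the same ones used in the job below); the parsing function is illustrative, not Bacalhau’s actual code:

```shell
#!/bin/sh
# Illustrative sketch of GPU detection via nvidia-smi (not Bacalhau's real code).

# parse_gpus: read nvidia-smi CSV lines ("index, name, memory") on stdin
# and print one human-readable summary line per GPU.
parse_gpus() {
  while IFS=, read -r index name memory; do
    # Trim the leading space nvidia-smi emits after each comma.
    name=${name# }; memory=${memory# }
    echo "GPU $index: $name, ${memory} MiB"
  done
}

# On a real node, we would feed it live data:
#   nvidia-smi --query-gpu=index,gpu_name,memory.total \
#              --format=csv,noheader,nounits | parse_gpus
```

Because the same nvidia-smi binary appears both on GPU hosts and inside GPU-enabled containers, this one code path covers both cases.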
In the future, we’ll run a Bacalhau job to detect GPUs
Our new system works, but it requires the Bacalhau Docker image to be based on a specific Linux distribution with CUDA installed. We could make the Bacalhau image smaller and more portable by removing this requirement.
Instead, we could get our new node to run a Bacalhau job to detect GPU support. If the node has GPU support successfully installed, this job will run and print out GPU statistics; if not, it will fail. This allows the Bacalhau node to detect Docker GPU support without needing any binaries to be locally installed. The job is equivalent to:
bacalhau docker run --gpu=1 --entrypoint=nvidia-smi \
nvidia/cuda:12.2.0-base-ubuntu20.04 -- \
--query-gpu=index,gpu_name,memory.total \
--format=csv,noheader,nounits
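The node could then treat the job’s exit status as the capability signal. A minimal sketch of that logic, where the wrapper function is illustrative and the detection command on a real node would be the Bacalhau job above:

```shell
#!/bin/sh
# Illustrative sketch: infer GPU capability from a detection command's exit code.

# has_gpu_support: run the given detection command; success means the node
# can schedule GPU jobs, failure means it cannot.
has_gpu_support() {
  if "$@" >/dev/null 2>&1; then
    echo "gpu: available"
  else
    echo "gpu: unavailable"
  fi
}

# On a real node, the detection command would be the job shown above, e.g.:
#   has_gpu_support bacalhau docker run --gpu=1 --entrypoint=nvidia-smi ...
```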
GPUs are also only one type of specialist hardware. If we want to add support for more types of hardware like TPUs, a scalable way to do this is via adding more Bacalhau jobs to detect them. Running these tasks via jobs also lets us take advantage of the reliability features that Bacalhau is there to provide!
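One way to picture this extensibility is a registry mapping each hardware type to the probe image its detection job runs. Everything below is hypothetical (Bacalhau ships no such registry today); only the CUDA image name is real, taken from the job above:

```shell
#!/bin/sh
# Hypothetical sketch: a registry of detection jobs, one per hardware type.

# detection_image: print the probe image a detection job for TYPE might use.
detection_image() {
  case "$1" in
    gpu) echo "nvidia/cuda:12.2.0-base-ubuntu20.04" ;;  # real image, used above
    tpu) echo "example/tpu-probe:latest" ;;             # hypothetical placeholder
    *)   echo "no detection job for: $1" >&2; return 1 ;;
  esac
}
```

Adding support for a new accelerator would then mean adding one more entry and its probe job, with no new binaries baked into the Bacalhau image.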
Summary
With this new capability, Bacalhau becomes an even more versatile and indispensable tool for data processing. Whether you are dealing with massive datasets or executing computationally intensive algorithms, deploying Bacalhau in a Docker container with GPU support empowers you to achieve faster and more efficient results.
Be Part of the Evolution
Stay tuned as we continue to enhance and expand Bacalhau's capabilities. We are relentlessly committed to delivering groundbreaking updates and improvements. Don't miss out on being part of this exciting journey.
The Bacalhau project is available today as Open Source Software. The public GitHub repo can be found here. If you like the project, please give us a star ⭐ 🙂
We're looking for help in several areas. If you're interested in helping out, there are several ways to contribute and you can always reach out to us via Slack or Email.
For more information, see: