Distributed Compute Platform Bacalhau Launches Next Release 1.1
5 Days of Bacalhau - Day 1
We’re excited to announce that MotherDuck/DuckDB and Bacalhau/Expanso have partnered to deliver an enterprise-grade logging solution - Unified Data Log Insights: Leveraging Bacalhau and Motherduck for Advanced File Querying Across Distributed Networks
The newly released Bacalhau 1.1 adds several new features designed to meet our customers' needs. It's never been easier to deploy Bacalhau in high-performance scenarios like the ones listed above.
What's New in 1.1?
Full Fleet Targeting
We learned that our users want to execute single jobs and conduct simultaneous operations across their entire node fleet. With the new --target=all option, you can execute jobs and queries in parallel on all matching nodes in your network with a single command. This makes it easy to get a comprehensive view of your entire infrastructure, allows for immediate update rollouts, and simplifies the management of edge device fleets.
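As a rough mental model of what --target=all changes, the sketch below contrasts sending a job to a single matching node with fanning it out to every matching node. The Node type, the label-matching rule, and the "any" mode name are illustrative assumptions, not Bacalhau's scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; real node metadata is richer than this.
    node_id: str
    labels: dict = field(default_factory=dict)


def select_nodes(nodes, required_labels, target="any"):
    """Return the nodes a job would be sent to under each targeting mode."""
    matching = [
        n for n in nodes
        if all(n.labels.get(k) == v for k, v in required_labels.items())
    ]
    if target == "all":
        return matching       # fan out to every matching node
    if target == "any":
        return matching[:1]   # assumed: one matching node is enough
    raise ValueError(f"unknown target mode: {target!r}")


fleet = [
    Node("edge-1", {"region": "eu"}),
    Node("edge-2", {"region": "us"}),
    Node("edge-3", {"region": "eu"}),
]
print([n.node_id for n in select_nodes(fleet, {"region": "eu"}, target="all")])
```

With the hypothetical fleet above, the "all" mode returns both nodes labelled region=eu, while the "any" mode returns only one of them.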
New Node CLI and APIs
Fleet management has gotten better: enterprises can now gain a fuller view of their deployment footprint, with more detailed node information. In Bacalhau 1.1, we are adding two new commands to make this much easier:
bacalhau node list - which will output a table of all the nodes in a network.
bacalhau node describe - which will output the entire config for a node.
Bacalhau now supports running jobs for extended periods without timing out, enabling long-running intensive computations! Users and node operators can configure custom timeouts if needed. However, by default, there is no execution timeout limit. The two flags to set this are:
--timeout - to set the requested timeout for a job.
--max-timeout - to set the maximum timeout allowed on a node.
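The interaction between the two flags can be sketched as a simple check. This is only an illustration: the function name is hypothetical, values are assumed to be in seconds, and None stands for "not set". The error text mirrors the improved error message quoted later in this post.

```python
from typing import Optional


def validate_timeout(requested_s: Optional[int], node_max_s: Optional[int]) -> None:
    """Reject a job whose requested timeout exceeds the node's maximum.

    requested_s models the job's --timeout and node_max_s models the node's
    --max-timeout. None means the value was not set, i.e. no limit.
    """
    if requested_s is None or node_max_s is None:
        return  # nothing to compare: no limit on at least one side
    if requested_s > node_max_s:
        raise ValueError(
            f"Job timeout {requested_s}s exceeds maximum possible value {node_max_s}s"
        )


validate_timeout(None, None)   # default: no execution timeout limit
validate_timeout(200, 300)     # within the node's maximum
try:
    validate_timeout(1800, 300)
except ValueError as err:
    print(err)
```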
Richer Node Configuration
Bacalhau 1.1.0 offers a wider range of customizable options for your setup, including persistent config files, command flags, and environment variables.
This improved flexibility allows you to tailor Bacalhau to your preferences. For insights into new options like config.yaml and updates from v1.0.3, refer to the latest configuration guide.
Here is a sample configuration file:
Node:
  ClientAPI:
    Host: bootstrap.production.bacalhau.org
    Port: 1234
User:
  KeyPath: /home/user/.bacalhau/user_id.pem
This method replaces the old way of using many different command line flags, making it easier to deploy Bacalhau to nodes.
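Configuration properties are referred to elsewhere in this post by dotted names such as Node.IPFS.SwarmKeyPath. The sketch below shows how a nested configuration maps onto such names, using the sample values above written as a plain Python dict. The flatten helper is hypothetical, and placing User at the top level is an assumption about the sample's nesting.

```python
def flatten(config, prefix=""):
    """Yield (dotted_name, value) pairs for every leaf in a nested dict."""
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, name)
        else:
            yield name, value


# The sample configuration from this post as a plain dict.
# Assumption: User is a top-level section, alongside Node.
sample = {
    "Node": {
        "ClientAPI": {
            "Host": "bootstrap.production.bacalhau.org",
            "Port": 1234,
        }
    },
    "User": {"KeyPath": "/home/user/.bacalhau/user_id.pem"},
}

for name, value in flatten(sample):
    print(f"{name} = {value}")
```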
⚠️ NOTE: Existing Bacalhau users may need to follow migration steps to retain their previous configurations.
Support for TLS on Public APIs
Bacalhau now supports secure client-server communication using TLS certificates. These certificates help prevent eavesdropping and ensure data remains secure while moving between the client and the Bacalhau network. Setting up TLS is simple, requiring only a few extra lines in your setup, and it offers essential encryption for your jobs and data.
You can make use of free certificates from Let's Encrypt or supply your own certificate and private key.
To enable TLS, specify the certificate and key paths either in the Bacalhau config file or via CLI flags. In a sample configuration file, it looks like this:
Node:
  ServerAPI:
    TLS:
      ServerCertificate: /root/hostname.crt
      ServerKey: /root/hostname.prv
We also support auto-provisioning of TLS certificates from Let's Encrypt using the following setting in your configuration file:
Node:
  ServerAPI:
    TLS:
      AutoCert: example.com
Optional External Storage of Jobs and Executions
To date, all job information has been stored in the memory of the running server. This works well for many, but some users wanted this information stored externally to preserve job information across server restarts.
Bacalhau 1.1 adds support for external job storage. Job histories can now outlive the nodes that ran them. The benefits include improved recordkeeping for auditing, the ability to restart interrupted jobs, and better insights from long-term job analytics. Node operators can also configure Bacalhau to save this information on storage solutions such as IPFS, S3, etc. to securely archive job data. Once enabled, all job information will be versioned and stored in the external system and protected from loss even if nodes go offline. This persistence unlocks new use cases and visibility for Bacalhau users.
Learn how to configure persistence.
Improved Error Messages
We’ve heard your feedback about the need for clearer error reporting. We now highlight why jobs fail so that the cause is clear to end users. For example, many errors that previously reported only "not enough nodes to run job" will now provide details like:
Could not inspect image - could be due to repo/image not existing, or registry needing authorization
Job timeout 1800s exceeds maximum possible value 300s
Let us know if you have other issues where the errors aren’t clear!
Fine-Grained Control Over Image Entrypoint and Parameters
Users now have finer control over the entrypoint and parameters passed to a Docker image. Previously, Bacalhau would ignore the default entrypoint to the image and replace it with the first argument after bacalhau docker run <image>. Now, the default entrypoint in the image is used and all of the positional arguments are passed as the command to that entrypoint.
The entrypoint can still be explicitly overridden by using the --entrypoint flag or by setting the Entrypoint field in a Docker job spec.
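To make the change concrete, here is a small sketch of how the container's starting command line is assembled under the old and new behavior described above. The resolve_argv function and its parameters are hypothetical names for illustration, and the image's default command is left out to keep the model small.

```python
def resolve_argv(image_entrypoint, positional_args, entrypoint_override=None, legacy=False):
    """Return the argv a container would start with.

    image_entrypoint:    the image's default entrypoint, e.g. ["python3"]
    positional_args:     arguments given after the image name
    entrypoint_override: models --entrypoint / the Entrypoint job spec field
    legacy:              True models the pre-1.1 behavior
    """
    args = list(positional_args)
    if entrypoint_override is not None:
        # An explicit override always replaces the image's entrypoint.
        return list(entrypoint_override) + args
    if legacy:
        # Pre-1.1: the image entrypoint is ignored, so the first positional
        # argument effectively becomes the entrypoint.
        return args
    # 1.1: keep the image entrypoint; positional args become its command.
    return list(image_entrypoint) + args


print(resolve_argv(["python3"], ["main.py", "--verbose"], legacy=True))
print(resolve_argv(["python3"], ["main.py", "--verbose"]))
print(resolve_argv(["python3"], ["main.py"], entrypoint_override=["bash"]))
```

In this model, an image whose entrypoint is python3 runs python3 main.py --verbose under 1.1, whereas the legacy behavior would have tried to execute main.py directly.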
GPU Support Inside Docker Containers
When running ML models, nothing beats custom hardware like GPUs. Bacalhau 1.1 now has the capability to automatically utilize GPUs when the Bacalhau node is running inside a Docker container. Ensure that the Bacalhau node is started with a GPU capability by passing
bacalhau docker run.
Support for Private IPFS Clusters
Most enterprise workloads need the privacy enabled by running clusters disconnected from the external world. While IPFS is a terrific protocol for moving data around, the default mechanism for doing so requires moving through public gateways.
Bacalhau 1.1 now enables connecting to existing private IPFS clusters. To connect to a private swarm, pass the path to a swarm key to --ipfs-swarm-key, set the BACALHAU_IPFS_SWARM_KEY environment variable, or configure the Node.IPFS.SwarmKeyPath configuration property.
When connecting to a private swarm, Bacalhau will no longer bootstrap using or connect to public peers and will rely on the swarm for all data retrieval.
⚠️ NOTE: It will also be necessary to set this environment variable when using a client that uses bacalhau get to download from a private IPFS swarm.
Setting the environment variable is NOT necessary if using the --ipfs-connect flag, which already can connect to IPFS nodes running a private swarm.
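The three ways of supplying the swarm key path can be pictured as a simple lookup chain. The sketch below is an assumption-laden illustration: the resolve_swarm_key function is hypothetical, and the precedence shown (flag, then environment variable, then config file) is a common convention assumed here rather than something this post states.

```python
import os

SWARM_KEY_ENV = "BACALHAU_IPFS_SWARM_KEY"


def resolve_swarm_key(flag_value=None, environ=None, config=None):
    """Return the swarm key path from the first source that provides one.

    flag_value models --ipfs-swarm-key, environ models the process
    environment, and config models a parsed configuration file as a dict.
    Assumed precedence: flag > environment variable > config file.
    """
    environ = os.environ if environ is None else environ
    config = {} if config is None else config
    if flag_value:
        return flag_value
    if environ.get(SWARM_KEY_ENV):
        return environ[SWARM_KEY_ENV]
    # Corresponds to the Node.IPFS.SwarmKeyPath configuration property.
    return config.get("Node", {}).get("IPFS", {}).get("SwarmKeyPath")


cfg = {"Node": {"IPFS": {"SwarmKeyPath": "/etc/bacalhau/swarm.key"}}}  # hypothetical path
print(resolve_swarm_key(environ={}, config=cfg))
```

If no source provides a path, the function returns None, which in this model means the node is not configured for a private swarm.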
We are hard at work developing long-running jobs and pluggable executors for future releases.
To date, the idea of a Bacalhau job was finite - it started, it did some work, and then it finished.
However, in many cases, the cost of starting the job was significant (such as starting a database or loading a model into memory), or the response time needed for the job was very short (faster than the time it would take even a fast container to start). That’s why we developed long-running jobs.
In Bacalhau 1.1, Bacalhau jobs can now run indefinitely and will automatically restart when nodes come back online, allowing for continuous and uninterrupted processing. Long-running jobs allow compute workloads to process data that arrives continuously and are perfect for tasks such as pre-filtering logs, processing real-time analytics, or working with edge sensors.
With the introduction of long-running jobs, ML inference tasks can now operate in a "warm-boot" environment. This means that the necessary resources and dependencies are already loaded, significantly reducing the time taken to run an inference job.
With this experimental feature, you can now unleash the power of Bacalhau to handle dynamic and ever-changing data streams, ensuring continuous and uninterrupted processing of your computational workloads.
While Bacalhau supports Docker and WASM natively today, in many cases this is an unnecessary abstraction. If people just wanted to execute a curl command, or a simple Python script, it would be very convenient to just specify that, instead of first having to go through container packaging to do so. Or perhaps people want to use Bacalhau to configure and operate host nodes directly and don’t want their job to be entirely contained within Docker. That’s where pluggable executors come in.
In Bacalhau 1.1, you can start to use a simpler interface to specify what to execute by relying on executor plugins. We’re still putting the finishing touches on this feature and it’ll be ready to try in an upcoming release, but once it’s ready you’ll simply detail it all in the job spec:
Engine:
  Type: python
  Params:
    Repository: github.com/me/my-project
    Script: main.py
Or, you can use the familiar command line interface and just run bacalhau run python main.py directly.
This will still run the job in an appropriate security and isolation context, but now the details of that are left to the Bacalhau runtime, which could choose to execute the script inside Docker, in a Python virtual environment on the host machine, or even as a WebAssembly binary if appropriate! It is like a remote executor, but better!
Planned Upcoming Release Items
We have LOTS more features in our roadmap:
Moving long-running jobs and pluggable executors to general availability
Hosted clusters and “burstable” clusters
Our new WebUI dashboard
OpenTelemetry tracing
And lots more! If you’d like something in particular, come tell us!
5 Days of Bacalhau Blog Series
If you’re interested in exploring these features more in depth, check back tomorrow for our 5 Days of Bacalhau.
Day 1 - Bacalhau 1.1 Release
Day 2 - Improved Queuing For Jobs
Day 3 - New Job Types
Day 5 - GPU Support for Docker Nodes
How to Get Involved
We're looking for help in several areas. If you're interested in helping out, there are several ways to contribute. Please reach out to us at any of the following locations.
As always, thank you for reading, and onward!
Your humble Bacalhau team.