We are thrilled to announce the release of Bacalhau v1.3.0, a significant milestone in our quest to help organizations of all sizes navigate the world of distributed compute. Packed with exciting new features like user access control, local results publishing, and TLS support, this release is built to address the needs of even the largest organizations without the complexity of traditional distributed platforms!
But that's not all! We have also started shipping experimental features, such as custom job types, for early feedback. Please make sure to tell us what is working and what you would like to see next.
Without further ado, let’s dive in!
New Features
User Access Control
Bacalhau v1.3.0 now supports authentication and authorization of individual users with a flexible and customizable auth system that remains simple for single-node clusters but scales up well to wide enterprise deployments.
Bacalhau auth integrates well with whatever auth systems users already have. Bacalhau can use private keys, shared secrets, usernames and passwords, and 2FA. Additionally, Bacalhau offers OAuth2/OIDC for authentication and can apply access control to single users, groups, or teams, using RBAC or ABAC mechanisms as desired.
The default behaviour is unchanged. Users will authenticate using their private key and will be authorized to manage their own jobs. Read-only information about the cluster will be accessible upon authentication.
To start using user authentication, check out the auth docs and install a custom policy to control user access and their permissions.
Publishing and Serving Results on Local Disks
In Bacalhau v1.3.0 we are introducing a new publisher type that lets users publish to the local disk of the compute node. This will streamline the process of testing the publisher functionality without the need for a remote storage service. This is especially handy for those who are just getting started with Bacalhau.
The local publisher is composed of two parts: the publisher that compresses and moves job outputs to a specified location, and an HTTP server that delivers the content back to the user.
By default, the HTTP server listens on port 6001, but this can be modified using the --local-publisher-port flag. The server will deliver content from the directory specified by the --local-publisher-directory flag, or, if not set, from a subdirectory of the configured Bacalhau storage directory. The --local-publisher-address flag can be used to set the address that the HTTP server listens on. Default values for this vary by environment (e.g., localhost for test and development environments, public for production environments), but users can set these values in the config if the defaults are not suitable.
We should stress that managing this storage is still the administrator’s responsibility. Because local publishing necessarily means storing results on a single node, clean-up, persistence, and so on are things you should think through before moving into production!
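As a quick sketch, a single test node could be started with the flags described above (the address, port, and directory below are purely illustrative):
bacalhau serve --local-publisher-address localhost --local-publisher-port 6001 --local-publisher-directory /tmp/bacalhau-results
Job outputs published on that node should then land under /tmp/bacalhau-results and be served back over HTTP on localhost:6001.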
NATS-Based Networking
In the Bacalhau v1.3.0 release, we are introducing a new transport layer to improve inter-node connectivity. This new layer utilizes NATS, a robust messaging system, instead of the existing libp2p transport.
With the introduction of NATS, we are simplifying the network requirements for Compute nodes. Now, only Orchestrator nodes (also known as Requester nodes) need to be publicly reachable. As a result, Compute nodes only need to know the address of a single Orchestrator node, and they can learn about and connect to other Orchestrators at runtime. This change not only simplifies the setup process but also enhances resilience, as it allows Compute nodes to fail over and reconnect to other Orchestrators when necessary. Note that this only affects inter-node communication; the Bacalhau HTTP API is unchanged.
We acknowledge that adapting to new technologies takes time. In recognition of this, libp2p will continue to be supported as an alternative during this transition period. This ensures that you have the flexibility to migrate at your own pace. Users who wish to continue using libp2p need to explicitly set the Node.Network.Type config option or the --network flag to libp2p when running their network.
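For example, to keep a node on the libp2p transport, pass the flag explicitly when starting it (setting the Node.Network.Type config option works equivalently):
bacalhau serve --network libp2p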
Persistent Memory of Connected Nodes
The Bacalhau v1.3.0 release introduces a significant upgrade ensuring the persistence of node information across requester node restarts. This addresses a shortcoming of the previous in-memory store, which would lose all knowledge of compute nodes upon a restart. The new persistent store is a major advancement towards maintaining more accurate node information and tracking compute nodes that may be temporarily inaccessible to the cluster.
The new persistent store is used automatically when NATS-based networking is used.
TLS Support for Bacalhau CLI
Bacalhau v1.3.0 now supports TLS requests to the requester node for all CLI commands. While the default communication remains HTTP, users can activate TLS calls using the --tls command line flag, by setting the Node.ClientAPI.ClientTLS.UseTLS config option to true, or by exporting the BACALHAU_API_TLS=1 environment variable.
For self-signed certificates, users can either accept insecure requests or provide a CA certificate file. The Node.ClientAPI.ClientTLS.CACert config option, BACALHAU_API_CACERT environment variable, and --cacert flag can be used to verify the certificate against a provided CA certificate file. Alternatively, the Node.ClientAPI.ClientTLS.Insecure config option, --insecure flag, or BACALHAU_API_INSECURE environment variable can be used to make API requests without certificate verification.
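For example, assuming your requester presents a certificate whose issuing CA is in a file called ca.crt (an illustrative file name), any CLI command that talks to the requester, here bacalhau list, could be run over TLS in one of the following ways:
bacalhau list --tls --cacert ca.crt
BACALHAU_API_TLS=1 BACALHAU_API_CACERT=ca.crt bacalhau list
bacalhau list --tls --insecure
The last form skips certificate verification and is best kept to testing.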
Customizable Node Names
In the Bacalhau v1.3.0 release, we've introduced a new feature that allows users to set their own nodeID. This addition gives users the flexibility to tailor their node names according to their preferences and needs.
Users have the option to manually set the node name, or they can opt for automatic generation using various providers. These providers include puuid (the default), uuid, hostname, aws, and gcp.
The puuid option generates a node name using the n-{uuid} pattern, such as n-f1bab231-68ad-4c72-bab6-580cd49bf521. The uuid option generates a plain UUID as the node name. The hostname option uses the hostname as the node ID, replacing any . with - to ensure compatibility with NATS. The aws option uses the EC2 instance name if the node is deployed on AWS, and the gcp option uses the VM's ID if the node is deployed on GCP.
It's important to note that these providers only come into play if no existing node name is found in config.yaml, the CLI --name flag, or environment variables. Once a node name is generated, it will be persisted in config.yaml, ensuring that node names are consistent across sessions.
To set the node name manually:
bacalhau serve --name my-custom-name
To use a puuid as the node name (which is the default):
bacalhau serve
To use the hostname as the node name:
bacalhau serve --name-provider hostname
This new feature is aimed at enhancing user customization and control, making Bacalhau even more user-friendly and adaptable to different user needs.
Improved Telemetry and Metrics
Bacalhau Telemetry Suite
In this update, we have introduced a Docker Compose-based telemetry suite complete with OpenTelemetry, Prometheus, Grafana, and Jaeger containers for collecting and inspecting telemetry data emitted by Bacalhau nodes. For details on running the suite, see the README.md in the repository (https://github.com/bacalhau-project/bacalhau/blob/main/ops/metrics/README.md).
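If you have the Bacalhau repository checked out, bringing the suite up should be roughly a matter of running the compose stack from the ops/metrics directory (the README linked above has the authoritative steps):
cd ops/metrics && docker compose up -d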
Improved Visibility via New Metrics
In this update, we have added new metrics to improve the observability of Bacalhau nodes. These metrics include:
- job_submitted: Number of jobs submitted to the Bacalhau node.
- job_publish_duration_milliseconds: Duration of publishing a job on the compute node, in milliseconds.
- job_storage_upload_duration_milliseconds: Duration of uploading job storage input on the compute node, in milliseconds.
- job_storage_prepare_duration_milliseconds: Duration of preparing job storage input on the compute node, in milliseconds.
- job_storage_cleanup_duration_milliseconds: Duration of job storage input cleanup on the compute node, in milliseconds.
- job_duration_milliseconds: Duration of a job on the compute node, in milliseconds.
- docker_active_executions: Number of active Docker executions on the compute node.
- wasm_active_executions: Number of active WASM executions on the compute node.
- bacalhau_node_info: A static metric with labels describing the Bacalhau node:
  - node_id: ID of the Bacalhau node emitting the metric.
  - node_network_transport: Bacalhau node network transport type (libp2p or NATS).
  - node_is_compute: true if the node is accepting compute jobs.
  - node_is_requester: true if the node is serving as a requester node.
  - node_engines: List of engines the node supports.
  - node_publishers: List of publishers the node supports.
  - node_storages: List of storages the node supports.
Improved Out of Memory Handling for Docker Jobs
The Bacalhau CLI will now explain when Docker jobs run out of memory and include links to the Bacalhau documentation showing how to increase the memory limit for a job.
Improved Configuration for IPFS
In this update, we have made the embedded IPFS node's gateway, API, and swarm listening multiaddresses configurable, providing users with more control and determinism, particularly when configuring firewall rules.
This update also introduces changes when the --ipfs-serve-path flag is set: the content of the embedded IPFS node's repo is now preserved across Bacalhau restarts, maintaining any data the embedded IPFS node stored as well as its identity. Furthermore, we've added a new flag, --ipfs-profile, to configure the embedded IPFS node's configuration profile.
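As an illustration, a node that keeps its embedded IPFS repo across restarts and applies a specific IPFS configuration profile could be started like this (the path and profile name are illustrative):
bacalhau serve --ipfs-serve-path /var/lib/bacalhau/ipfs --ipfs-profile server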
New Experimental Features
Custom Job Types
We're delighted to announce the introduction of a new exec command, designed to facilitate the submission of custom job types. This marks a significant milestone in our push for greater customization and flexibility. In this initial version, we are offering support for both duckdb and python job types.
This command allows users to run simpler, higher-level commands that will be familiar from the command line, without worrying about how Bacalhau safely orchestrates and executes their code.
bacalhau exec python -p local -- -c 'print("hello world!")'
To accommodate job types that require code to run, we have introduced a --code flag. This allows users to specify either a single file or a folder containing multiple files. If the total size of the files does not exceed 10MB, the contents will be added as inline storage to the job specification.
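As a rough sketch of how the two might combine, the invocation below attaches a local script to the job; the file name and the way the attached code is referenced in the arguments are hypothetical, so check the exec documentation for the exact semantics:
bacalhau exec --code ./hello.py python -- hello.py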
Up Next
For the next release, Bacalhau's development will focus on delivering a superior user experience. We'll improve job visibility, error reporting, and progress updates. Look for a fully managed orchestrator with multi-tenancy support and a web UI, easier network bootstrapping, and availability in major cloud marketplaces. We're also committed to unwavering reliability, including better job orchestration, failed job detection, compute node failover, and a seamless "fire and forget" experience.
Have something else in mind? Let us know!
If you’re interested in learning more about distributed computing and how it can benefit your work, there are several ways to connect with us. Visit our website, sign up for our bi-weekly office hours, join our Slack, or send us a message.
How to Get Involved
We're looking for help in various areas, and there are several ways to contribute. Please reach out to us at any of the following locations.
Commercial Support
While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. You can read more about the difference between open source Bacalhau and commercially supported Bacalhau in our FAQ. If you would like to use our pre-built binaries and receive commercial support, please contact us!