Bacalhau Project Report – March 23, 2023
Log streaming for Docker (WASM next), Amplify deployed, retries, private cluster improvements, planning & dashboard.
We made some great progress this week on feature work and robustness improvements ahead of our upcoming 1.0 release in May.
Log streaming for Docker 📝
As you can see in this very nice demo, you can now stream live logs from a Bacalhau job! For example, you can run:
$ bacalhau docker run -f myimage
# or
$ bacalhau logs <jobid>
$ bacalhau logs -f <jobid>
So, you can stream logs directly while running `bacalhau docker run`, you can fetch logs so far from a job, or you can connect to the live log stream of a “detached” job.
This is cool because it means that you can run long-running jobs on Bacalhau, for example training an AI model, and see the progress that is streamed from that job while it’s running.
It works by extending the bprotocol, which is our custom libp2p protocol that runs between requester nodes and compute nodes, to support giving the requester node a custom multiaddress to connect to in order to fetch a log stream. The requester then muxes between libp2p and the clients’ websocket connections in order to get the log streams all the way back to the CLI. Nice!
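To make the fan-out idea concrete, here is a minimal Python sketch of relaying one upstream log stream to several clients. It is not Bacalhau's implementation: it uses neither libp2p nor websockets, and `LogRelay`, `subscribe`, `pump`, and `drain` are hypothetical names chosen for illustration. Each in-memory queue stands in for one client connection.

```python
import queue
from typing import Iterable, List


class LogRelay:
    """Hypothetical relay: copies one upstream log stream to many clients."""

    def __init__(self) -> None:
        self._clients: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        # Each client (think: one websocket connection) gets its own queue.
        q: queue.Queue = queue.Queue()
        self._clients.append(q)
        return q

    def pump(self, upstream: Iterable[str]) -> None:
        # Copy every upstream line to every subscribed client,
        # then signal end-of-stream with None.
        for line in upstream:
            for q in self._clients:
                q.put(line)
        for q in self._clients:
            q.put(None)


def drain(q: queue.Queue) -> List[str]:
    # Read lines until the end-of-stream marker arrives.
    lines = []
    while (line := q.get()) is not None:
        lines.append(line)
    return lines
```

In this sketch the upstream iterable plays the role of the stream coming from the compute node, and a real relay would pump it concurrently rather than all at once.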
Amplify deployed 🔊
The Amplify API is now deployed at http://amplify.bacalhau.org/api/v0!
As a reminder, Amplify is a service that augments data stored under IPFS and Filecoin CIDs (content identifiers) with automatic analysis. Based on the content type (image, video, audio, tabular data, text, etc.), it runs conditional follow-on jobs to automatically transcode and interpret that data in interesting ways. This will be a value-add service for anyone using the IPFS and Filecoin networks: the idea is that any new data uploaded will automatically be analysed (subject to being able to scale this efficiently!). It will also be a useful service for folks to run in on-prem Bacalhau clusters, to automatically categorize and augment data flowing through private instances of the system.
Amplify is implemented as a lightweight DAG system with a queue: each step in the DAG is a Bacalhau job, data flows between jobs automatically, and the lineage of every job is tracked, as shown in the first section of this video demo.
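As a rough illustration of a queue-driven DAG with conditional steps and lineage tracking, here is a small Python sketch. It is not Amplify's code: `Step`, `execute`, the step names, and the `mime` field are all assumptions made for the example, and each `run` function merely stands in for a Bacalhau job.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]                      # stands in for one job
    should_run: Optional[Callable[[dict], bool]] = None  # content-type guard
    next_steps: List[str] = field(default_factory=list)


def execute(steps: Dict[str, Step], root: str, data: dict) -> List[List[str]]:
    """Run the DAG from `root`; return the lineage path of every step that ran."""
    work = deque([(root, data, [])])
    lineage: List[List[str]] = []
    while work:
        name, payload, parents = work.popleft()
        step = steps[name]
        if step.should_run is not None and not step.should_run(payload):
            continue  # condition not met, so skip this branch
        output = step.run(payload)
        path = parents + [name]
        lineage.append(path)
        # The output of this step becomes the input of its children.
        for child in step.next_steps:
            work.append((child, output, path))
    return lineage


# Hypothetical three-step graph: detect the type, then branch on it.
steps = {
    "detect": Step("detect", run=lambda d: {**d, "type": d["mime"].split("/")[0]},
                   next_steps=["transcode", "summarise"]),
    "transcode": Step("transcode", run=lambda d: {**d, "transcoded": True},
                      should_run=lambda d: d["type"] == "video"),
    "summarise": Step("summarise", run=lambda d: {**d, "summary": "..."},
                      should_run=lambda d: d["type"] == "text"),
}
```

Calling `execute(steps, "detect", {"cid": "QmExample", "mime": "video/mp4"})` runs `detect` and `transcode` and skips `summarise`, and the returned paths record which step fed which.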
Retries 🔁
We now retry failed executions on other nodes in the network. This covers failures to execute the job, verification failures, failures to publish the results, and having the ask for bid rejected by the initially selected nodes.
This is a huge improvement for network performance from users’ perspectives. Previously if a job couldn’t be scheduled, it would fail, but now the requester node will try harder to get a job to execute successfully.
The retry strategy is pluggable, and multiple implementations can be chained. Currently we only have a single strategy, which always retries as long as there are available nodes in the network to retry on. More strategies can be implemented in the future, especially once we have a better breakdown of execution errors and are able to differentiate between bad jobs and retriable node failures.
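A pluggable, chainable retry strategy could be sketched along the following lines in Python. This is an illustration only, not Bacalhau's actual interface: `RetryRequest`, `retry_while_nodes_remain`, `max_attempts`, and `chain` are hypothetical names, `max_attempts` is an invented second strategy, and the "every strategy must agree" chaining rule is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetryRequest:
    job_id: str
    failed_nodes: List[str]     # nodes that already failed or rejected the job
    available_nodes: List[str]  # nodes currently known to the requester


# A strategy is any callable that answers "should we retry?".
RetryStrategy = Callable[[RetryRequest], bool]


def retry_while_nodes_remain(req: RetryRequest) -> bool:
    # Mirrors the single strategy described above: retry as long as
    # there is at least one node that has not been tried yet.
    return bool(set(req.available_nodes) - set(req.failed_nodes))


def max_attempts(limit: int) -> RetryStrategy:
    # Hypothetical extra strategy: stop after `limit` failed attempts.
    def strategy(req: RetryRequest) -> bool:
        return len(req.failed_nodes) < limit
    return strategy


def chain(*strategies: RetryStrategy) -> RetryStrategy:
    # Assumed semantics: retry only if every strategy in the chain agrees.
    def chained(req: RetryRequest) -> bool:
        return all(s(req) for s in strategies)
    return chained
```

With these definitions, `chain(retry_while_nodes_remain, max_attempts(2))` would retry a job that has failed on one of two nodes, but give up once two attempts have failed even if a third node is available.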
Private cluster improvements 🤫
We’ve made a few more improvements to make it super easy to set up a Bacalhau cluster inside your own network. We’ve simplified the setup for a private cluster from:
bacalhau serve --private-internal-ipfs --node-type compute,requester --peer none
To, wait for it:
bacalhau serve
We’ve also made sure that the “private internal IPFS” mode, which makes Bacalhau run an IPFS daemon in-process, without you needing to run one separately, now persists its identity and state, so that when you restart bacalhau or reboot your servers, you don’t lose results CIDs.
What’s more, `bacalhau serve` also helpfully prints out exactly the command you need to run on other nodes in your network to have them use the first node as a “meeting point” to bootstrap both the bprotocol and IPFS mesh networks over libp2p. Good UX FTW! 🚀
Planning is good and useful 🤔
We’ve done a ton of planning and getting organized on GitHub. Just look at our Starmaps! As we continue to develop in the open, you can see all the items we have planned in advance of the May 9 launch, and our progress towards them here:
We are also doing a similar planning exercise for an exciting new Web3-focused stream of work. Watch this space!
Dashboard 📈
We now have a 🔥 new dashboard showing live stats on how many jobs are run on the Bacalhau network and also a breakdown of what kind of jobs they are.
What’s next ⏩
Log streaming for WASM jobs as well
Insulated Jobs demo
Real jobs in Amplify
Progress on Octostore
Questions/comments? Let us know!
Thanks for reading!
Your Humble Bacalhau Team