Project Bacalhau

Bacalhau Project Report – March 6, 2023

Multi-arch! Removing sharding, improving test reliability, and wasm cancellation.

Luke Marsden
Mar 7, 2023

More heavy lifting behind the scenes this week, with a ton of improvements to test reliability. Also some feature work landed!

Multi-arch support 🏹

WASM jobs can run anywhere, but Docker jobs can’t necessarily: a Docker image only runs on the CPU architectures it was built for.

If a user has only pushed an ARM Docker image, then previously Bacalhau would still try to run it on an x86 compute node, and crap out with an obscure error. ARM-only images are increasingly common now that all the cool kids have M1/M2/M3 Macs. (Personally I went the other way and got a ThinkPad with a SIM slot, so now you can point and laugh at how 2023 is still not the year of Linux on the desktop, but I digress.)

Now that we have multi-arch support, each compute node also broadcasts which architectures it supports, and Docker jobs will only be run on nodes that can actually run them!
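To make the matching idea concrete, here is a minimal sketch in Go of filtering nodes by advertised architecture. It is not Bacalhau's actual code: the `ComputeNode` and `Job` types, their fields, and the `eligibleNodes` helper are all hypothetical names invented for illustration.

```go
package main

import "fmt"

// ComputeNode is a hypothetical stand-in for what a compute node advertises.
type ComputeNode struct {
	ID            string
	Architectures []string // e.g. "amd64", "arm64"
}

// Job is a hypothetical Docker job. ImageArchitectures lists the
// architectures the image was pushed for.
type Job struct {
	Image              string
	ImageArchitectures []string
}

// canRun reports whether the node supports at least one architecture
// that the job's image is available for.
func canRun(node ComputeNode, job Job) bool {
	for _, have := range node.Architectures {
		for _, want := range job.ImageArchitectures {
			if have == want {
				return true
			}
		}
	}
	return false
}

// eligibleNodes returns the IDs of the nodes that can run the job,
// preserving the input order.
func eligibleNodes(nodes []ComputeNode, job Job) []string {
	var ids []string
	for _, n := range nodes {
		if canRun(n, job) {
			ids = append(ids, n.ID)
		}
	}
	return ids
}

func main() {
	nodes := []ComputeNode{
		{ID: "node-x86", Architectures: []string{"amd64"}},
		{ID: "node-arm", Architectures: []string{"arm64"}},
	}
	// An ARM-only image, as in the scenario described above.
	job := Job{Image: "example/arm-only:latest", ImageArchitectures: []string{"arm64"}}
	fmt.Println(eligibleNodes(nodes, job)) // prints [node-arm]
}
```

With this kind of filter in place, the ARM-only image is never offered to the x86 node, so the obscure runtime error is replaced by a scheduling decision.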

⭐️ Star us on GitHub

Removing sharding 💥

This PR removed more code than it added

Bacalhau previously had a feature where you could give a job a CID with lots of files in it, and specify a “glob pattern” which would distribute the files from that CID across multiple executions, as a way to distribute the work across multiple nodes.
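For readers who never met the feature, the splitting described above can be sketched roughly as follows. This is an illustration under assumptions, not the code that was removed: the `shardFiles` function and its `batchSize` parameter are hypothetical, and the glob matching uses Go's standard `path.Match`.

```go
package main

import (
	"fmt"
	"path"
)

// shardFiles keeps the files that match the glob pattern and groups them
// into shards of at most batchSize files each. Each shard would then be
// handed to a separate execution.
func shardFiles(files []string, pattern string, batchSize int) ([][]string, error) {
	if batchSize < 1 {
		return nil, fmt.Errorf("batchSize must be at least 1, got %d", batchSize)
	}
	var shards [][]string
	var current []string
	for _, f := range files {
		matched, err := path.Match(pattern, f)
		if err != nil {
			return nil, err // malformed pattern
		}
		if !matched {
			continue
		}
		current = append(current, f)
		if len(current) == batchSize {
			shards = append(shards, current)
			current = nil
		}
	}
	if len(current) > 0 {
		shards = append(shards, current) // final, possibly smaller shard
	}
	return shards, nil
}

func main() {
	files := []string{"a.csv", "b.csv", "notes.txt", "c.csv"}
	shards, err := shardFiles(files, "*.csv", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(shards) // prints [[a.csv b.csv] [c.csv]]
}
```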

It was a neat feature, but no one ever used it. The vast majority of the jobs that run on our network are just single job executions. What’s more, having support for sharding jobs throughout the codebase made the code quite a lot more complicated. We had multiple layers: each job generated many shards, and each shard generated many executions (based on concurrency). As we prepare for a lean, mean & reliable 1.0 release, we decided to strip out this complexity and eventually move the sharding feature up to a higher level once we start seeing user demand for it. This is nice because it keeps the low-level primitive relatively simple, like “pods” in Kubernetes, and lets us build more complex systems on top, in the scheduler code.

This has also allowed us to close a bunch of TODO items related to sharding, and it’s made the test suite more reliable too. As a former colleague of mine used to say, code is a liability. The less of it we have — while still delivering tremendous value to users — the better 😅

WASM cancellation 🚫

[GIF: “You get cancelled”]
Yes, again (meme still applies)

Last week we added support for cancelling Docker jobs. By good fortune, this week we upgraded our wazero dependency, which added support for cancelling the context for WASM jobs (which run in-process). That means we could extend our support for job cancellation to WASM jobs as well. Nice!

What’s next? ⏩

  • Docs for Stable Airflow Operator

  • Streaming logs

  • Further Station integration

  • Insulated Jobs demo

  • Something very cool for the FVM launch 🤫

Questions/comments? Let us know!

  • Our Website

  • Our Google Group

  • Our Slack

  • Our Repo

  • Our Docs

  • Our Build Instructions

  • Our Place To Complain about Missing Features/File an Issue (and in Our Slack)

Thanks for reading!

Your Humble Bacalhau Team
