Building a distributed world of WebAssembly
Bacalhau’s mission is to provide the next generation of distributed compute. WebAssembly (WASM) is the secret behind our new platform. We’ve leveraged the power of IPFS to run WASM stored anywhere.
This post is based on a talk I gave at WebAssembly I/O 2023 in Barcelona.
Companies that generate large data volumes spread over a distributed architecture are realising that centralizing their data doesn’t work – it’s expensive, slow and introduces major orchestration challenges.
“At scale” also means scales of distance and time. Edge-deployed devices may have long network latencies back to a central store (scale of distance) – Mars, for example, is at least three light-minutes from Earth. Edge devices may also be out of contact regularly, or mostly have access only to a self-contained local network shard (scale of time). Traditional centralized architectures break down at these scales.
Instead, Bacalhau is built on the understanding that distributed problems need a distributed solution. The next generation of distributed compute needs to run wherever the data is. That’s why we’re building Bacalhau to run on datacentre servers with quad-GPUs, on consumer laptops, on edge Raspberry Pis in remote locations, on satellites, and maybe one day on Mars.
Supporting such a breadth of devices used to be challenging. Thankfully, using WebAssembly (WASM) answers this challenge – it is simple enough to be implemented with little overhead across a wide variety of CPUs whilst maintaining security and job isolation. Today, if you can compile your job to WebAssembly, you can run it on any node on the Bacalhau network.
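As an illustration, almost any pure-Rust program compiles unchanged for a WASM target (the target name and build command below are illustrative; your toolchain may differ):

```rust
// A trivially portable job: plain Rust with no platform-specific
// dependencies. Built natively it runs on a laptop; built for a WASM
// target (e.g. `cargo build --target wasm32-wasi --release`, an
// illustrative invocation) the same source runs on any capable node.
fn main() {
    let total: u64 = (1..=100).sum();
    println!("sum of 1..=100 = {total}");
}
```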
Self-contained WASM is convenient but inefficient
Most language toolchains today will compile a program into a single large WASM blob containing all the code necessary to run the job. This is convenient – a user can submit such a blob to the Bacalhau network and any node can run it successfully.
The downside is that such self-contained blobs are big and inefficient. Most jobs use shared components. When self-contained blobs are transferred, much of what is sent is common code, especially when similar workloads are submitted regularly. If a job is being run on many hundreds of nodes simultaneously, the load to handle transferring the job to all of them can be overwhelming.
Sending the same code again and again adds unnecessary latency and bandwidth usage. In constrained environments like a Raspberry Pi, a large blob might occupy most of the available RAM or disk space, meaning only one job can be held and run at a time. Making best use of bandwidth and memory is important in any setting, but in constrained environments it is crucial.
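A back-of-the-envelope sketch (with hypothetical sizes) shows how quickly this adds up:

```rust
// Hypothetical numbers: a 40 MiB self-contained blob sent to 500 nodes,
// versus a 1 MiB job stub whose shared dependencies are already cached
// on each node. The total bytes on the wire scale linearly with both.
fn total_transfer_mib(blob_mib: u64, nodes: u64) -> u64 {
    blob_mib * nodes
}

fn main() {
    println!("self-contained: {} MiB", total_transfer_mib(40, 500)); // 20000 MiB
    println!("modular stub:   {} MiB", total_transfer_mib(1, 500));  // 500 MiB
}
```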
Modules make WASM more efficient
In higher-level languages, this problem is solved by shipping a program’s code along with a list of its common dependencies. The program can run immediately if the user already has all the dependencies, or the user can download any that are missing.
WebAssembly has a similar concept called “modules”. Each WASM program is a module, and every module specifies which functions it provides and which it requires. Module imports and exports in WebAssembly are actually more advanced than in most languages, as memory blocks and data tables can be imported or exported as well as functions.
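On the Rust side, declaring an export is a one-line affair (this snippet is a generic WASM illustration, not Bacalhau-specific):

```rust
// When compiled for a WASM target, a `#[no_mangle] pub extern "C"`
// function becomes a named export of the resulting module, callable
// by the host runtime or by other modules that import it.
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}
```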
Using modules, the user can specify a much smaller WASM blob which lists the required functions. The remote Bacalhau nodes can link in the required WASM modules if they have them or download them if they are missing. Much less data is transferred and multiple small jobs can be held in memory at once.
However, there is a missing piece of the puzzle: where do nodes download their modules from? Unlike in other languages, where there are de facto centralised registries, there is no such thing (yet) for WASM modules.
We share WASM modules using distributed tech
So, is the Bacalhau team sponsoring the formation of a central WASM module repository? In short, no. Distributed problems need distributed solutions.
Central repositories are very convenient, but they don’t take advantage of data locality and so introduce the same bandwidth issues discussed earlier. They also place the reliability of module delivery outside of the user’s control, and leave open the possibility that modules others rely on will be yanked.
Instead, Bacalhau nodes use distributed data technology IPFS to retrieve WASM modules using a peer-to-peer protocol. This allows modules to be shared from the nearest nodes, whether they exist in the same data center or a few hops away on a mesh network. As this is independent of any central registry, it also works well for private deployments and ensures that commonly-used modules can be retained for as long as needed.
All the information needed to identify and retrieve modules can be embedded in the module itself. If a CID or URL is given as the import namespace, Bacalhau will automatically retrieve the module and resolve any new dependencies recursively. For the user, efficient modules are as easy to use as just submitting a well-formatted WASM blob.
(module
  (import
    "QmSyVCMvGauE35qfZVfuHEmDtbYvSom2v5r5c85SdF1LoB"
    "do_stuff"
    (func $do_stuff (type 3)))
  ...)
You can generate imports like this by using language-specific flags. In Rust, the wasm_import_module parameter on the #[link] attribute generates the above code:
#[link(wasm_import_module="QmSyVCMvGauE35qfZVfuHEmDtbYvSom2v5r5c85SdF1LoB")]
extern "C" {
    pub fn do_stuff() -> i32;
}
Intelligent scheduling of WASM jobs on Bacalhau
Understanding what WASM modules a job requires also allows Bacalhau to use data locality features to intelligently orchestrate the job. For example, Bacalhau can send WASM jobs to nodes that already have required modules, avoiding the need to download dependencies and speeding up job execution.
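As an illustrative sketch (not Bacalhau’s actual scheduler), ranking candidate nodes by how many of a job’s required module CIDs they already cache might look like this:

```rust
use std::collections::HashSet;

// Hypothetical scoring sketch: count how many of a job's required
// module CIDs a node already holds in its local cache.
fn score(cached: &HashSet<&str>, required: &[&str]) -> usize {
    required.iter().filter(|cid| cached.contains(*cid)).count()
}

// Pick the node with the most required modules already present,
// so the fewest dependencies need to be downloaded before execution.
fn best_node<'a>(nodes: &[(&'a str, HashSet<&'a str>)], required: &[&str]) -> Option<&'a str> {
    nodes
        .iter()
        .max_by_key(|(_, cached)| score(cached, required))
        .map(|(name, _)| *name)
}

fn main() {
    // Example CIDs are placeholders, not real modules.
    let nodes = vec![
        ("node-a", HashSet::from(["QmAAA", "QmBBB"])),
        ("node-b", HashSet::from(["QmBBB", "QmCCC", "QmDDD"])),
    ];
    let required = ["QmBBB", "QmCCC"];
    println!("{:?}", best_node(&nodes, &required)); // Some("node-b")
}
```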
This also allows nodes to make use of proprietary modules. Service providers who make functionality available via a WASM module can advertise this module from their node, and Bacalhau will only schedule jobs that need the module to these provider-owned nodes.
Providers could, for example, make a machine-learning model available via WebAssembly and allow users to perform arbitrary interaction with the model in a way that is richer and quicker than a web API. Their model is still secure and private, as the user’s execution is contained within the WebAssembly runtime. What’s more, load balancing and fault tolerance are provided automatically by the Bacalhau network.
A bright future for WebAssembly on Bacalhau
WebAssembly’s ability to be both secure and portable has already unlocked a lot of new compute environments for Bacalhau. We think peer-to-peer module provision and intelligent scheduling are two powerful features that make running WebAssembly simpler, quicker and easier than ever before. We’re very excited to see what our partners are able to achieve with WebAssembly!