Project Bacalhau

Bacalhau Project Report - Feb 10, 2023

blog.bacalhau.org

On-prem streaming demo, the State of Open conference, and some IPFS challenges.

Luke Marsden
Feb 21, 2023

A lot of the team went to the amazing State of Open conference in London this week, and came away with loads of exciting ideas for developing Bacalhau for end users. We also landed a shiny new streaming demo which we’re pretty excited to share…

Bacalhau Streaming Demo 🏠🚀

We’ve been working for a few weeks on a demo of a new streaming layer on top of the Bacalhau job scheduler. Here it is in action, being used to develop an example distributed data app: Wi-Fi-based intrusion detection!

What did we just see?

For starters, we have a PoC of the new Bacalhau Streaming layer with support for ingesting from local data sources on the nodes, shown in the orange box above. Then there’s the demo itself: the code that reads images from the webcam and log lines from the wireless access point, and streams them to the inference server, which runs inference on the images whenever a new connection to the wireless access point is detected. What’s the point of this?
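To make the trigger concrete, here is a minimal Go sketch of the access point side: it scans log lines for new-connection events, which is the moment the demo sends a webcam image off for inference. The post doesn't show the access point's log format, so this assumes hostapd-style `AP-STA-CONNECTED <mac>` lines; the function name `newConnections` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// connectedRe matches hostapd-style "AP-STA-CONNECTED <mac>" events.
// ASSUMPTION: the access point logs in this format; the demo's real
// log format is not shown in the post.
var connectedRe = regexp.MustCompile(`AP-STA-CONNECTED ((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})`)

// newConnections scans log text and returns the (lowercased) MAC address
// of every client that connected, in order of appearance.
func newConnections(logText string) []string {
	var macs []string
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		if m := connectedRe.FindStringSubmatch(sc.Text()); m != nil {
			macs = append(macs, strings.ToLower(m[1]))
		}
	}
	return macs
}

func main() {
	logText := `wlan0: STA 11:22:33:44:55:66 IEEE 802.11: associated
wlan0: AP-STA-CONNECTED 11:22:33:44:55:66
wlan0: AP-STA-DISCONNECTED 11:22:33:44:55:66
wlan0: AP-STA-CONNECTED AA:BB:CC:DD:EE:FF`
	for _, mac := range newConnections(logText) {
		// In the demo, this is where the latest webcam image would be
		// streamed to the inference server.
		fmt.Println("new connection:", mac)
	}
}
```

A real tailer would feed lines in as the logfile grows rather than from a fixed string; the matching logic stays the same.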

Architecture before

The architecture of such a system before Bacalhau might be to stream every one of 100 security cameras in your building up to the cloud, but that’s expensive…

Architecture after

The architecture afterwards moves the code to where the data is: processing happens locally, next to the cameras themselves, on GPU nodes in each building, which saves a lot in ingress costs!

Specific details

Here are the details of how the three jobs are deployed to the three labelled Bacalhau nodes: two of them host the webcam and access point data sources, and the third is the powerful GPU node.
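Pinning each job to the right node comes down to matching what the job requires against the labels each node carries. The sketch below illustrates label-based matching in general, not Bacalhau's actual scheduler code; the `Node` type, the label keys (`source`, `gpu`) and the job names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a HYPOTHETICAL, simplified view of a compute node and its labels.
type Node struct {
	ID     string
	Labels map[string]string
}

// matches reports whether the node carries every key=value pair in selector.
func matches(n Node, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := n.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// eligibleNodes returns the sorted IDs of all nodes that satisfy selector.
func eligibleNodes(nodes []Node, selector map[string]string) []string {
	var ids []string
	for _, n := range nodes {
		if matches(n, selector) {
			ids = append(ids, n.ID)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	nodes := []Node{
		{ID: "node-webcam", Labels: map[string]string{"source": "webcam"}},
		{ID: "node-ap", Labels: map[string]string{"source": "access-point"}},
		{ID: "node-gpu", Labels: map[string]string{"gpu": "true"}},
	}
	// Each job declares the labels it needs; names and labels are illustrative.
	jobs := []struct {
		name     string
		selector map[string]string
	}{
		{"webcam-reader", map[string]string{"source": "webcam"}},
		{"ap-log-tailer", map[string]string{"source": "access-point"}},
		{"inference", map[string]string{"gpu": "true"}},
	}
	for _, j := range jobs {
		fmt.Println(j.name, "->", eligibleNodes(nodes, j.selector))
	}
}
```

Requiring every selector pair to be present (rather than treating a missing label as an empty value) keeps a job from landing on a node that simply never declared the label.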

How does it work?

The Bacalhau job specs are shown below:

There’s a new LocalDirectory storage source, which lets you mount a local directory directly into a job. Long-running jobs on each of the source nodes then read data as it changes in those local directories (reading new JPEGs from the webcam, and tailing the access point’s logfile). Every time there’s a new event, they capture the data as a CID and call `/submit` on the new Bacalhau streaming HTTP endpoint. The inference server job subscribes to those events through another new storage source, CIDStream: every time the first two jobs send an event, CIDStream downloads the referenced CID and writes it into a new directory.

The user code itself is then simple!

All that the user code is doing is reading and writing files!

IPFS challenges 😬

We've been experiencing issues with a canary that submits a deterministic job to Bacalhau and times out when downloading a ~60MB CID from IPFS. This canary has never been stable, but starting Jan 23rd the problem got so much worse that we had to disable alarming on it. Observations:

  • The canary either succeeds and downloads the content within 20 seconds, or just hangs and eventually times out after our 5-minute timeout window

  • The DHT showed over 2,500 providers for the same CID in the network, which suggests previous canaries might have announced themselves as providers before going down

Actions:

  • Fixed a resource leak where the canary IPFS client stays alive even after the test is over

  • Implemented our own lightweight IPFS node with a low-power, transient repo configuration, but it didn't help

  • Used IPFS peering to keep sticky connections to the Bacalhau IPFS nodes that have the CID

  • Changed the canary to submit random jobs that generate a new CID on each run, to avoid routing the request to an unavailable provider. This was tested but not merged

  • Reduced the canary's retry attempts, so that failures don't make things worse and we don't end up with concurrent canaries running in parallel

Pending Actions:

  • Upgrade to Kubo 0.18.1. A dependency conflict was just fixed, so we're good to test it out

  • Deploy older Bacalhau and canary versions in staging to check whether we've introduced a bug recently

We’re hoping to track this down, as IPFS is a really key piece of technology in our stack!

What’s next? ⏩

  • Project Frog/Lilypad (FEVM integration with Bacalhau) demo

  • Progress on Filecoin Station integration

  • Start work on a PoC of the brand-new “Insulated jobs”: running jobs in a secure context with audited inputs/outputs

Questions/comments? Let us know!

  • Our Website

  • Our Google Group

  • Our Slack

  • Our Repo

  • Our Docs

  • Our Build Instructions

  • Our Place To Complain about Missing Features/File an Issue (and in Our Slack)

Thanks for reading!

Your Humble Bacalhau Team