Bacalhau has become a trusted tool for many organizations to manage batch jobs across a wide-reaching network. Its simplicity, scalability, and resilient network design have driven our growth and earned that trust.
We are always on the lookout for ways to improve, listening to user feedback and adapting swiftly to emerging needs. One thing we have increasingly heard from users is the need for new, more flexible job types. If this sounds like something you've needed (and there were a lot of you), read on!
Understanding the Need
While our batch job model has served many users well, we've identified scenarios where it falls short, especially around cold starts and bootstrapping time. For example, a job with a slow start-up but tight response-time requirements was often a poor fit for batch.
Consider a large ML model running in a container. Previously, a node would first pull the container image, then load the (often multi-GB) model into memory, and only then respond. For in-memory tasks that need to watch a file handle for changes, that is far too slow.
With this feedback from our community in hand, we got to work!
Bacalhau 1.1: What's New
We're thrilled to introduce enhanced offerings in Bacalhau 1.1.
Let's dive into the four job types it now supports:
1. Daemon Jobs:
These 'set it and forget it' jobs continuously run on every node that meets your set criteria. Should any new nodes join and fit the bill, the Daemon job will automatically start there too.
Practical Uses: Consider them your vigilant guards. Whether it's tracking data on edge devices like cameras and sensors or aggregating logs, Daemon jobs have it covered. They can efficiently bundle data, stream updates to platforms like Kafka, or relay device results over MQTT.
2. Service Jobs:
Stable and consistent, Service jobs run continuously on nodes that align with your defined criteria. Bacalhau's smart orchestration selects the most suitable nodes for peak performance. Plus, we always have an eye on them to ensure they're functioning as intended and can reallocate them across nodes when necessary.
Practical Uses: Think of them as your round-the-clock workers, ideal for tasks like tapping into streaming feeds, pulling data from queues, or monitoring live events.
3. Batch Jobs:
The good old batch jobs remain a staple. Triggered upon request, they run on chosen Bacalhau nodes, wrapping up after completing their designated task or upon hitting a timeout.
Practical Uses: Consider them your data deep-divers. For in-depth analysis or heavy computations on large datasets, batch jobs excel. They cut out continuous processing, focusing on targeted, intensive tasks.
4. Ops Jobs:
Think of Ops jobs as supercharged batch jobs. They run on every node that matches your job criteria, offering a wider scope.
Practical Uses: Need instant access to logs across hosts? Or looking to deploy an update across multiple machines at once? Ops jobs come in handy for quick checks, instantaneous log retrievals, and bulk configuration updates. No SSH needed!
Using the New Jobs
With Bacalhau 1.1, we've rolled out fresh APIs and CLI tools to streamline your experience with these job types. Dive into our guide for detailed instructions. Here are some illustrative examples:
1. Deploy Daemon Job
Let's say you want to deploy a logging agent across all cameras in the us-east-1 region, processing the logs found at /var/log/myapp.log:
# Job Spec: log-processor.yaml
Name: Log Processor
Type: daemon
Constraints:
  - Key: type
    Operator: '='
    Values:
      - camera
  - Key: region
    Operator: '='
    Values:
      - us-east-1
Tasks:
  - Name: Main Task
    Engine:
      Type: docker
      Params:
        Image: my-awesome-log-agent:latest
        Parameters:
          - -log-file
          - /input_log
    InputSources:
      - Source:
          Type: localDirectory
          Params:
            SourcePath: /var/log/myapp.log
        Target: /input_log
# Run the Daemon Job
bacalhau job run log-processor.yaml
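Once submitted, every camera node in us-east-1 should receive its own execution of this job, including nodes that join later. To verify placement, you can inspect the job and its per-node executions; the commands below assume the new job command group that ships alongside these APIs (check bacalhau job --help on your build), and <job-id> is a placeholder for the ID returned by the run command:
# Inspect the job and list its per-node executions
bacalhau job describe <job-id>
bacalhau job executions <job-id>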
2. Deploy Service Job
Suppose you need to deploy a consumer for Kinesis Data Streams. This job would be assigned to any three nodes in us-west-2 that have an ARM processor and at least 4GB of available memory:
# Job Spec: kinesis-consumer.yaml
Name: Kinesis Consumer
Type: service
Count: 3
Constraints:
  - Key: Architecture
    Operator: '='
    Values:
      - arm64
  - Key: region
    Operator: '='
    Values:
      - us-west-2
Tasks:
  - Name: Main Task
    Engine:
      Type: docker
      Params:
        Image: my-kinesis-consumer:latest
        Parameters:
          - -stream-arn
          - arn:aws:kinesis:us-west-2:123456789012:stream/my-kinesis-stream
          - -shard-iterator
          - TRIM_HORIZON
    Resources:
      Memory: 4gb
# Run the Service Job
bacalhau job run kinesis-consumer.yaml
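Bacalhau will keep three replicas of this consumer running, rescheduling them if a node drops out. Since service jobs run until told otherwise, here is a minimal sketch of winding one down, again assuming the 1.1 job command group and a placeholder job ID:
# List running jobs, then stop the consumer when it's no longer needed
bacalhau job list
bacalhau job stop <job-id>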
3. Execute Batch Job
Suppose you want to execute an image-processing batch job on space images stored in S3. This would run on any node tagged 'goku', equipped with GPU support, and located in either Dublin (eu-west-1) or London (eu-west-2):
# Job Spec: image-processor.yaml
Name: My Image Processor
Type: batch
Count: 1
Constraints:
  - Key: tag
    Operator: '='
    Values:
      - goku
  - Key: region
    Operator: 'in'
    Values:
      - eu-west-1
      - eu-west-2
Tasks:
  - Name: Main Task
    Engine:
      Type: docker
      Params:
        Image: my-image-processor:latest
    Resources:
      GPU: 1
    InputSources:
      - Source:
          Type: s3
          Params:
            Bucket: sample-datasets
            Key: images/space/*
            Region: eu-west-1
        Target: /my/images
# Run the Batch Job
bacalhau job run image-processor.yaml
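As written, this spec mounts inputs but doesn't publish results anywhere. If you want outputs back, one option is to add a publisher and result path to Main Task in the spec above, then fetch the published results with bacalhau get <job-id> once the job completes. The fragment below is a hypothetical sketch; the outputs name and /outputs path are assumptions for illustration, not part of the example above:
    # Hypothetical additions to Main Task: publish whatever the container
    # writes to /outputs so it can be retrieved after the job completes
    Publisher:
      Type: ipfs
    ResultPaths:
      - Name: outputs
        Path: /outputs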
4. Execute Ops Job
Imagine you want to monitor failed login attempts in real time on all your web servers in Dublin, and you need this information before the logs are shipped elsewhere:
# Job Spec: failed-login-counter.yaml
Name: Failed Logins
Type: ops
Constraints:
  - Key: type
    Operator: '='
    Values:
      - web-server
  - Key: region
    Operator: 'in'
    Values:
      - eu-west-1
Tasks:
  - Name: Main Task
    Engine:
      Type: docker
      Params:
        Image: failed-login-counter:latest
    InputSources:
      - Source:
          Type: localDirectory
          Params:
            SourcePath: /var/log/auth.log
        Target: /auth.log
# Run the Ops Job
bacalhau job run failed-login-counter.yaml
You can also run the same ops job from the CLI using:
bacalhau docker run \
  --input file:///var/log/auth.log:/auth.log \
  --selector "type=web-server,region=eu-west-1" \
  --target all \
  failed-login-counter:latest
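The --target all flag is what gives this run its ops semantics; dropping it schedules the job on a single matching node instead, like a regular batch job. To watch the counter's output as it runs, the logs command should do the trick (the job ID is a placeholder; check bacalhau logs --help on your version for the exact flags):
# Stream output from the running job
bacalhau logs --follow <job-id>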
Looking Ahead
As Bacalhau grows, the challenges and feedback from our users constantly inspire us. These new job types showcase our commitment to adapt and evolve.
While this post highlights our new job offerings, there's more under the hood. We've implemented major architectural shifts in Bacalhau to enable these enhancements. Stay tuned for a deep dive into those changes in our upcoming blog posts.
Be Part of the Evolution
We invite our community to experience the enhanced flexibility of Bacalhau 1.1. Dive into our updated documentation, explore the new job types, and let us know your feedback. Collectively, we're redefining the boundaries of distributed compute frameworks.
Your journey with Bacalhau is just beginning, and the horizon has never looked brighter. The Bacalhau project is available today as Open Source Software. The public GitHub repo can be found at https://github.com/bacalhau-project/bacalhau. If you like the project, please give us a star ⭐ 🙂
We're looking for help in several areas. If you're interested in contributing, there are plenty of ways to get involved, and you can always reach out to us via Slack or email.