Supercharging Kubernetes Deployments with Bacalhau
How to enhance Kubernetes deployments with Bacalhau for improved connectivity, data consistency, and real-time communication. Discover step-by-step integration and advanced networking capabilities.
Kubernetes provides a powerful open-source platform for infrastructure orchestration and deploying containerized applications. Its ability to be deployed anywhere there are compute resources makes it an exceptionally scalable option for deploying and managing microservices where you need them.
With that in mind, there are still challenges that Kubernetes deployments don’t easily overcome:
Ensuring network connectivity across distributed services
Enabling real-time communication between components
Maintaining data consistency across nodes
And more…
As little as 10 seconds of intermittent connectivity is enough to cause issues across a Kubernetes deployment. While the community has tackled these issues with extensions and third-party solutions that address individual problems, these still require a great deal of manual work to set up, test, and maintain over time to ensure stability.
But this is where Bacalhau can help! By integrating with Kubernetes, Bacalhau reduces the impact of intermittent connectivity on workloads and provides a single control plane for managing workloads across multi-zone, multi-region, multi-cloud, or on-premises environments. With Bacalhau, you can access, process, and complete computation tasks at various deployment sites, regardless of node distance, harsh environments, or spotty connectivity, thanks to its efficient networking capabilities - capabilities which can enable deployments anywhere from the middle of the ocean to the heights of space.
In this blog, we will show you how to integrate and enhance a Kubernetes deployment with Bacalhau and its advanced networking capabilities.
Step-by-Step Guide
Step 1: Set Up your Bacalhau Network
You can install Bacalhau on any local or cloud machine within your infrastructure. Setting up a private Bacalhau network allows you to run your workloads securely, eliminating the risks of relying on public nodes and of distributing data outside your organization. For best practices and tips on deploying your private network, refer to our detailed blog post.
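If you prefer to stand up a network by hand, the core is just two commands - a minimal sketch, assuming the Bacalhau v1.4 CLI used later in this post, with <orchestrator-ip> standing in for your own host:
bacalhau serve --node-type requester
bacalhau serve --node-type compute --orchestrators=nats://<orchestrator-ip>:4222
The first command runs on the orchestrator machine; the second runs on each compute machine and points it at the orchestrator's NATS endpoint on port 4222.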
To quickly start up a multi-region Bacalhau network, we’ll use Andaime, a helpful tool for quickly generating a Bacalhau network with provisioned and connected compute instances.
With one CLI command, you can spin up AWS EC2 instances across regions, generating requester and compute nodes for your Bacalhau network.
./andaime create --target-regions "us-east-1,us-west-2" --compute-nodes 12 --orchestrator-nodes=1
For more information on installing and deploying Andaime, you can read this blog post.
Step 2: Create a Kubernetes Cluster
As we have already chosen AWS for our nodes, we will be using eksctl to build our Kubernetes clusters.
eksctl create cluster --name bacal-cluster --version 1.30 --instance-types t2.medium
Launch your cluster and ensure your kubectl context is correctly set so you can run commands against the cluster.
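eksctl normally updates your kubeconfig for you. If you need to set the context manually, something like the following should work (this assumes the cluster was created in us-east-1; adjust the region to match your setup):
aws eks update-kubeconfig --region us-east-1 --name bacal-cluster
kubectl config current-context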
Run the following command to check your Kubernetes version:
kubectl version
Note: This walkthrough was made with Kubernetes v1.30.x. Native sidecar containers are feature-gated in Kubernetes from v1.28 and enabled by default from v1.29.
Step 3: Define and Deploy a Kubernetes Pod with a Bacalhau Sidecar
With your Kubernetes cluster and Bacalhau network in place, the next step is defining a Kubernetes pod that includes a Bacalhau sidecar container. This sidecar runs alongside your main application and can process the log data it produces.
Using the requester node IP from your Andaime run, you can now connect your Kubernetes deployment to the Bacalhau network (your IP address will differ):
Orchestrator nodes created with IPs: [54.91.101.203]
Create a configuration file bacalhau-deploy.yaml and copy in the YAML below. It defines a Deployment whose pods each run two containers: an Alpine instance that serves as your main application, and the Bacalhau sidecar container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bacalhau-test
  labels:
    app: testapp
spec:
  replicas: 5
  selector:
    matchLabels:
      app: testapp
  template:
    metadata:
      labels:
        app: testapp
    spec:
      containers:
        # Main application: appends a log line to the shared volume every second
        - name: testapp
          image: alpine:latest
          command: ['sh', '-c', 'while true; do echo "logging" >> /opt/logs.txt; sleep 1; done']
          volumeMounts:
            - name: data
              mountPath: /opt
      initContainers:
        # Native sidecar: an init container with restartPolicy: Always
        # runs for the whole lifetime of the pod (Kubernetes v1.29+)
        - name: bacalhau-side
          image: ghcr.io/bacalhau-project/bacalhau:v1.4.1-rc5
          restartPolicy: Always
          command: ['bacalhau']
          args: ["serve", "--node-type=compute", "--labels=environment=kubernetes", "--orchestrators=nats://54.91.101.203:4222"]
          volumeMounts:
            - name: data
              mountPath: /opt
      volumes:
        # Shared by both containers, so Bacalhau jobs can read the app's logs
        - name: data
          emptyDir: {}
Deploy the YAML file within your cluster to create your configured pod:
kubectl apply -f bacalhau-deploy.yaml
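Optionally, wait for the rollout to finish before checking the pods:
kubectl rollout status deployment/bacalhau-test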
After deploying the pod, verify it is up and running with no errors that could hinder your Bacalhau job deployment:
kubectl get pods
kubectl logs <pod-name> --all-containers
And you should see something like the following:
NAME READY STATUS RESTARTS AGE
bacalhau-test-54b7887ccf-9szkv 2/2 Running 0 16s
…
Step 4: Verify Bacalhau Connection
Check with the requester node to ensure that the Kubernetes compute nodes are recognized and integrated into the Bacalhau network.
Use these Bacalhau commands to connect to your network and verify the status of your compute nodes. The listing will show all nodes within your hybrid Bacalhau network, including both your EC2 instance nodes and your Kubernetes nodes.
bacalhau config set node.clientapi.host "54.91.101.203"
bacalhau node list
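Depending on your CLI version, you may also be able to inspect an individual node - including the environment=kubernetes label we set on the sidecar - with a command along these lines (this is an assumption; check bacalhau node --help for what your version supports):
bacalhau node describe <node-id>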
Step 5: Deploy Jobs to Kubernetes Cluster
If all has gone well, you should be able to confirm that your Kubernetes compute nodes have become part of the Bacalhau network, and you can deploy jobs (across all instances or to a specific instance) that run directly inside the Kubernetes cluster. This could be unified log analysis across your infrastructure, large-scale data transformation, or even complex machine learning workflows. Bacalhau also offers custom job types that let you create entire Python workflows and execute them on your nodes and edge devices.
Here’s a quick example that sums the prime numbers up to a limit derived from each instance’s PID, which can be run on all your Bacalhau nodes. Create a samplejob.py file:
import os

def is_prime(num):
    if num < 2:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

unique_limit = os.getpid() % 100 + 1  # Limit based on the process ID
sum_primes = sum(i for i in range(2, unique_limit + 1) if is_prime(i))
print(f"Instance with PID {os.getpid()}: Sum of primes up to {unique_limit} = {sum_primes}")
Use the Bacalhau CLI to submit the job to your network. You can also use your node labels to target specific compute nodes; see the sketch at the end of this step.
bacalhau exec python --code samplejob.py --publisher local
With this, you will get the following output detailing your job execution, along with helpful prompts for getting more information about your job.
Job successfully submitted. Job ID: j-d3a7e2cd-653d-4ce1-90fe-f2befaddb973
Checking job status... (Enter Ctrl+C to exit at any time, your job will continue running):
Communicating with the network ................ done ✅ 0.0s
Creating job for submission ................ done ✅ 0.5s
Job in progress ................ done ✅ 8.0s
To get more details about the run, execute:
bacalhau job describe j-d3a7e2cd-653d-4ce1-90fe-f2befaddb973
To get more details about the run executions, execute:
bacalhau job executions j-d3a7e2cd-653d-4ce1-90fe-f2befaddb973
To download the results, execute:
bacalhau job get j-d3a7e2cd-653d-4ce1-90fe-f2befaddb973
When you run the describe command given in the prompt, the output will resemble a job description like the following:
ID = j-d3a7e2cd-653d-4ce1-90fe-f2befaddb973
Name = Python
Namespace = default
Type = batch
State = Completed
Count = 1
Created Time = 2024-07-31 14:32:37
Modified Time = 2024-07-31 14:32:46
Version = 0
Summary
Completed = 1
Job History
TIME REV. STATE TOPIC EVENT
2024-07-31 14:32:37 1 Pending Submission Job submitted
…
…
Standard Output
Instance with PID 10940: Sum of primes up to 41 = 238
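As mentioned above, you can also constrain a job to specific nodes. Here is a minimal sketch of a declarative job spec that targets only the Kubernetes compute nodes via the environment=kubernetes label we set in Step 3 (this assumes Bacalhau's v1.x job YAML format; the job name, image, and command are illustrative):
# k8s-job.yaml: a batch job constrained to nodes labeled environment=kubernetes
Name: k8s-only-job
Type: batch
Count: 1
Constraints:
  - Key: environment
    Operator: "="
    Values: ["kubernetes"]
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: python:3.11-slim  # illustrative image
        Entrypoint: ["python", "-c", "print('running on a Kubernetes-hosted Bacalhau node')"]
Submit it with:
bacalhau job run k8s-job.yaml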
Conclusion
By following these steps, you can effectively integrate Bacalhau into your Kubernetes cluster and manage data processing tasks efficiently across your hybrid environments. With this setup, you can combine the power and scalability of Kubernetes with the reliability of Bacalhau, ensuring stable, scalable, and secure data processing across your distributed infrastructure.
Get Involved with Bacalhau!
We welcome your involvement in Bacalhau. There are many ways to contribute, and we’d love to hear from you. Please reach out to us at any of the following locations.
Commercial Support
While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. You can read more about the difference between open-source Bacalhau and commercially supported Bacalhau in our FAQ. If you would like to use our pre-built binaries and receive commercial support, please contact us!