Bacalhau v1.8.0 - Day 4: Seamless Result Storage with Managed Publishers with Expanso Cloud
Your workflow just got simpler. Announcing secure, managed, and automatic result storage in Expanso Cloud.
This is part of the 5 Days of Bacalhau 1.8 series! Make sure to go back to the start to catch all of the posts.
Day 1: Announcing Bacalhau v1.8.0: Intelligent Edge Computing Meets Enterprise Integration
Day 4: Seamless Result Storage with Managed Publishers with Expanso Cloud (this post)
Let’s be honest. When you’re focused on a complex data problem, the last thing you want to think about is infrastructure. Yet, for the longest time, a simple question has added friction to almost every distributed computing job: "Where do the results go?"
This question immediately triggers a cascade of others:
“How do I get credentials onto the compute nodes securely?”
“Am I accidentally exposing secrets?”
“How do I even find the output once the job is done?”
This dance of configuring storage, managing credentials, and tracking outputs is a tedious, error-prone distraction from the work that actually matters.
At Expanso, we believe you should focus on your code, not your cloud storage configuration. That’s why we’re thrilled to introduce a fundamental improvement to the Expanso Cloud experience: default managed publishers.
The Old Way: A Trail of Configuration and Credentials
Until now, when running jobs on Expanso Cloud, you could face common dilemmas involving:
Manual setup: Explicitly configuring S3 buckets or other storage solutions for every job or pipeline you run.
Credential juggling: The risky situation of distributing AWS keys or other storage credentials to a fleet of compute nodes and clients.
Operational overhead: Managing the lifecycle, cost, and access policies of yet another piece of infrastructure.
Result scavenger hunts: A lack of a standardized approach to storage often left users digging through buckets to find their outputs.
This meant that even simple jobs required additional configuration and infrastructure planning, creating friction for users who just wanted to run their workloads and get results.
The New Way: Expanso Cloud Managed Publishers
With our latest update, Expanso Cloud now provides managed publishers by default. When you submit a job without explicitly specifying a publisher, Expanso Cloud handles the result storage automatically. No publisher configuration needed. It just works.
For example, consider the following simple data processing job. Notice what’s missing?
# Your job specification - no publisher configuration needed!
Name: data-processing
Type: batch
Tasks:
  - Name: data-processing
    Engine:
      Type: docker
      Params:
        Image: python:3.9
        Parameters:
          - python
          - -c
          - |
            import pandas as pd
            # Your data processing logic
            results = pd.DataFrame({'output': [1, 2, 3, 4, 5]})
            results.to_csv('/outputs/results.csv', index=False)
    ResultPaths:
      - Name: outputs
        Path: /outputs
    # No Publisher section needed - Expanso Cloud handles it automatically!
When you submit this job to Expanso Cloud, here's what happens behind the scenes:
Automatic publisher assignment: Expanso Cloud automatically assigns its managed publisher to your job.
Secure upload: Compute nodes receive secure, time-limited upload URLs from the orchestrator, eliminating credential management concerns.
Organized storage: Results are automatically organized and stored in Expanso Cloud's managed storage infrastructure.
Seamless retrieval: You can access results through the Expanso Cloud UI or via the standard bacalhau job get command.
This architecture eliminates the most common security pitfall: distributing long-term credentials to compute nodes. It’s security by design, not an afterthought:
Zero credential distribution: Compute nodes never receive long-term storage credentials.
Time-limited access: Upload URLs have configurable expiration times.
Least privilege: Each compute node only receives upload permissions for its specific job results.
Audit trails: All storage operations are logged and traceable.
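To make the idea of a time-limited, job-scoped upload URL concrete, here is a minimal Python sketch of the general signed-URL technique. It is not Expanso Cloud's actual implementation, which this post does not describe: the endpoint, the secret key, the query parameter names, and the helper functions `make_upload_url` and `is_upload_allowed` are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only; not a real endpoint or key.
SECRET_KEY = b"orchestrator-only-secret"
BASE_URL = "https://storage.example.com/upload"


def _signature(job_id: str, expires: int) -> str:
    """HMAC over the job ID and expiry, so neither can be altered unnoticed."""
    message = f"{job_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_upload_url(job_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Orchestrator side: issue a URL that is valid for one job until it expires."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl_seconds
    query = urlencode(
        {"job": job_id, "expires": expires, "sig": _signature(job_id, expires)}
    )
    return f"{BASE_URL}?{query}"


def is_upload_allowed(url: str, job_id: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired, untampered URL for this job."""
    current = time.time() if now is None else now
    params = parse_qs(urlparse(url).query)
    try:
        url_job = params["job"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if url_job != job_id or current > expires:
        return False
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(sig, _signature(url_job, expires))
```

In this sketch the compute node only ever holds the URL. The signing key stays with the orchestrator, and a URL issued for one job is rejected for any other job or after its expiry.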
Getting Your Results: UI or CLI, Your Choice
Accessing your outputs is just as frictionless.
Access your job results directly through the Expanso Cloud dashboard:
Navigate to your completed job
View and download complete result archives
Share results with team members using secure, time-limited links
Results are retained for 30 days, giving you ample time to download and process your outputs.
You can also use the bacalhau job get command:
# Submit your job
bacalhau job run my-job.yaml
# Retrieve results when job completes
bacalhau job get my-job
The results are downloaded to your local machine just like any other Bacalhau job. The difference is that Expanso Cloud's managed infrastructure now handles all the storage complexity for you.
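If you would rather script these two steps, a thin Python wrapper is one option. This sketch assumes the `bacalhau` CLI is installed, on your `PATH`, and already configured for Expanso Cloud. It uses only the two commands shown above, and the helper names `run_command`, `get_command`, and `submit_and_fetch` are hypothetical.

```python
import subprocess
from typing import List


def run_command(job_spec_path: str) -> List[str]:
    """Build the argv for submitting a job spec."""
    return ["bacalhau", "job", "run", job_spec_path]


def get_command(job_ref: str) -> List[str]:
    """Build the argv for downloading a job's results."""
    return ["bacalhau", "job", "get", job_ref]


def submit_and_fetch(job_spec_path: str, job_ref: str) -> None:
    """Submit the job, then download its results to the local machine.

    check=True raises CalledProcessError if either command exits non-zero.
    Assumption: the job has finished by the time the first command returns;
    if it has not in your setup, add a wait or polling step in between.
    """
    subprocess.run(run_command(job_spec_path), check=True)
    subprocess.run(get_command(job_ref), check=True)
```

Calling `submit_and_fetch("my-job.yaml", "my-job")` mirrors the two shell commands above. Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.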
Don't Worry, Your Custom Setups Still Work
If you have an existing workflow with your own custom storage, nothing changes. You have the freedom to choose, but the default is now effortless:
# Custom storage publisher still works
Tasks:
  - Name: main
    Publisher:
      Type: s3
      Params:
        Bucket: my-custom-bucket
        Key: my-results
    # ... rest of job spec
The managed publisher only activates when no explicit publisher is specified, ensuring complete backward compatibility with existing workflows.
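The "only when no explicit publisher is specified" rule can be pictured as a small defaulting step applied to each task. The Python sketch below illustrates that rule on a job spec loaded as a dictionary. It is an assumption about the shape of the logic, not Expanso Cloud's code, and the publisher type name `managed` and the function `apply_default_publisher` are hypothetical.

```python
import copy
from typing import Any, Dict

# Hypothetical type name; this post does not state the real identifier.
MANAGED_PUBLISHER = {"Type": "managed"}


def apply_default_publisher(job: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the job where tasks without a publisher get the default.

    Tasks that already declare a Publisher with a Type are left untouched,
    which is what keeps existing custom-storage workflows working.
    """
    result = copy.deepcopy(job)
    for task in result.get("Tasks", []):
        publisher = task.get("Publisher")
        if not publisher or not publisher.get("Type"):
            task["Publisher"] = dict(MANAGED_PUBLISHER)
    return result
```

Applied to the first job spec in this post, the single task would gain the default publisher. Applied to the S3 example above, the spec would come back unchanged.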
A Better Workflow for Everyone
Why did we do all this? Well, the reality is that this workflow improves everyone’s jobs:
For data scientists: Focus on your analysis. Run your Python scripts, R analyses, or Jupyter notebooks and get results back automatically.
For dev teams: Prototype and test new pipelines without the friction of provisioning and securing storage for every experiment.
For production workloads: Leverage enterprise-grade, secure, and durable storage without having to manage it yourself.
Conclusion
Expanso Cloud's managed publishers eliminate the complexity of result storage while maintaining the security and reliability your production workloads demand. Whether you're running simple data processing jobs or complex ML pipelines, your results are now just a click away.
Ready to experience frictionless distributed computing? Sign up for Expanso Cloud and see how managed publishers can simplify your workflow today.
What's Next?
Ready to try managed publishers for yourself? Upgrade to Bacalhau 1.8 and see the difference in your distributed workloads.
Get Involved!
We welcome your involvement in Bacalhau. There are many ways to contribute, and we’d love to hear from you. Reach out at any of the following locations:
Commercial Support
While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. Read more about the difference between open-source Bacalhau and commercially supported Bacalhau in the FAQ. If you want to use the pre-built binaries and receive commercial support, contact us or get your license on Expanso Cloud!