Introduction
Great news for Bacalhau users! We're excited to announce a new feature that streamlines your workflows: direct downloads of S3-published results using the bacalhau get <job_id> command. Say goodbye to complex workarounds and embrace simplicity!
Understanding the New Feature
Before this update, retrieving results from S3 was tricky. Users would encounter the message “No supported downloader found for the published results. You will have to download the results differently.”, leading to additional, and sometimes confusing, steps.
Now, Bacalhau makes retrieval a breeze. Leveraging pre-signed URLs, which remain valid for 30 minutes, you can securely and directly download from S3 using the bacalhau get <job_id> command.
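Under the hood this relies on a standard S3 capability: the requester node signs a time-limited GET request, so your client never needs bucket credentials of its own. If you're curious what such a URL looks like, you can generate one yourself with the AWS CLI (the bucket and key here are placeholders, not Bacalhau's actual layout):
```
# Generate a pre-signed GET URL valid for 30 minutes (1800 seconds);
# <Bucket> and <Key> are placeholders for a published result object.
aws s3 presign s3://<Bucket>/<Key> --expires-in 1800
```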
Quick Start Guide
Setting the Stage
Activate Your Network: Need help? Check our Quick Start Guide.
AWS Credentials: Verify that your Bacalhau nodes have the necessary AWS credentials configured. See S3 Publisher Specification for guidance.
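For the AWS Credentials step, nodes typically resolve credentials through the AWS SDK's standard mechanisms. Environment variables are one common option; shared credentials files and instance profiles also work:
```
# One common way to supply credentials to a node's environment;
# the AWS SDK also reads ~/.aws/credentials and instance profiles.
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_REGION=<your-region>
```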
Publishing to S3
Now, let’s set up a job that publishes to S3. Here's an example job.yaml that echoes 'hello world' and publishes the output to an S3 bucket:
```
Name: Docker Job with S3
Type: batch
Count: 1
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: ubuntu:latest
        Entrypoint:
          - /bin/bash
        Parameters:
          - -c
          - echo hello world
    Publisher:
      Type: s3
      Params:
        Bucket: <Bucket>
        Key: published-result-{jobID}
        Region: <Region>
        Compress: true
```
Remember to replace <Bucket> and <Region> with your S3 bucket info. Run your job with:
```
bacalhau job run job.yaml
```
Expect a response like:
```
Job successfully submitted. Job ID: j-8ef62f9b-ca13-442e-9056-dc8ffbb7f778
...
```
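Before downloading, you can optionally confirm that the job has completed and inspect its details:
```
# Optional: check job status and details before downloading
bacalhau job describe j-8ef62f9b-ca13-442e-9056-dc8ffbb7f778
```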
Downloading Results
To download your results:
```
# Replace with your job ID
bacalhau get j-8ef62f9b-ca13-442e-9056-dc8ffbb7f778
```
Your results will be decompressed and neatly organized:
```
→ tree job-j-8ef62f9b
job-j-8ef62f9b
├── exitCode
├── stderr
└── stdout
```
View the output:
```
→ cat job-j-8ef62f9b/stdout
hello world
```
What You Need to Know
Requirements
To make this work, the requester node needs the s3:GetObject permission to generate pre-signed URLs so that clients can download the published results. Here's the necessary policy:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME/*"
    }
  ]
}
```
For more information on IAM policies specific to Amazon S3 buckets and users, please refer to the AWS documentation on Using IAM Policies with Amazon S3.
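For illustration, here's one way to attach that policy with the AWS CLI, assuming the requester node runs under an IAM role (the role name and file path below are hypothetical):
```
# Attach the policy above (saved locally as get-object-policy.json);
# "bacalhau-requester" is a hypothetical role name.
aws iam put-role-policy \
  --role-name bacalhau-requester \
  --policy-name BacalhauS3GetObject \
  --policy-document file://get-object-policy.json
```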
Limitations
This feature currently works only with compressed results (Compress: true).
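If you do publish uncompressed results, you can still retrieve them yourself with the AWS CLI. A minimal sketch, assuming the Key layout from the job spec above:
```
# Manual download for uncompressed results; replace <Bucket> and
# <jobID> with your own values.
aws s3 cp s3://<Bucket>/published-result-<jobID>/ ./results --recursive
```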
Conclusion
Bacalhau's new feature is a game-changer for users leveraging S3 for result storage. Try it out, and let the simplicity amaze you!
5 Days of Bacalhau 1.2 Blog Series
If you’re interested in exploring our other 1.2 features in more detail, check back tomorrow for our next 5 Days of Bacalhau blog post.
Day 1 - Job Templates
Day 2 - Streamlined Node Bootstrap
Day 5 - Instrumenting WebAssembly: Enhanced Telemetry with Dylibso Observe SDK
How to Get Involved
We're looking for help in several areas. If you're interested in contributing, please reach out to us at any of the following locations.
Commercial Support
While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. You can read more about the difference between open source Bacalhau and commercially supported Bacalhau in our FAQ. If you would like to use our pre-built binaries and receive commercial support, please contact us!