Edge-Based Inference Machine Learning
ML Inference on Videos With YOLOv5 (10 min)
The rapid digitization of today's world has led to an exponential increase in data generation, particularly at the edge, far from centralized data centers. This data, ranging from application logs for troubleshooting software issues to extensive CCTV footage for maintaining security, is a treasure trove of insights waiting to be unlocked through Machine Learning (ML). However, the separation between where data is generated and where ML models live creates a unique set of challenges, especially when it comes to data transit: the data typically sits at the edge, while ML training is typically centralized.
Challenges of Centralizing ML
Machine learning models present unique challenges in production environments versus typical binary deployments.
Data Size and Transfer Cost: Having terabytes or even petabytes of data that you need to move to another location for training or inference is quite a challenge. Not only is this expensive, but it is also time-consuming. In the machine learning world, the time it takes to move data means that, on top of expenses, your models may be out of date before you even get to use them.
The Compliance Labyrinth: Compliance is no child's play. Moving data for training, especially personal or sensitive information, across different regions or systems means navigating a complex web of regulatory requirements. Depending on your industry, you could face tens or hundreds of different regulations, and many of them are triggered the moment data is moved. Ensuring compliance while moving data can be a daunting task.
Security: The Moving Target: Static data, or data at rest, benefits from a controlled environment, making it easier to secure. However, data in transit is like a moving target, vulnerable to interception, tampering, and unauthorized access. The challenge extends beyond securing the endpoints to ensuring the data's safety as it travels between points.
Volume vs. Value: Although a vast amount of data is collected at the edge, only a small subset may be valuable. Sifting through application logs might reveal a few anomalies or a few frames of interest amid hours of video footage. Storing all this data, when only a fraction is valuable, is not just unnecessary but also costly. The goal is to segregate and retain the essential data while letting go of the rest.
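To make the volume-vs.-value idea concrete, here is a minimal Python sketch of the filtering step: keep only the frames whose detections clear a confidence threshold. The frame and detection structures are hypothetical stand-ins for real YOLOv5 output.

```python
# Keep only frames with at least one confident detection, discarding the
# bulk of uneventful footage. Frame/detection dicts are illustrative
# stand-ins for a real YOLOv5 result set.

CONFIDENCE_THRESHOLD = 0.6

def significant_frames(frames):
    """Yield only frames containing at least one confident detection."""
    for frame in frames:
        if any(d["confidence"] >= CONFIDENCE_THRESHOLD for d in frame["detections"]):
            yield frame

# Hours of footage may yield only a handful of frames worth keeping.
footage = [
    {"id": 0, "detections": []},                                          # empty street
    {"id": 1, "detections": [{"label": "person", "confidence": 0.91}]},   # worth keeping
    {"id": 2, "detections": [{"label": "bird", "confidence": 0.31}]},     # low confidence
]
kept = list(significant_frames(footage))
print([f["id"] for f in kept])  # only frame 1 survives -> [1]
```

In practice the threshold, and which labels count as "significant", would be tuned per deployment; the point is that the expensive storage and transfer happens only for the surviving fraction.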
Under-Used Compute Power: Historically, ML inference required heavy-duty, centralized computational resources - often involving expensive GPUs. But times are changing. Innovations like Google's Coral Edge ML, Nvidia's Jetson, and Intel’s Neural Compute Stick are revolutionizing the field. They're not only driving down the costs of ML inference but also demonstrating robust performance on edge devices, making ML more accessible and decentralized.
Traditional Setup: Centralized ML Inference
The diagram below offers a top-level view of a typical surveillance system utilizing ML Inference to detect noteworthy events.
Here's how the process unfolds:
Video Capture: Each camera captures video, buffering it locally on the device.
Local Storage: Video data from the camera’s buffer is saved locally or to nearby storage locations.
Cloud Upload: The data is then uploaded to the cloud for further analysis.
Inference and Storage: Cloud systems perform ML inference to identify significant events. Relevant events are stored, while the rest are either discarded or moved to cold storage.
Again, this scenario creates the issues we discussed above: latency, storage and data transfer costs, and compliance.
Solution: Edge-Based ML with Bacalhau
The previous section highlighted the challenges of handling large data volumes. The solution might seem simple at first glance: only move essential data. But is it really that straightforward? The principle at work is streamlined system design: a system with fewer moving parts is less prone to failure. Keeping data static and processing it at its source eliminates complex data transfer protocols, enhancing system reliability.
At its core, you use the same binaries and architecture you always have, but with Bacalhau you can bring your ML workloads to where the data is. Shifting ML inference away from central hubs significantly cuts costs and boosts efficiency while delivering real-time insights, enhancing reliability and quickening response times. In the world of data, success hinges not on gathering information but on generating valuable insights quickly. Running ML inference at the edge enhances data security, improves efficiency, streamlines system design, and simplifies management. With Bacalhau, the inference-at-the-edge architecture looks like the diagram below.
In our example, each camera records 4K video at 24 FPS; with 100 cameras, this generates roughly 1 TB hourly, summing to about 9.6 petabytes annually. For cost calculations we used 50 virtual machines (e2-standard-8 on GCP, comparable to a decent Intel NUC) as a proxy, each processing 50 frames per second with the YOLOv5 model. At an hourly rate of $28.90 per instance, our approach saves more than 94% compared with the centralized approach on AWS, GCP, or Azure.
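As a back-of-the-envelope check of these figures, the sketch below recomputes the data volumes in Python. The roughly 11 GB/hour-per-camera bitrate is an assumption chosen to be consistent with the article's totals; real 4K at 24 FPS bitrates vary with codec and scene.

```python
# Back-of-the-envelope check of the example's data volumes.
# GB_PER_CAMERA_HOUR is an assumed bitrate, not a measured one.

CAMERAS = 100
GB_PER_CAMERA_HOUR = 11          # assumed 4K @ 24 FPS surveillance bitrate
HOURS_PER_YEAR = 24 * 365

hourly_tb = CAMERAS * GB_PER_CAMERA_HOUR / 1000
annual_pb = CAMERAS * GB_PER_CAMERA_HOUR * HOURS_PER_YEAR / 1_000_000

# Throughput sanity check: 50 edge nodes at 50 FPS vs. 100 cameras at 24 FPS.
generated_fps = CAMERAS * 24     # 2400 frames/s produced
processed_fps = 50 * 50          # 2500 frames/s of inference capacity

print(f"{hourly_tb:.1f} TB/hour, {annual_pb:.1f} PB/year")  # -> 1.1 TB/hour, 9.6 PB/year
print(processed_fps >= generated_fps)                       # fleet keeps up -> True
```

The check also shows why 50 nodes suffice: their combined inference rate slightly exceeds the frame rate the camera fleet produces.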
Bacalhau's on-the-edge ML inference significantly boosts cost efficiency by optimizing bandwidth and reducing AI compute costs through selective cloud uploads. In our example, you save on compute, access, and AI costs. Local nodes ensure uninterrupted data processing during network disruptions and enable time-sensitive insights into your generated data. This represents not only a technological shift but also a cost-effective, reliable approach to data management, promoting trust and privacy.
Deployment of Inference ML with Bacalhau:
Here's the modified workflow:
Bacalhau Agent Installation: Every local camera node now hosts a Bacalhau Agent.
Local Inference: The ML inference, previously cloud-based, now operates at these local nodes.
Storage Monitoring: Through Bacalhau's control plane (not illustrated), each compute node is directed to watch the video storage location, and each agent executes the ML inference on-site.
Selective Cloud Upload: Only events pinpointed by ML Inference get uploaded.
Event-based Persistence: Only relevant events are sent to the cloud. Optionally, non-significant events might be stored off-site, on much cheaper storage.
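The modified workflow above can be sketched in a few lines of Python. Here `run_inference` and `upload_to_cloud` are hypothetical stand-ins for a YOLOv5 model call and your cloud SDK.

```python
# Minimal sketch of the edge-node loop: infer where the video lives,
# upload only flagged events. Both helpers below are illustrative stubs.

def run_inference(frame):
    # Stand-in for a YOLOv5 model call; returns detected labels.
    return frame.get("labels", [])

def upload_to_cloud(frame):
    # Stand-in for an object-store upload (S3, GCS, ...).
    print(f"uploading frame {frame['id']}")

EVENTS_OF_INTEREST = {"person", "vehicle"}

def process_locally(frames):
    uploaded = []
    for frame in frames:
        labels = run_inference(frame)           # inference happens on-site
        if EVENTS_OF_INTEREST & set(labels):    # selective cloud upload
            upload_to_cloud(frame)
            uploaded.append(frame["id"])
        # non-significant frames stay local (or go to cheap cold storage)
    return uploaded

frames = [{"id": 0, "labels": []}, {"id": 1, "labels": ["person"]}]
print(process_locally(frames))  # only frame 1 is uploaded -> [1]
```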
And you can do all this with just a single command! (The container image and entrypoint below are illustrative stand-ins; substitute your own YOLOv5 container.)
bacalhau docker run --target=all \
  --input file:///video_dir:/videos \
  ultralytics/yolov5 -- python detect.py --source /videos
Please note: this directory can be quite large! To run this, you will need a server configured with allow-listed local paths, as in the following command:
bacalhau serve \
  --node-type requester,compute \
  --job-selection-data-locality anywhere \
  --job-selection-accept-networked \
  --allow-listed-local-paths '/video_dir/**'
You can also express this as a job.yaml file, whose input and output entries point at the same locations (excerpt):
- Name: /video_dir
- Name: outputs
Then, to submit the job, execute:
cat job.yaml | bacalhau create
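For reference, a fuller job.yaml might look like the sketch below. The field names follow the legacy v1beta1 job spec format and are assumptions here; consult the Bacalhau documentation for your version before relying on them.

```yaml
# Sketch of a job spec for the inference job above (field names assumed
# from the legacy v1beta1 format; verify against your Bacalhau version).
APIVersion: V1beta1
Spec:
  Engine: Docker
  Docker:
    Image: ultralytics/yolov5              # illustrative image
    Entrypoint: ["python", "detect.py", "--source", "/videos"]
  inputs:
    - Name: /video_dir
      StorageSource: localDirectory        # requires allow-listed local paths
      SourcePath: /video_dir
      path: /videos
  outputs:
    - Name: outputs
      path: /outputs
```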
Minimized Latency & Bandwidth Use: Local inference facilitates real-time detection. Uploading only key footage significantly reduces bandwidth use.
Cost-Efficient Storage: Only storing 'significant' footage sidesteps the costs associated with transient cloud storage.
Network Outage Resilience: During internet outages, local nodes can still process the data, ensuring continuous surveillance and reducing data processing backlog.
Curbed Data Transfer Costs: You're billed only for the data that matters, making this a cost-effective approach.
Streamlined Regulatory Compliance: Local data processing means Personally Identifiable Information (PII) can be excluded before cloud uploads, aligning with data protection mandates.
Enhanced Consumer Trust: Keeping data local during inference ensures that only the essential data is transferred, leaving sensitive and irrelevant data untouched. This promotes trust as customers are assured that their private information remains in situ and only necessary data points are moved for further processing.
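The outage-resilience point above can be sketched as a small buffering wrapper: events queue locally while the uplink is down and flush, in order, once it returns. Here `send` is a hypothetical stand-in for a real upload call.

```python
# Outage-resilient uploads: buffer events while offline, drain when online.
from collections import deque

class ResilientUploader:
    def __init__(self, send):
        self.send = send          # upload callable (stub here)
        self.backlog = deque()    # events awaiting upload

    def publish(self, event, online):
        self.backlog.append(event)
        if online:
            while self.backlog:                 # drain backlog oldest-first
                self.send(self.backlog.popleft())

sent = []
up = ResilientUploader(sent.append)
up.publish("event-1", online=False)   # outage: buffered locally
up.publish("event-2", online=False)   # still buffered
up.publish("event-3", online=True)    # link restored: backlog drains in order
print(sent)  # ['event-1', 'event-2', 'event-3']
```

A production version would persist the backlog to disk and bound its size, but the ordering guarantee is the same idea.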
In edge-based ML with YOLOv5 and Bacalhau, we're bringing computation to the data, not the other way around. This cuts costs, improves security, and speeds up processing.
In our use case example, we've seen a notable reduction in costs: handling 1 PB of video data annually, the shift to local processing with Bacalhau cuts cloud storage and data transfer expenses by well over 50%. Inference results also arrive significantly faster, dropping from hours to minutes. This approach not only saves money but also enhances data security by limiting data movement.
Keep an eye out for part 2, coming next week.
How to Get Involved
We're looking for help in several areas. If you're interested in contributing, please reach out to us at any of the following locations.
While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. You can read more about the difference between open source Bacalhau and commercially supported Bacalhau in our FAQ. If you would like to use our pre-built binaries and receive commercial support, please contact us!
Thanks for reading our blog post! Subscribe for free to receive new updates.