Deploying ML models to the edge with Lightning Flash

Lightning Flash provides an easy way to prototype, finetune and serve models. Combined with Synpse, we can deploy them to the edge.

Published on September 23, 21

Synpse is an end-to-end platform for managing a device fleet that can grow to hundreds of thousands of devices. It lets you perform OTA software updates, collect metrics and logs, deploy your containerized applications and open tunnel-based SSH access to any of your devices. You can find a Quick Start here.


High level

In this tutorial we will package a simple model that exposes an HTTP API and serves predictions, and deploy it to a device managed by Synpse.

What is Lightning Flash?

Lightning Flash GitHub

Flash is a high-level deep learning framework for fast prototyping, baselining, finetuning and solving deep learning problems. It features a set of tasks for you to use for inference and finetuning out of the box, and an easy to implement API to customize every step of the process for full flexibility.

Flash is built for beginners, with a simple API that requires very little deep learning background, and for data scientists, Kagglers, applied ML practitioners and deep learning researchers who want a quick way to get a deep learning baseline with the advanced features PyTorch Lightning offers.

You can view Lightning Flash’s quick start here: https://lightning-flash.readthedocs.io/en/latest/quickstart.html.

Prerequisites

Building the model

The Lightning Flash public repository has plenty of examples here: https://github.com/PyTorchLightning/lightning-flash/tree/master/flash_examples/serve. I decided to go with image_classification, as it was important to me to have some kind of service that could differentiate between ants and bees. You can read more about the model in the image classification section.

The repository with the code can be found here: https://github.com/synpse-hq/synpse-lightning-flash-example.

Step 1: Clone example repo

git clone https://github.com/synpse-hq/synpse-lightning-flash-example.git

Running server locally

What surprised me a lot was how easy it is to start serving with Flash. Open the image_classifier.py file in your favorite editor:

from flash.image import ImageClassifier

# Our downloaded weights
model = ImageClassifier.load_from_checkpoint("./image_classification_model.pt")
# Binding to all interfaces (we will need that so it works in Docker container)
model.serve(host="0.0.0.0")

Now, to start it, we will need to install several dependencies that will help with image classification and serving:

pip install -r requirements.txt

To start the server locally:

python image_classifier.py
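
The server may need a moment to load the checkpoint before it accepts requests. Below is a minimal sketch of a readiness check, assuming the server listens on the default port 8000 mentioned later in this post. wait_for_port is a hypothetical helper written for this example, not part of Flash or Synpse:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait a little and retry.
            time.sleep(0.5)
    return False


# Example usage (assumed port): wait_for_port("127.0.0.1", 8000)
```

This only checks that the port is open; it does not verify that the model itself answers correctly.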

Step 2: Trying out with the client

While the server is running, use the client to make HTTP requests:

import base64
from pathlib import Path

import requests

with (Path("./assets") / "ant.jpg").open("rb") as f:
    imgstr = base64.b64encode(f.read()).decode("UTF-8")

body = {"session": "UUID", "payload": {"inputs": {"data": imgstr}}}
resp = requests.post("http://127.0.0.1:8000/predict", json=body)
print(resp.json())

To run it:

python client.py
{'session': 'UUID', 'result': {'outputs': 'ants'}}

I have added both bee.jpg and ant.jpg files to the assets/ directory so feel free to try both :)
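
If you want to try several images, it can help to factor the request and response handling into small functions. This is a sketch based only on the request body and sample output shown above; build_body and extract_label are hypothetical helper names, and the response layout is assumed to match that sample:

```python
import base64


def build_body(image_bytes: bytes, session: str = "UUID") -> dict:
    """Wrap raw image bytes in the same JSON layout the client above sends."""
    encoded = base64.b64encode(image_bytes).decode("UTF-8")
    return {"session": session, "payload": {"inputs": {"data": encoded}}}


def extract_label(response_json: dict) -> str:
    """Read the predicted class from a response shaped like the sample output."""
    return response_json["result"]["outputs"]
```

You would then pass build_body(f.read()) as the json argument of requests.post for each file in assets/ and call extract_label on resp.json().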

Step 3: Building and publishing Docker images with models

We can simply embed the code with data into a Docker image. You can build your own with:

docker build -t <your docker username>/synpse-lighning-flash:latest -f Dockerfile .
docker push <your docker username>/synpse-lighning-flash:latest

Or you can use the one I have built and published: karolisr/synpse-lighning-flash:latest.

Step 4: Preparing device

For several years now I have been using a combination of Raspberry Pis and an Intel NUC to run various background services such as Home Assistant, Node-RED, Drone, Pi-hole, etc. The NUC is a very quiet machine and actually performs really well:

Intel NUC

Installation instructions can be found here, but the short version is:

  1. Create a project on the Synpse main page
  2. Go to Devices
  3. Click on Provision Device
  4. Copy and paste the command into a terminal on your device

This command will figure out the device architecture, download the correct binary and register your device with the service. Once registered, your device will appear in the list:

Registered devices appear in the dashboard

Click on the device menu or device details and add a label type: controller. You can put anything you like here, but it will have to match the application specification later on.

Step 5: Deploy Flash serving to the device

The deployment specification will look familiar if you have used Docker Compose files before:

name: synpse-flash
scheduling:
  type: Conditional
  # Selecting my device. This could be targeting hundreds or thousands
  # of devices in industrial applications
  selectors:
    type: controller
spec:
  containers:
    - name: classifier
      # Docker image name
      image: karolisr/synpse-lighning-flash:latest
      # Optionally create a secret with your DockerHub password. I need this
      # as my IP pulls a lot of images. This can also be used if you are using
      # a private registry
      auth:
        username: karolisr
        fromSecret: dockerPassword
      # This time, we are exposing a custom port and not the default 8000
      ports:
        - 9950:8000
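
The selectors block is what ties this application to the type: controller label added in Step 4. The idea of label matching can be sketched in a few lines. This is an illustration only, not Synpse's actual scheduling code, and the device named pi with its label is made up for the example:

```python
def matches(selectors: dict, labels: dict) -> bool:
    """True if every selector key is present on the device with the same value."""
    return all(labels.get(key) == value for key, value in selectors.items())


# Hypothetical fleet: device name -> labels set in the dashboard.
devices = {
    "nuc": {"type": "controller"},
    "pi": {"type": "sensor"},
}

selectors = {"type": "controller"}  # same key/value as in the spec above
selected = [name for name, labels in devices.items() if matches(selectors, labels)]
print(selected)
```

With this rule, changing the label on a device or the selector in the spec changes which devices receive the application.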

Once deployed, you can view application status and logs:

View application status and logs

Step 6: Run prediction against the deployed model

One way would be to call the model directly on http://<device IP>:9950. However, I would like to demonstrate another feature that Synpse provides: TCP tunnels between your computer and any edge device.

To open a proxy tunnel, run in one terminal:

synpse device proxy nuc 8000:9950

Then, you can run our client again to make predictions as if the model were running on your own machine:

Predictions through a tunnel
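
The two ways of reaching the model differ only in the address the client posts to. The sketch below assumes that the proxy command forwards local port 8000 to port 9950 on the device (my reading of the command above, since the unchanged client still talks to 127.0.0.1:8000). The device IP is a placeholder from the documentation address range, and predict_url is a hypothetical helper:

```python
def predict_url(host: str, port: int) -> str:
    """Build the /predict endpoint URL used by the client."""
    return f"http://{host}:{port}/predict"


# Direct access: device address plus the exposed port from the spec (9950).
direct = predict_url("192.0.2.10", 9950)  # placeholder IP, replace with your device's

# Through the tunnel: localhost plus the local side of the proxy (8000).
tunnel = predict_url("127.0.0.1", 8000)

print(direct)
print(tunnel)
```

Because the tunnel URL is identical to the one used for the local server, client.py needs no changes.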

We can see predictions being logged in the Synpse dashboard too:

Flash logs

Next steps

Feel free to check out other features of Lightning Flash, like the new Flash Zero. It’s surprisingly easy to use (after spending some time with TensorFlow I really appreciate the simplicity).

For model training you should definitely check out grid.ai. There are some good articles out there on setting up the environment, and they offer some free credits as well as sessions that have things like GPU dependencies preinstalled.