Edge AI Anomaly Detection¶
Overview¶
This document contains the code and the instructions for our EclipseCON 2022 Talk: "How to Train Your Dragon and Its Friends: AI on the Edge with Eclipse Kura™"
This notebook can also be viewed and run on Google Colab.
In this example scenario we will collect the data provided by a Raspberry Pi Sense HAT using Eclipse Kura™ and upload it to an Eclipse Kapua™ instance. We will then download this data and train an AI-based anomaly detector using TensorFlow. Finally we will deploy the trained anomaly detector model on the edge, leveraging the Eclipse Kura™ integration with the Nvidia Triton™ Inference Server.
We'll subdivide this example scenario into three main sections:
- Data collection: in this section we'll discuss how to retrieve training data from the field leveraging Eclipse Kura™ and Eclipse Kapua™
- Model building and training: we'll further divide this section in three subsections:
- Data processing: where we'll show how to explore our training data and manipulate them to make them suitable for training (feature selection, scaling and dataset splitting). This will provide us with the "Preprocessing" stage of the resulting AI data-processing pipeline
- Model training: where we'll discuss how we can create a simple Autoencoder in Tensorflow Keras and how to train it. This will provide us with the "Inference" stage of the AI pipeline
- Model evaluation: where we'll cover how we can extract the high-level information from the model output and ensure the model was trained correctly. This will provide us with the "Postprocessing" stage of the AI pipeline
- Model deployment: finally we will convert the model to make it suitable for running on Eclipse Kura™ and Nvidia Triton™ and deploy it on the edge.
Data collection¶
Overview¶
In this setup we'll leverage Eclipse Kura™ and Kapua™ for retrieving data from a Raspberry Pi Sense HAT and uploading it to the cloud.
The Sense HAT is an add-on board for Raspberry Pi which provides an 8×8 RGB LED matrix, a five-button joystick and includes the following sensors:
- Gyroscope
- Accelerometer
- Magnetometer
- Temperature
- Barometric pressure
- Humidity
Kura™ installation¶
Requirement: A Raspberry Pi 3/4 running the latest version of Raspberry Pi OS 64 bit.
To make everything work on the Raspberry Pi we need to use the develop version of the raspberry-pi-ubuntu-20-nn Kura installer (yes, I know we're installing the Ubuntu package on Raspberry Pi OS, but bear with me...). You can do so by downloading the repo and building locally or by downloading a pre-built installer from the Kura CI artifacts.
Copy the resulting file kura_<version>_raspberry-pi-ubuntu-20_installer-nn.deb to the target device.
On the target device run the following commands:
sudo apt-get install -y wget apt-transport-https gnupg
sudo wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo apt-key add -
sudo echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
sudo apt-get update && sudo apt-get install temurin-8-jdk chrony
Finally install Kura with:
sudo apt install ./kura_<version>_raspberry-pi-ubuntu-20_installer-nn.deb
Cloud connection¶
After setting up an Eclipse Kura™ instance on the Raspberry Pi we'll need to connect it to an Eclipse Kapua™ instance.
An excellent tutorial on how to deploy a Kapua™ instance using Docker is available in the official repository. For the purpose of this tutorial we'll assume a Kapua™ instance is already running and available for connection from Kura™.
After setting up the Kapua™ instance you can refer to the official Kura™ documentation for connecting the Raspberry Pi to it. For the remainder of this tutorial we'll assume a connection with the Kapua™ instance was correctly established.
Data publisher¶
To publish the collected data on the Cloud we'll need to create a new Cloud Publisher through the Kura™ web interface. Go to "Cloud Connections" and press "New Pub/Sub"; in the example below we'll call our new publisher KapuaSenseHatPublisher.
To keep things clean we'll create a new topic called SenseHat. To do so we'll move to the KapuaSenseHatPublisher configuration and update the Application Topic field to A1/SenseHat.
SenseHat driver¶
Kura™ provides a driver that allows interacting with a Raspberry Pi Sense HAT device using the Kura Driver, Asset and Wires frameworks.
From the Kura™ documentation:
Eclipse Kura introduces a model based on the concepts of Drivers and Assets to simplify the communication with the field devices attached to a gateway.
A Driver encapsulates the communication protocol and its configuration parameters, dealing with the low-level characteristics of the field protocol. It opens, closes and performs the communication with the end field device. It also exposes field protocol specific information that can be used by upper levels of abstraction to simplify the interaction with the end devices.
An Asset is a logical representation of a field device, described by a list of Channels. The Asset uses a specific Driver instance to communicate with the underlying device and it models a generic device resource as a Channel. A register in a PLC or a GATT Characteristic in a Bluetooth device are examples of Channels. In this way, each Asset has multiple Channels for reading and writing data from/to an Industrial Device.
The Kura Sense Hat driver requires a few changes on the Raspberry Pi:
- Configured SenseHat: see SenseHat documentation
- The I2C interface should be enabled using sudo raspi-config
As with other Drivers supported by Kura, it is distributed as a deployment package on the Eclipse Marketplace. It consists of two packages; we need to install both. Complete installation instructions are available here.
Driver configuration¶
We now need to configure the driver to access the sensors on the SenseHat. Move to the "Driver and Assets" section of the web UI and create a new driver. We'll call it driver-sensehat.
Then add a new Asset (which we'll call asset-sensehat) to this driver and configure it as per the screenshots below. We'll need a Channel for every sensor we want to access.
Refer to the following table for the driver parameters:
name | type | value.type | resource |
---|---|---|---|
ACC_X | READ | FLOAT | ACCELERATION_X |
ACC_Y | READ | FLOAT | ACCELERATION_Y |
ACC_Z | READ | FLOAT | ACCELERATION_Z |
GYRO_X | READ | FLOAT | GYROSCOPE_X |
GYRO_Y | READ | FLOAT | GYROSCOPE_Y |
GYRO_Z | READ | FLOAT | GYROSCOPE_Z |
HUMIDITY | READ | FLOAT | HUMIDITY |
PRESSURE | READ | FLOAT | PRESSURE |
TEMP_HUM | READ | FLOAT | TEMPERATURE_FROM_HUMIDITY |
TEMP_PRESS | READ | FLOAT | TEMPERATURE_FROM_PRESSURE |
After correctly configuring it you should see the data in the "Data" page of the UI.
Wire graph¶
Now that we have our Driver and Cloud Publisher ready we can put everything together with a Kura Wire Graph.
From Kura™ documentation:
The Kura™ Wires feature aims to simplify the development of IoT Edge Computing Applications leveraging reusable configurable components that can be wired together and which, eventually, allows configurable cooperation between these components.
In the dataflow programming model, the application logic is expressed as a directed graph (flow) where each node can have inputs, outputs, and independent processing units. There are nodes that only produce outputs and ones that only consume inputs, which usually represent the start and the end of the flow. The inner-graph nodes process the inputs and produce outputs for downstream nodes. The processing unit of a node executes independently and does not affect the execution of other nodes. Thus, the nodes are highly reusable and portable.
Move to the "Wire Graph" section of the UI. We'll need a graph with three components:
- A Timer which will dictate the sample rate at which we will collect data coming from the Sense Hat
- A WireAsset for the Sense Hat driver asset
- A Publisher for the Kapua publisher we created before.
The resulting Wire Graph will look like this:
Timer¶
Configure the timer so that it polls the SenseHat every second; this can be done by setting simple.interval to 1.
WireAsset¶
Select the driver-sensehat when creating the WireAsset. No further configuration is needed for this component.
Publisher¶
Create a "Publisher" Wire component and select the KapuaSensehatPublisher
from the target filter.
Don't forget to press "Apply" to start the Wire Graph!
Collect the data¶
At this point you should see data coming from the Raspberry Pi in the Kapua™ console under the SenseHat topic.
You can download the .csv file directly from the console using the "Export to CSV" button.
Model building and training¶
Overview¶
We will now use the data collected in the previous section to train an artificial neural network-based Anomaly Detector of our design. To this end we will use an Autoencoder model. To understand why we chose such a model we need to understand how it works. From Wikipedia:
An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”).
Another application for autoencoders is anomaly detection. By learning to replicate the most salient features in the training data [...] the model is encouraged to learn to precisely reproduce the most frequently observed characteristics. When facing anomalies, the model should worsen its reconstruction performance. In most cases, only data with normal instances are used to train the autoencoder; in others, the frequency of anomalies is small compared to the observation set so that its contribution to the learned representation could be ignored. After training, the autoencoder will accurately reconstruct "normal" data, while failing to do so with unfamiliar anomalous data. Reconstruction error (the error between the original data and its low dimensional reconstruction) is used as an anomaly score to detect anomalies
In simple terms:
- The Autoencoder is an artificial neural network model that learns how to reconstruct the input data at the output.
- If trained on "normal" data, it learns to reconstruct only normal data and fails to reconstruct anomalies.
- We can detect anomalies by computing the reconstruction error of the Autoencoder. If the error is above a certain threshold (which we will decide) the input sample is an anomaly, as sketched in the snippet below.
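To make the idea concrete before we build anything, here is a minimal NumPy sketch of this detection rule. The samples, reconstructions and threshold are made up for illustration; the real model and threshold are derived later in this notebook.
import numpy as np

def reconstruction_error(x, x_hat):
    # Mean squared error between the original sample and its reconstruction
    return np.mean((x - x_hat) ** 2)

# Hypothetical sample and two hypothetical reconstructions
x          = np.array([0.60, 0.67, 0.55, 0.53, 0.66])
x_hat_good = np.array([0.59, 0.67, 0.54, 0.52, 0.64])  # close to the input -> "normal"
x_hat_bad  = np.array([0.10, 0.30, 0.90, 0.20, 0.95])  # far from the input -> "anomaly"

THRESHOLD = 0.01  # made-up value; later we'll derive it from a Z-score on the test set

for x_hat in (x_hat_good, x_hat_bad):
    err = reconstruction_error(x, x_hat)
    print(f"error={err:.4f} -> {'anomaly' if err > THRESHOLD else 'normal'}")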
Why did we choose this approach over others?
- The Autoencoder falls in the "Unsupervised Learning" category: it doesn't need labeled data to be trained, i.e. we don't need to go through the whole dataset and manually label the samples as "normal" or "anomaly" (as Supervised Learning would require).
- Simpler data collection: we just need to provide it with the "normal" data. We don't need to artificially generate anomalies to train it on them.
Data Processing¶
We can now work on our .csv file downloaded from Kapua. For demonstration purposes an already available dataset is provided within this repository.
If you're running this notebook through Google Colab you'll need to download the dataset running the cell below:
!wget https://raw.githubusercontent.com/mattdibi/eclipsecon-edgeAI-talk/master/notebook/train-data-raw.csv
--2022-10-18 15:32:34-- https://raw.githubusercontent.com/mattdibi/eclipsecon-edgeAI-talk/master/notebook/train-data-raw.csv Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 13288126 (13M) [text/plain] Saving to: ‘train-data-raw.csv.1’ train-data-raw.csv. 100%[===================>] 12,67M 4,56MB/s in 2,8s 2022-10-18 15:32:38 (4,56 MB/s) - ‘train-data-raw.csv.1’ saved [13288126/13288126]
!ls *.csv
train-data-raw.csv
Let's start by taking a look at the content of this dataset; we'll use pandas (the Python Data Analysis library) for this.
import pandas as pd
raw_data = pd.read_csv("./train-data-raw.csv")
raw_data.head()
ID | TIMESTAMP | MAGNET_X | TEMP_HUM_timestamp | MAGNET_Z | MAGNET_Y | ACC_Y | ACC_X | GYRO_Y_timestamp | ACC_Z | ... | PRESSURE_timestamp | MAGNET_X_timestamp | ACC_X_timestamp | GYRO_Z_timestamp | HUMIDITY_timestamp | assetName | ACC_Z_timestamp | GYRO_X | GYRO_Y | GYRO_Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1645778791786 | -2.680372 | 1645778791413 | 5.036951 | 8.646852 | 0.004364 | 0.080122 | 1645778791413 | 0.984048 | ... | 1645778791413 | 1645778791413 | 1645778791413 | 1645778791413 | 1645778791413 | asset-sensehat | 1645778791413 | 0.053243 | 0.028920 | 0.036950 |
1 | 2 | 1645778792381 | -3.110756 | 1645778792378 | 5.952562 | 10.521458 | 0.005091 | 0.080122 | 1645778792378 | 0.992090 | ... | 1645778792378 | 1645778792378 | 1645778792378 | 1645778792378 | 1645778792378 | asset-sensehat | 1645778792378 | -0.051105 | -0.028920 | -0.037256 |
2 | 3 | 1645778793412 | -3.482263 | 1645778793408 | 6.719675 | 11.944528 | 0.005334 | 0.080122 | 1645778793408 | 0.986729 | ... | 1645778793408 | 1645778793408 | 1645778793408 | 1645778793408 | 1645778793408 | asset-sensehat | 1645778793408 | -0.025253 | 0.025560 | 0.038478 |
3 | 4 | 1645778794411 | -3.813552 | 1645778794407 | 7.375115 | 13.093461 | 0.006061 | 0.080122 | 1645778794407 | 0.990384 | ... | 1645778794407 | 1645778794407 | 1645778794407 | 1645778794407 | 1645778794407 | asset-sensehat | 1645778794407 | 0.100695 | -0.023422 | -0.037867 |
4 | 5 | 1645778795411 | -4.050513 | 1645778795407 | 7.854155 | 14.029530 | 0.004849 | 0.080607 | 1645778795407 | 0.988922 | ... | 1645778795407 | 1645778795407 | 1645778795407 | 1645778795407 | 1645778795407 | asset-sensehat | 1645778795407 | -0.100389 | 0.021895 | 0.038172 |
5 rows × 29 columns
Feature selection¶
As you might notice there's some information in the dataset that we don't care about and that isn't meaningful for our application:
- ID
- The various timestamps
- assetName, which doesn't change
We can therefore remove them from the dataset.
features = ['ACC_Y', 'ACC_X', 'ACC_Z',
'PRESSURE', 'TEMP_PRESS', 'TEMP_HUM',
'HUMIDITY', 'GYRO_X', 'GYRO_Y', 'GYRO_Z']
data = raw_data[features]
data.head()
ACC_Y | ACC_X | ACC_Z | PRESSURE | TEMP_PRESS | TEMP_HUM | HUMIDITY | GYRO_X | GYRO_Y | GYRO_Z | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.004364 | 0.080122 | 0.984048 | 992.322998 | 38.724998 | 40.330822 | 19.487146 | 0.053243 | 0.028920 | 0.036950 |
1 | 0.005091 | 0.080122 | 0.992090 | 992.288330 | 38.772915 | 40.385788 | 19.465750 | -0.051105 | -0.028920 | -0.037256 |
2 | 0.005334 | 0.080122 | 0.986729 | 992.275635 | 38.795834 | 40.349144 | 19.572731 | -0.025253 | 0.025560 | 0.038478 |
3 | 0.006061 | 0.080122 | 0.990384 | 992.279053 | 38.797916 | 40.330822 | 19.358767 | 0.100695 | -0.023422 | -0.037867 |
4 | 0.004849 | 0.080607 | 0.988922 | 992.333008 | 38.845833 | 40.385788 | 19.390862 | -0.100389 | 0.021895 | 0.038172 |
%matplotlib inline
import matplotlib.pyplot as plt
data.hist(bins=50, figsize=(20,15))
plt.show()
Note: Some of you might notice that this is a really simple dataset: some of the input data (like GYRO_* and ACC_*) do not change much over time. Such a dataset is not very challenging and a few well-placed thresholds might be sufficient to spot anomalous behaviour. For this tutorial we decided to keep things simple and easy to replicate. Anomalies can be simply triggered by moving the Raspberry Pi around.
Keep in mind that this approach is generic: any dataset from any appliance/connected device can be processed in the same way we're showing here. That's the magic of neural networks!
Feature scaling¶
AI models don't perform well when the input numerical attributes have very different scales. As you can see, ACC_X, ACC_Y and ACC_Z range from 0 to 1, while PRESSURE has far higher values.
There are two common ways to address this: normalization and standardization.
Normalization (a.k.a. Min-max scaling) shifts and rescales values so that they end up ranging from 0 to 1. This can be done by subtracting the min value and dividing by the max minus the min.
x' = $\frac{x - min(x)}{max(x) - min(x)}$
Standardization makes the values of each feature in the data have zero-mean (when subtracting the mean in the numerator) and unit-variance. The general method of calculation is to determine the distribution mean and standard deviation for each feature. Next we subtract the mean from each feature. Then we divide the values (mean is already subtracted) of each feature by its standard deviation.
x' = $\frac{x - avg(x)}{\sigma}$
Fortunately for us, the scikit-learn library provides a function for both of them. In this case we'll use normalization because it works well for this application.
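For completeness, here is a minimal sketch of the standardization alternative using scikit-learn's StandardScaler, applied to the same data DataFrame defined above. We won't use it in the rest of this notebook; it is shown only for comparison with the MinMaxScaler used below.
from sklearn.preprocessing import StandardScaler

# Standardization: zero mean, unit variance per feature (not used further in this tutorial)
std_scaler = StandardScaler()
standardized_data = std_scaler.fit_transform(data.to_numpy())

# Each column should now have mean ~0 and standard deviation ~1
print(standardized_data.mean(axis=0))
print(standardized_data.std(axis=0))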
print("Data used in the Triton preprocessor")
print("-----------Min-----------")
print(data.min())
print("-----------Max-----------")
print(data.max())
print("-------------------------")
Data used in the Triton preprocessor -----------Min----------- ACC_Y -0.132551 ACC_X -0.049693 ACC_Z 0.759847 PRESSURE 976.001709 TEMP_PRESS 38.724998 TEMP_HUM 40.220890 HUMIDITY 13.003981 GYRO_X -1.937896 GYRO_Y -0.265019 GYRO_Z -0.250647 dtype: float64 -----------Max----------- ACC_Y 0.093099 ACC_X 0.150289 ACC_Z 1.177543 PRESSURE 1007.996338 TEMP_PRESS 46.093750 TEMP_HUM 48.355824 HUMIDITY 23.506138 GYRO_X 1.923712 GYRO_Y 0.219204 GYRO_Z 0.671759 dtype: float64 -------------------------
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data.to_numpy())
pd.DataFrame(scaled_data).describe()
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|
count | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 | 25278.000000 |
mean | 0.603124 | 0.674196 | 0.550454 | 0.526446 | 0.605576 | 0.552252 | 0.466400 | 0.501160 | 0.545457 | 0.271295 |
std | 0.049333 | 0.015135 | 0.031627 | 0.054050 | 0.288300 | 0.256587 | 0.176293 | 0.062908 | 0.067678 | 0.014665 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 0.597087 | 0.667343 | 0.544924 | 0.481917 | 0.501060 | 0.441442 | 0.325637 | 0.501348 | 0.544670 | 0.270709 |
50% | 0.603534 | 0.673413 | 0.551342 | 0.521377 | 0.655357 | 0.608108 | 0.511715 | 0.501841 | 0.547096 | 0.271685 |
75% | 0.611055 | 0.680698 | 0.555426 | 0.552892 | 0.819339 | 0.734234 | 0.575212 | 0.502407 | 0.549386 | 0.272577 |
max | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
Train test split¶
The only way to know how well a model will generalize to new data points is to try it on new data. To do so we split our data into two sets: the training set and the test set.
For the split we'll use a function from scikit-learn.
from sklearn.model_selection import train_test_split
import numpy as np
x_train, x_test = train_test_split(scaled_data, test_size=0.3, random_state=42)
x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
Model training¶
We can now leverage the Keras API of Tensorflow for creating our Autoencoder and then train it on our dataset.
We'll design a neural network architecture such that we impose a bottleneck in the network which forces a compressed knowledge representation of the original input (also called the latent-space representation). If the input features were each independent of one another, this compression and subsequent reconstruction would be a very difficult task. However, if some sort of structure exists in the data (i.e. correlations between input features), this structure can be learned and consequently leveraged when forcing the input through the network's bottleneck.
The bottleneck consists of reducing the number of neurons for each layer of the neural network up to a certain point, and then increasing the number until the original input size is reached. This results in an hourglass shape which is typical of Autoencoders.
Build the Autoencoder model¶
In this example we'll use a basic fully-connected autoencoder, but keep in mind that autoencoders can be built with different classes of neural network (e.g. Convolutional Neural Networks, Recurrent Neural Networks, etc.).
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2' # Avoid AVX2 error
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout
def create_model(input_dim):
    # The encoder will consist of a number of dense layers that decrease in size
    # as we taper down towards the bottleneck of the network, the latent space
    input_data = Input(shape=(input_dim,), name='INPUT0')

    # hidden layers
    encoder = Dense(9, activation='tanh', name='encoder_1')(input_data)
    encoder = Dropout(.15)(encoder)
    encoder = Dense(6, activation='tanh', name='encoder_2')(encoder)
    encoder = Dropout(.15)(encoder)

    # bottleneck layer
    latent_encoding = Dense(3, activation='linear', name='latent_encoding')(encoder)

    # The decoder network is a mirror image of the encoder network
    decoder = Dense(6, activation='tanh', name='decoder_1')(latent_encoding)
    decoder = Dropout(.15)(decoder)
    decoder = Dense(9, activation='tanh', name='decoder_2')(decoder)
    decoder = Dropout(.15)(decoder)

    # The output is the same dimension as the input data we are reconstructing
    reconstructed_data = Dense(input_dim, activation='linear', name='OUTPUT0')(decoder)

    autoencoder_model = Model(input_data, reconstructed_data)

    return autoencoder_model
autoencoder_model = create_model(len(features))
autoencoder_model.summary()
Model: "model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= INPUT0 (InputLayer) [(None, 10)] 0 encoder_1 (Dense) (None, 9) 99 dropout (Dropout) (None, 9) 0 encoder_2 (Dense) (None, 6) 60 dropout_1 (Dropout) (None, 6) 0 latent_encoding (Dense) (None, 3) 21 decoder_1 (Dense) (None, 6) 24 dropout_2 (Dropout) (None, 6) 0 decoder_2 (Dense) (None, 9) 63 dropout_3 (Dropout) (None, 9) 0 OUTPUT0 (Dense) (None, 10) 100 ================================================================= Total params: 367 Trainable params: 367 Non-trainable params: 0 _________________________________________________________________
Model training¶
As we already explained, the autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. We'll use that to reconstruct the input at the output. To train an autoencoder we don’t need to do anything fancy, just throw the raw input data at it. Autoencoders are considered an unsupervised learning technique since they don’t need explicit labels to train on but to be more precise they are self-supervised because they generate their own labels from the training data.
To train our neural network we need to have a performance metric to measure how well it is learning to reconstruct the data i.e. our loss function. The loss function in our example, which we need to minimize during our training, is the error between the input data and the data reconstructed by the autoencoder. We'll use the Mean Squared Error.
MSE = $\frac{1}{n}\sum_{i=1}^{n}{(Y_i - Y'_i)^2}$
Where:
- $n$: is the number of features (10 in our example)
- $Y_i$: is the original data point i.e. the input of the autoencoder
- $Y'_i$: is the reconstructed data point i.e. the output of the autoencoder
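As a quick sanity check of the formula above, here is a small sketch computing the MSE by hand on made-up 10-feature vectors and comparing it with scikit-learn's mean_squared_error, which we'll also use later in this notebook.
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up original and reconstructed samples (10 features, like ours)
y     = np.array([0.60, 0.67, 0.55, 0.53, 0.66, 0.61, 0.52, 0.50, 0.54, 0.27])
y_hat = np.array([0.59, 0.67, 0.54, 0.52, 0.64, 0.59, 0.44, 0.49, 0.54, 0.26])

mse_by_hand = np.sum((y - y_hat) ** 2) / len(y)
print(mse_by_hand, mean_squared_error(y, y_hat))  # the two values match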
Before starting the training we need to set the hyperparameters. Hyperparameters are parameters whose values control the learning process and determine the values of model parameters that a learning algorithm ends up learning. These are the learning_rate, max_epochs, optimizer and batch_size you see in the code snippet below. You may ask yourself how to set them; it mostly comes down to trial and error. Try tweaking them below and see how they affect the learning process...
A good explanation of their meaning can be found in the Keras documentation.
from tensorflow.keras import optimizers
batch_size = 32
max_epochs = 15
learning_rate = .0001
opt = optimizers.Adam(learning_rate=learning_rate)
autoencoder_model.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
train_history = autoencoder_model.fit(x_train, x_train,
shuffle=True,
epochs=max_epochs,
batch_size=batch_size,
validation_data=(x_test, x_test))
Epoch 1/15 553/553 [==============================] - 1s 1ms/step - loss: 0.2282 - accuracy: 0.1129 - val_loss: 0.0922 - val_accuracy: 0.0045 Epoch 2/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0949 - accuracy: 0.1541 - val_loss: 0.0279 - val_accuracy: 0.4210 Epoch 3/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0613 - accuracy: 0.1779 - val_loss: 0.0206 - val_accuracy: 0.4426 Epoch 4/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0466 - accuracy: 0.2152 - val_loss: 0.0186 - val_accuracy: 0.5276 Epoch 5/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0366 - accuracy: 0.2514 - val_loss: 0.0157 - val_accuracy: 0.5944 Epoch 6/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0290 - accuracy: 0.3083 - val_loss: 0.0119 - val_accuracy: 0.6403 Epoch 7/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0228 - accuracy: 0.3930 - val_loss: 0.0078 - val_accuracy: 0.7182 Epoch 8/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0186 - accuracy: 0.4668 - val_loss: 0.0059 - val_accuracy: 0.8195 Epoch 9/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0157 - accuracy: 0.5021 - val_loss: 0.0048 - val_accuracy: 0.8256 Epoch 10/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0136 - accuracy: 0.5277 - val_loss: 0.0042 - val_accuracy: 0.8263 Epoch 11/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0121 - accuracy: 0.5409 - val_loss: 0.0037 - val_accuracy: 0.8296 Epoch 12/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0107 - accuracy: 0.5569 - val_loss: 0.0036 - val_accuracy: 0.8306 Epoch 13/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0098 - accuracy: 0.5857 - val_loss: 0.0034 - val_accuracy: 0.8256 Epoch 14/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0089 - accuracy: 0.6076 - val_loss: 0.0033 - val_accuracy: 0.8281 Epoch 15/15 553/553 [==============================] - 1s 1ms/step - loss: 0.0083 - accuracy: 0.6337 - val_loss: 0.0032 - val_accuracy: 0.8262
plt.plot(train_history.history['loss'])
plt.plot(train_history.history['val_loss'])
plt.legend(['loss on train data', 'loss on test data'])
<matplotlib.legend.Legend at 0x16d0c3400>
Here we can see the loss for the training set and the test set over the epochs.
Some of you might notice that this graph is somewhat unexpected. Why is the validation loss lower than the training loss? This is an effect of regularization: the dropout layers affect the network during training but are disabled when evaluating on the validation data. A good writeup of this effect can be found here.
As an exercise, try to compute the average MSE on the training set and the test set, as sketched below. You'll find that the MSE is lower on the training set!
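A minimal sketch of this exercise, reusing the model and the train/test splits already defined above:
# Reconstruct both splits with the trained model (dropout is disabled at inference time)
train_recon = autoencoder_model.predict(x_train)
test_recon = autoencoder_model.predict(x_test)

# Average reconstruction MSE per split
train_mse = np.mean((x_train - train_recon) ** 2)
test_mse = np.mean((x_test - test_recon) ** 2)

print("Average MSE on the training set:", train_mse)
print("Average MSE on the test set:    ", test_mse)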
We can now save the model on disk as we'll use this later.
autoencoder_model.save("./saved_model/autoencoder")
WARNING:absl:Function `_wrapped_model` contains input name(s) INPUT0 with unsupported characters which will be renamed to input0 in the SavedModel.
INFO:tensorflow:Assets written to: ./saved_model/autoencoder/assets
INFO:tensorflow:Assets written to: ./saved_model/autoencoder/assets
!ls ./saved_model/autoencoder
assets keras_metadata.pb saved_model.pb variables
Model evaluation¶
We now have a model that reconstructs the input at the output... doesn't sound really useful, right?
Let's see it in action. Let's take a sample from the test set and run it through our autoencoder.
input_sample = x_test[3:4].copy() # Deep copy
reconstructed_sample = autoencoder_model.predict(input_sample)
print(input_sample)
print(reconstructed_sample)
1/1 [==============================] - 0s 109ms/step [[0.603534 0.6770555 0.54900813 0.5327966 0.6680801 0.6171171 0.5198642 0.50135666 0.54716927 0.2718224 ]] [[0.59638697 0.67410123 0.5484349 0.52024144 0.64766663 0.5916597 0.4445051 0.499677 0.54471916 0.26904327]]
import matplotlib.pyplot as plt
index = np.arange(10)
bar_width = 0.35
figure, ax = plt.subplots()
inbar = ax.bar(index, input_sample[0], bar_width, label="Input data")
recbar = ax.bar(index+bar_width, reconstructed_sample[0], bar_width, label="Reconstructed data")
ax.set_xlabel('Features')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(features, rotation = 45)
ax.legend()
<matplotlib.legend.Legend at 0x16d184880>
As we can see from the graph above it reconstructed the input fairly well. It is not perfect since the Autoencoder is lossy, but it is good enough.
What happens if we manipulate this sample in a way the autoencoder doesn't expect (i.e. we introduce an anomaly)?
Let's try and set the ACC_Z to a value the autoencoder has never seen before.
input_anomaly = input_sample.copy() # Deep copy
input_anomaly[0][2] = 0.15
reconstructed_anomaly = autoencoder_model.predict(input_anomaly)
print(input_anomaly)
print(reconstructed_anomaly)
1/1 [==============================] - 0s 21ms/step [[0.603534 0.6770555 0.15 0.5327966 0.6680801 0.6171171 0.5198642 0.50135666 0.54716927 0.2718224 ]] [[0.60162103 0.69035804 0.55594885 0.51874125 0.7346029 0.6700014 0.40932336 0.5034408 0.5424664 0.26861513]]
figure, ax = plt.subplots()
inbar = ax.bar(index, input_anomaly[0], bar_width, label="Input anomaly")
recbar = ax.bar(index+bar_width, reconstructed_anomaly[0], bar_width, label="Reconstructed anomaly")
ax.set_xlabel('Features')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(features, rotation = 45)
ax.legend()
<matplotlib.legend.Legend at 0x16d3c0220>
The autoencoder fails to reconstruct the data it received at the input. This means that the reconstruction error is very high.
from sklearn.metrics import mean_squared_error
print("Anomaly %f"% mean_squared_error(input_anomaly[0], reconstructed_anomaly[0]))
print("Normal %f"% mean_squared_error(input_sample[0], reconstructed_sample[0]))
Anomaly 0.018465 Normal 0.000698
It's working as expected!
We now need to decide when to trigger an alarm (i.e. classify an input sample as anomalous) from this reconstruction error. In other words we need to decide our threshold.
There are multiple ways to set this value; in this example we'll use the Z-Score.
From Wikipedia:
In statistics, the standard score is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured.[...]
It is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation.
We'll consider a sample an anomaly if the Reconstruction Error Z-Score is not in the range [-2, +2]. This means that if the reconstruction error for a sample is more than 2 standard deviations away from the average reconstruction error computed on the test set, the sample is an anomaly. This choice is arbitrary; we can control the sensitivity of the detector by changing this range.
x_test_recon = autoencoder_model.predict(x_test)
reconstruction_scores = np.mean((x_test - x_test_recon)**2, axis=1) # MSE
reconstruction_scores_pd = pd.DataFrame({'recon_score': reconstruction_scores})
print(reconstruction_scores_pd.describe())
237/237 [==============================] - 0s 620us/step recon_score count 7584.000000 mean 0.003175 std 0.005438 min 0.000098 25% 0.000816 50% 0.001211 75% 0.002108 max 0.106237
def z_score(mse_sample):
    return (mse_sample - reconstruction_scores_pd.mean())/reconstruction_scores_pd.std()
mse_anomaly = mean_squared_error(input_anomaly[0], reconstructed_anomaly[0])
mse_normal = mean_squared_error(input_sample[0], reconstructed_sample[0])
z_score_anomaly = z_score(mse_anomaly)
z_score_normal = z_score(mse_normal)
print("Anomaly Z-score %f"% z_score_anomaly)
print("Normal Z-score %f"% z_score_normal)
Anomaly Z-score 2.811887 Normal Z-score -0.455488
We now have our anomaly detector... let's see how we can deploy it on our Kura™-powered edge device.
Model deployment¶
To deploy our model on the target device we'll leverage Kura™'s newly added Nvidia™ Triton Inference Server integration.
The Nvidia™ Triton Inference Server is an open-source inference service software that enables the user to deploy trained AI models from any framework on GPU or CPU infrastructure. It supports all major frameworks like TensorFlow, TensorRT, PyTorch, ONNX Runtime, and even custom framework backends. With specific backends, it is also possible to run Python scripts, mainly for pre- and post-processing purposes, and exploit the DALI building block for optimized operations.
For installation refer to the official Kura™ and Triton documentation. For the rest of this tutorial we'll assume a Triton container is available on the target device. It can be simply installed with:
docker pull nvcr.io/nvidia/tritonserver:22.07-tf2-python-py3
We'll also need to install Kura™'s Triton bundles:
- Triton Server Component: for Kura-Triton integration
- AI Wire Component: for making the Triton Inference Server available through the Kura Wires as a Wire component.
Model conversion¶
The first step in using Triton to serve your models is to place one or more models into a model repository, i.e. a folder where the models are available for Triton to load. Depending on the type of the model and on what Triton capabilities you want to enable for it, you may need to create a model configuration. This configuration is a protobuf containing information about the runtime configuration and the input/output shapes accepted by the model.
For our autoencoder model we'll need three "models":
- A Preprocessor for performing the operations described in the "Data processing" section (Wire envelope translation, feature selection and scaling)
- The Autoencoder model we exported in the "Model training" section
- A Postprocessor for performing the operations described in the "Model evaluation" section (Reconstruction error computation)
To simplify the handling of these models and improve inference performance, we'll use an advanced feature of Triton which is the Ensemble Model. From the Triton official documentation:
An ensemble model represents a pipeline of one or more models and the connection of input and output tensors between those models. Ensemble models are intended to be used to encapsulate a procedure that involves multiple models, such as "data preprocessing -> inference -> data postprocessing". Using ensemble models for this purpose can avoid the overhead of transferring intermediate tensors and minimize the number of requests that must be sent to Triton.
Autoencoder¶
As seen in the "Model training" section, our model is available as a Tensorflow SavedModel which can be simply loaded by the Triton Tensorflow backend. We just need to configure it properly.
We'll start by creating the following folder structure
tf_autoencoder_fp32
├── 1
│ └── model.savedmodel
│ ├── assets
│ ├── keras_metadata.pb
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
└── config.pbtxt
This can be done by copying the model we saved in the Model Training section:
!rm -rf ./tf_autoencoder_fp32/ && mkdir -p ./tf_autoencoder_fp32/1
!ls
AD-EdgeAI.ipynb requirements.txt train-data-raw.csv README.md saved_model train-data-raw.csv.1 imgs tf_autoencoder_fp32
!cp -r ./saved_model/autoencoder tf_autoencoder_fp32/1/model.savedmodel
!tree tf_autoencoder_fp32
tf_autoencoder_fp32 └── 1 └── model.savedmodel ├── assets ├── keras_metadata.pb ├── saved_model.pb └── variables ├── variables.data-00000-of-00001 └── variables.index 4 directories, 4 files
Now comes the hard part: we need to provide the model configuration (i.e. the config.pbtxt file). In the case of the autoencoder it is pretty simple:
name: "tf_autoencoder_fp32"
backend: "tensorflow"
max_batch_size: 0
input [
{
name: "INPUT0"
data_type: TYPE_FP32
dims: [ 1, 10 ]
}
]
output [
{
name: "OUTPUT0"
data_type: TYPE_FP32
dims: [ -1, 10 ]
}
]
version_policy: { all { }}
instance_group [{ kind: KIND_CPU }]
Each model input and output must specify the name, data_type and dims. We already know all of these:
- name: corresponds to the layer names we've seen in the Model Training section, INPUT0 for the input and OUTPUT0 for the output.
- data_type: will be float since we didn't perform any quantization
- dims: is the shape of the in/out tensor. In this case it will correspond to an array with the same length as the number of features.
Other interesting parameters of this configuration are:
- backend: where we set the backend for the model. In this case it will be the Tensorflow backend
- name: the name of the model, which must correspond to the name of the folder
- instance_group: where we set where we want the model to run. In this case we'll use the CPU since we're on a Raspberry Pi, but keep in mind that Triton supports multiple accelerators.
For a deep dive into the model configuration parameters take a look at the official documentation.
Preprocessor¶
As discussed in the "Data processing" section, before providing the incoming data to the autoencoder, we need to perform feature selection and scaling. In addition to these responsibilites, the Preprocessor will need to perform a sort of serialization of the data to comply to the input shape accepted by the Autoencoder. This is due to how Kura manages the data running on Wires. More details can be found here.
To perform all of this we'll use the Python backend available in Triton.
As described in the previous section we will need to provide the following folder structure:
preprocessor
├── 1
│ └── model.py
└── config.pbtxt
Preprocessor Configuration¶
As discussed in the official Kura documentation:
The AI wire component takes a WireEnvelope as an input, it processes its records and feeds them to the specified preprocessing or inference model.
...
The models that manage the input and the output must expect a list of inputs such that:
- each input corresponds to an entry of the WireRecord properties
- the entry key will become the input name (e.g. in the case of an asset, the channel name becomes the tensor name)
- input shape will be [1]
Therefore, for our input, each name corresponds to the channel names we've seen in the Data Collection section. The output needs to correspond to the input accepted by the model (i.e. INPUT0).
name: "preprocessor"
backend: "python"
input [
{
name: "ACC_X"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
input [
{
name: "ACC_Y"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
...
input [
{
name: "TEMP_PRESS"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
output [
{
name: "INPUT0"
data_type: TYPE_FP32
dims: [ 1, 10 ]
}
]
instance_group [{ kind: KIND_CPU }]
Preprocessor Model¶
As we've seen in the Data Processing section the Preprocessor is responsible for scaling the input features and serializing them in the tensor shape expected by the Autoencoder model.
This can be done with the following python script:
import numpy as np
import json
import triton_python_backend_utils as pb_utils
class TritonPythonModel:

    def initialize(self, args):
        self.model_config = model_config = json.loads(args['model_config'])

        output0_config = pb_utils.get_output_config_by_name(
            model_config, "INPUT0")

        self.output0_dtype = pb_utils.triton_string_to_numpy(
            output0_config['data_type'])

    def execute(self, requests):
        output0_dtype = self.output0_dtype

        responses = []

        for request in requests:
            acc_x = pb_utils.get_input_tensor_by_name(request, "ACC_X").as_numpy()
            acc_y = pb_utils.get_input_tensor_by_name(request, "ACC_Y").as_numpy()
            acc_z = pb_utils.get_input_tensor_by_name(request, "ACC_Z").as_numpy()
            gyro_x = pb_utils.get_input_tensor_by_name(request, "GYRO_X").as_numpy()
            gyro_y = pb_utils.get_input_tensor_by_name(request, "GYRO_Y").as_numpy()
            gyro_z = pb_utils.get_input_tensor_by_name(request, "GYRO_Z").as_numpy()
            humidity = pb_utils.get_input_tensor_by_name(request, "HUMIDITY").as_numpy()
            pressure = pb_utils.get_input_tensor_by_name(request, "PRESSURE").as_numpy()
            temp_hum = pb_utils.get_input_tensor_by_name(request, "TEMP_HUM").as_numpy()
            temp_press = pb_utils.get_input_tensor_by_name(request, "TEMP_PRESS").as_numpy()

            out_0 = np.array([acc_y, acc_x, acc_z, pressure, temp_press, temp_hum, humidity, gyro_x, gyro_y, gyro_z]).transpose()

            #                ACC_Y      ACC_X      ACC_Z     PRESSURE     TEMP_PRESS TEMP_HUM   HUMIDITY   GYRO_X     GYRO_Y     GYRO_Z
            min = np.array([-0.132551, -0.049693,  0.759847, 976.001709,  38.724998, 40.220890, 13.003981, -1.937896, -0.265019, -0.250647])
            max = np.array([ 0.093099,  0.150289,  1.177543, 1007.996338, 46.093750, 48.355824, 23.506138,  1.923712,  0.219204,  0.671759])

            # MinMax scaling
            out_0_scaled = (out_0 - min)/(max - min)

            # Create output tensor
            out_tensor_0 = pb_utils.Tensor("INPUT0",
                                           out_0_scaled.astype(output0_dtype))

            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor_0])
            responses.append(inference_response)

        return responses
Here there are two important things to note:
- The template we're using is taken from the Triton documentation and can be found here.
- The MinMax scaling must be the same we used in our training. For illustration purposes we hard-coded the min and max arrays we found in the Data Processing section, but we could have serialized the MinMaxScaler using pickle instead (see the sketch below).
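A minimal sketch of that alternative, assuming we pickle the fitted scaler in the notebook and ship the resulting file alongside model.py in the preprocessor folder (the scaler.pkl file name is just an example):
import pickle

# In the notebook, after fitting the scaler on the training data:
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# In the preprocessor's model.py, load it once (e.g. in initialize()) and reuse it;
# in a real model.py the path should be resolved relative to the model directory:
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

# Replaces the hand-written (out_0 - min)/(max - min) scaling
out_0_scaled = scaler.transform(out_0)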
Postprocessor¶
As discussed in the "Data processing" section, to perform the anomaly detection step we need to compute the Mean Squared Error between the reconstructed data and the actual input data. Because of this, the configuration of the Postprocessor model will be somewhat more complicated than before: in addition to the output of the Autoencoder model we will need the output of the Preprocessor model.
To perform all of this we'll use the Python backend again.
As described in the previous section we will need to provide the following folder structure:
postprocessor
├── 1
│ └── model.py
└── config.pbtxt
Postprocessor Configuration¶
name: "postprocessor"
backend: "python"
input [
{
name: "RECONSTR0"
data_type: TYPE_FP32
dims: [ 1, 10 ]
}
]
input [
{
name: "ORIG0"
data_type: TYPE_FP32
dims: [ 1, 10 ]
}
]
output [
{
name: "ANOMALY_SCORE0"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
output [
{
name: "ANOMALY0"
data_type: TYPE_BOOL
dims: [ 1 ]
}
]
instance_group [{ kind: KIND_CPU }]
As we can see we have two inputs and two outputs:
- The first input tensor is the reconstruction performed by the autoencoder model
- The second input tensor is the original data (already scaled and serialized by the Preprocessor model)
- The first output is the anomaly score i.e. the reconstruction error between the original and the reconstructed data.
- The second output is a boolean representing whether the data constitute an anomaly or not
Let's see how this is computed by the Python model.
Postprocessor Model¶
import numpy as np
import json
import triton_python_backend_utils as pb_utils
def z_score(mse):
    return (mse - MEAN_MSE)/STD_MSE


class TritonPythonModel:

    def initialize(self, args):
        self.model_config = model_config = json.loads(args['model_config'])

        output0_config = pb_utils.get_output_config_by_name(
            model_config, "ANOMALY_SCORE0")
        output1_config = pb_utils.get_output_config_by_name(
            model_config, "ANOMALY0")

        self.output0_dtype = pb_utils.triton_string_to_numpy(
            output0_config['data_type'])
        self.output1_dtype = pb_utils.triton_string_to_numpy(
            output1_config['data_type'])

    def execute(self, requests):
        output0_dtype = self.output0_dtype
        output1_dtype = self.output1_dtype

        responses = []

        for request in requests:
            # Get input
            x_recon = pb_utils.get_input_tensor_by_name(request, "RECONSTR0").as_numpy()
            x_orig = pb_utils.get_input_tensor_by_name(request, "ORIG0").as_numpy()

            # Get Mean square error between reconstructed input and original input
            reconstruction_score = np.mean((x_orig - x_recon)**2, axis=1)

            # Z-Score of Mean square error must be inside [-2; 2]
            anomaly = np.array([z_score(reconstruction_score) < -2.0 or z_score(reconstruction_score) > 2.0])

            # Create output tensors
            out_tensor_0 = pb_utils.Tensor("ANOMALY_SCORE0",
                                           reconstruction_score.astype(output0_dtype))
            out_tensor_1 = pb_utils.Tensor("ANOMALY0",
                                           anomaly.astype(output1_dtype))

            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor_0, out_tensor_1])
            responses.append(inference_response)

        return responses
As you can see the script is simple:
- It gets the input tensors
- It computes the Mean Squared Error between the inputs (which is what we called the reconstruction error)
- It computes the Z-Score of the MSE computed for the current sample and flags it as an anomaly if it is farther than 2 standard deviations away from the average MSE.
Note: MEAN_MSE and STD_MSE are the mean value and the standard deviation of the Mean Squared Error computed on the test set and correspond to the reconstruction_scores_pd.mean() and reconstruction_scores_pd.std() values we used in the previous section. We didn't set them here since they change for every training run of the Autoencoder. Be sure to set them to their proper values before trying this model on the Triton server!
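A minimal sketch of how these constants could be obtained from the notebook and pasted into the postprocessor's model.py; the values shown are the ones from the run in this notebook, yours will differ:
# In the notebook, after computing reconstruction_scores_pd:
print("MEAN_MSE =", reconstruction_scores_pd['recon_score'].mean())
print("STD_MSE  =", reconstruction_scores_pd['recon_score'].std())

# Then, near the top of postprocessor/1/model.py, define the constants with the
# printed values (placeholders below are taken from the run shown in this notebook):
MEAN_MSE = 0.003175
STD_MSE = 0.005438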
Ensemble model¶
To make things easier for ourselves and improve performance we'll consolidate the AI pipeline into an Ensemble Model.
We will need to provide the following folder structure:
ensemble_pipeline
├── 1
└── config.pbtxt
Note that the 1 folder is empty. The ensemble model essentially describes how to connect the models that belong to the processing pipeline.
Therefore we only need to focus on the configuration.
name: "ensemble_pipeline"
platform: "ensemble"
max_batch_size: 0
input [
{
name: "ACC_X"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
input [
{
name: "ACC_Y"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
...
input [
{
name: "TEMP_PRESS"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
output [
{
name: "ANOMALY_SCORE0"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
output [
{
name: "ANOMALY0"
data_type: TYPE_BOOL
dims: [ 1 ]
}
]
ensemble_scheduling {
step [
{
model_name: "preprocessor"
model_version: -1
input_map{
key: "ACC_X"
value: "ACC_X"
}
input_map{
key: "ACC_Y"
value: "ACC_Y"
}
...
input_map{
key: "TEMP_PRESS"
value: "TEMP_PRESS"
}
output_map {
key: "INPUT0"
value: "preprocess_out"
}
},
{
model_name: "tf_autoencoder_fp32"
model_version: -1
input_map {
key: "INPUT0"
value: "preprocess_out"
}
output_map {
key: "OUTPUT0"
value: "autoencoder_output"
}
},
{
model_name: "postprocessor"
model_version: -1
input_map {
key: "RECONSTR0"
value: "autoencoder_output"
}
input_map {
key: "ORIG0"
value: "preprocess_out"
}
output_map {
key: "ANOMALY_SCORE0"
value: "ANOMALY_SCORE0"
}
output_map {
key: "ANOMALY0"
value: "ANOMALY0"
}
}
]
}
The configuration is split into two main parts:
- The first is the usual configuration we've seen before: we describe the inputs and the outputs of our model. In this case the inputs correspond to the inputs of the first model of the pipeline (the Preprocessor) and the outputs to the outputs of the last model of the pipeline (the Postprocessor)
- The second part describes how to map the inputs/outputs of the models within the pipeline
To better visualize the configuration we can look at the graph below.
Conversion results¶
At this point we should have a folder structure that looks like this:
models
├── ensemble_pipeline
│ ├── 1
│ └── config.pbtxt
├── postprocessor
│ ├── 1
│ │ └── model.py
│ └── config.pbtxt
├── preprocessor
│ ├── 1
│ │ └── model.py
│ └── config.pbtxt
└── tf_autoencoder_fp32
├── 1
│ └── model.savedmodel
│ ├── assets
│ ├── keras_metadata.pb
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
└── config.pbtxt
Kura Deployment¶
We can now move our pipeline to the target device for inference on the edge.
We want to perform anomaly detection in real time, directly within the edge device, using the same data stream we used to collect our training data.
Triton component configuration¶
To do so we need to copy the models folder to the target device. For this example we'll use the /home/pi/models path.
We can now move to the Kura web UI and create a new Triton Server Container Service component instance. The complete documentation can be found here.
In this example we'll call it TritonContainerService.
Then we'll need to configure it to run our models. Move to the TritonContainerService configuration interface and set the following parameters:
- Image name/Image tag: use the name and tag of the Triton container image you installed. We're using nvcr.io/nvidia/tritonserver:22.07-tf2-python-py3 in this example.
- Local model repository path: in our example it is /home/pi/models
- Inference Models: we'll need to load all the models of the pipeline, so: preprocessor,postprocessor,tf_autoencoder_fp32,ensemble_pipeline
- Optional configuration for the local backends: tensorflow,version=2 since Tensorflow 2 is the only available Tensorflow backend in the Triton container image we're using.
You can leave everything else as default.
Once you press the "Apply" button Kura will create a new container from the Triton image we set and spin up the service with our models loaded.
pi@raspberrypi:~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4deae2857b6f nvcr.io/nvidia/tritonserver:22.07-tf2-python-py3 "tritonserver --mode…" 13 seconds ago Up 11 seconds 0.0.0.0:4000->8000/tcp, :::4000->8000/tcp, 0.0.0.0:4001->8001/tcp, :::4001->8001/tcp, 0.0.0.0:4002->8002/tcp, :::4002->8002/tcp tritonserver-kura
Note: if no container is created check that the "Container Orchestration Service" is enabled in the Kura UI. Full documentation for the service can be found here.
Note: if you see an error in the logs like "Internal: Unable to initialize shared memory key 'triton_python_backend_shm_region_2' to requested size (67108864 bytes). If you are running Triton inside docker, use '--shm-size' flag to control the shared memory region size. Each Python backend model instance requires at least 64MBs of shared memory.", you can update the default shared memory size allocated by the Docker daemon. Edit /etc/docker/daemon.json, set "default-shm-size": "200m" and restart the Docker daemon with sudo systemctl restart docker.
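Optionally, before wiring it into Kura, you can sanity-check the running server with a short Python client. This is a sketch, not part of the original setup: it assumes the tritonclient pip package (pip install tritonclient[http]) is installed on a machine that can reach the gateway, that the HTTP port is mapped to 4000 as in the docker ps output above, and that the Raspberry Pi is reachable as raspberrypi.local; the sensor values are made up.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="raspberrypi.local:4000")

# One FP32 tensor of shape [1] per channel, mirroring the ensemble_pipeline inputs
channels = {"ACC_X": 0.08, "ACC_Y": 0.0, "ACC_Z": 0.98, "PRESSURE": 992.3,
            "TEMP_PRESS": 38.8, "TEMP_HUM": 40.3, "HUMIDITY": 19.5,
            "GYRO_X": 0.05, "GYRO_Y": 0.03, "GYRO_Z": 0.04}

inputs = []
for name, value in channels.items():
    tensor = httpclient.InferInput(name, [1], "FP32")
    tensor.set_data_from_numpy(np.array([value], dtype=np.float32))
    inputs.append(tensor)

result = client.infer("ensemble_pipeline", inputs)
print("Anomaly score:", result.as_numpy("ANOMALY_SCORE0"))
print("Anomaly:      ", result.as_numpy("ANOMALY0"))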
Wire Graph¶
Finally we can move to the "Wire Graph" UI and create the AI component (in the Emitters/Receiver menu) for interfacing with the Triton instance we just created. We'll call it Triton in this example.
We just need to change two parameters in the configuration:
- InferenceEngineService Target Filter: we need to select the TritonContainerService we created at the step above
- inference.model.name: since we're using an ensemble pipeline we only need ensemble_pipeline as our inference model.
The resulting wire graph is the following:
And that's it! We should now see the anomaly detection results coming to Kapua in addition to the SenseHat data.
Complete Example¶
A similar but more complete example of the features presented in this notebook is available in the official Kura™ repository, containing all the code and configuration needed to make it work.
Give it a try!