ML - Performance Analysis on i.MX Platforms

Published on January 13, 2021

This blog post presents a Machine Learning (ML) performance analysis across our platforms, with a focus on the i.MX 8M Plus based Nitrogen8MP!

The analysis was done as part of our i.MX 8M Plus Machine Learning webinar, which you can watch here:

Machine Learning 101

HW Setup

In this article, we will focus on 2 different setups:

1. Nitrogen8M + Google EdgeTPU

i.MX 8M Quad based SBC
- Quad Arm® Cortex®-A53 cores @1.5GHz
- GC7000Lite GPU (OpenGL® ES 3.1)
- 4k video decoder + 4k video output
- 2x MIPI-CSI2 camera inputs
- 2x PCI Express Gen2 interfaces
- 2x USB 3.0 controllers
Google Edge TPU ML accelerator
- 4 TOPS total peak performance (int8)
- Integrated power management
- Operating temp: -20 to +85 °C

2. Nitrogen8MP

i.MX 8M Plus based SoM
- Quad Arm® Cortex®-A53 cores @1.8GHz
- Neural Processing Unit (NPU) 2.3 TOPS
- GC7000UltraLite GPU (OpenGL® ES 3.1) + GC520L 2D
- 1080p video encode/decode
- 3 display outputs (MIPI-DSI, LVDS, HDMI)
- 2x MIPI-CSI2 camera inputs + ISP
- 1x PCI Express Gen2 interface
- 2x USB 3.0 controllers
- Small form factor: 48mm x 38mm

SW Setup

As the EdgeTPU doesn't use the standard libneuralnetworks.so library for its acceleration, we couldn't run the exact same application on both setups. Instead, we used a simple Python app in each case, with the same model, labels and input image. We packaged everything into one archive for you to download:

benchmark_npu.zip
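To make the comparison concrete, here is a minimal sketch of the kind of timing loop such a Python app relies on. It is not the code from the archive: the function name `time_inferences` and the stand-in workload are hypothetical, and the only assumption is that one inference can be wrapped in a callable.

```python
import time


def time_inferences(run_once, runs=5):
    """Call run_once() `runs` times and return each duration in milliseconds."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace it with your own inference call.
    def fake_inference():
        sum(i * i for i in range(10_000))

    print("----INFERENCE TIME----")
    for duration in time_inferences(fake_inference, runs=5):
        print("%.2f ms" % duration)
```

With a TensorFlow Lite interpreter, the callable would typically be the interpreter's `invoke` method, called after the input tensor has been set. Keeping the timed region limited to that single call is what makes the numbers comparable between the two setups.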

1. Mendel Linux + PyCoral API

For the EdgeTPU, we will use our Mendel Linux release along with the freely available PyCoral examples:

classify_image.py

$ python3 coral/tflite/python/examples/classification/classify_image.py \
    --model mobilenet_v1_1.0_224_quant_edgetpu.tflite \
    --labels labels_mobilenet_quant_v1_224.txt --input cat.jpg 
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
27.0ms
4.3ms
4.2ms
4.5ms
4.1ms
-------RESULTS--------
Egyptian cat: 0.58594

detect_image.py

$ python3 coral/tflite/python/examples/detection/detect_image.py \
    --model detect_edgetpu.tflite --labels coco_labels.txt --input bus.jpg
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
39.81 ms
13.59 ms
12.94 ms
15.37 ms
13.85 ms
-------RESULTS--------
bus
id: 5
score: 0.83203125
bbox: BBox(xmin=222, ymin=197, xmax=630, ymax=592)
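Since the first run includes loading the model into the Edge TPU memory, it helps to summarize the logs above without it. The following is a small sketch that does so, assuming the output format shown in this article (one duration per line, ending in "ms"); the helper names are hypothetical.

```python
import re
from statistics import mean

# Matches lines such as "4.3ms" or "13.59 ms" and nothing else.
TIME_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*ms\s*$")

CLASSIFY_LOG = """----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
27.0ms
4.3ms
4.2ms
4.5ms
4.1ms
-------RESULTS--------
Egyptian cat: 0.58594
"""


def parse_inference_times(log_text):
    """Return every per-inference duration (in ms) found in the log."""
    times = []
    for line in log_text.splitlines():
        match = TIME_LINE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def summarize(log_text):
    """Separate the first (model loading) run from the steady-state runs."""
    times = parse_inference_times(log_text)
    if len(times) < 2:
        raise ValueError("need at least two timed runs")
    steady = times[1:]
    return {
        "first_ms": times[0],
        "steady_mean_ms": mean(steady),
        "steady_min_ms": min(steady),
    }


if __name__ == "__main__":
    print(summarize(CLASSIFY_LOG))
```

The same function can be applied to the detect_image.py output, since the pattern accepts an optional space before "ms".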

Note that the classification model used above was generated with the edgetpu_compiler:

$ edgetpu_compiler mobilenet_v1_1.0_224_quant.tflite
Edge TPU Compiler version 15.0.340273435
Model compiled successfully in 431 ms.
Input model: mobilenet_v1_1.0_224_quant.tflite
Input size: 4.08MiB
Output model: mobilenet_v1_1.0_224_quant_edgetpu.tflite
Output size: 4.40MiB
On-chip memory used for caching model parameters: 4.33MiB
On-chip memory remaining for caching model parameters: 2.81MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 31
Operation log: mobilenet_v1_1.0_224_quant_edgetpu.log
See the operation log file for individual operation details.
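The memory lines in this summary are worth checking when you compile your own model: a non-zero off-chip figure means some parameters are not cached on the Edge TPU. Here is a short sketch that reads those lines, assuming the summary format printed above (other compiler versions may differ); the helper names are hypothetical.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}

# Matches "Some label: 4.33MiB" style lines; lines without a size unit are skipped.
SIZE_LINE = re.compile(
    r"^(?P<key>[^:]+):\s*(?P<value>\d+(?:\.\d+)?)(?P<unit>B|KiB|MiB|GiB)\s*$"
)

OFF_CHIP_KEY = "Off-chip memory used for streaming uncached model parameters"

SUMMARY = """Input size: 4.08MiB
Output size: 4.40MiB
On-chip memory used for caching model parameters: 4.33MiB
On-chip memory remaining for caching model parameters: 2.81MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 31
"""


def parse_sizes(summary):
    """Map each size label in the compiler summary to a byte count."""
    sizes = {}
    for line in summary.splitlines():
        match = SIZE_LINE.match(line.strip())
        if match:
            sizes[match.group("key")] = float(match.group("value")) * UNITS[match.group("unit")]
    return sizes


def fits_on_chip(summary):
    """True when the compiler reports no parameters streamed from off-chip memory."""
    return parse_sizes(summary)[OFF_CHIP_KEY] == 0


if __name__ == "__main__":
    print("fully cached on-chip:", fits_on_chip(SUMMARY))
```

For the MobileNet model used here, the off-chip figure is 0.00B, so all parameters are cached on-chip.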

2. Yocto Zeus (Beta2) + eIQ

As of this writing, we recommend the following Yocto release for using NXP's eIQ:

Nitrogen8MP Yocto Zeus release

Installation of eIQ is very straightforward once the image is booted up:

root@nitrogen8mp:~# pip3 install eiq

Then you can run the following commands to use the same type of application with the same model/label/input from our benchmark archive:

root@nitrogen8mp:~# pyeiq --run object_classification_tflite -i cat.jpg \
    -m mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt
root@nitrogen8mp:~# pyeiq --run object_detection_tflite -i bus.jpg \
    -m detect.tflite -l coco_labels.txt

For the eIQ tests, the results and inference time are shown on the display rather than in the command prompt.

Performance analysis results

Here is a table that summarizes our findings, comparing both Neural Processing solutions.

First, the Nitrogen8MP offers better CPU performance, which makes sense as its cores run at a higher frequency (1.8GHz versus 1.5GHz for the Nitrogen8M).

Second, both solutions provide about the same accelerated performance, with a slight advantage to the i.MX 8M Plus NPU.

This might come as a surprise since the EdgeTPU is rated at more TOPS (4 versus 2.3), but note that this peak figure is for a highly optimized model. In this article we use the same "standard" pre-built model on both setups; the EdgeTPU could most likely offer better performance with an optimized model.
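To illustrate how far a real model sits from a peak TOPS rating, the rough estimate below converts the Edge TPU classification timings listed earlier into an effective throughput. It rests on two assumptions that are not from our measurements: MobileNet v1 1.0 224 needs about 569 million multiply-accumulates per inference (the figure commonly quoted for this model), and each multiply-accumulate counts as two operations.

```python
from statistics import mean

# Assumption: ~569 million multiply-accumulates per MobileNet v1 1.0 224 inference.
MOBILENET_V1_MACS = 569e6
# Assumption: one multiply-accumulate is counted as two operations.
OPS_PER_MAC = 2

EDGETPU_PEAK_TOPS = 4.0  # rated peak from the hardware list in this article
NPU_PEAK_TOPS = 2.3      # rated peak from the hardware list in this article


def effective_tops(latency_ms, macs=MOBILENET_V1_MACS):
    """Operations actually executed per second, in tera-operations."""
    return macs * OPS_PER_MAC / (latency_ms / 1000.0) / 1e12


if __name__ == "__main__":
    # Steady-state Edge TPU classification timings from this article (first run excluded).
    steady_ms = mean([4.3, 4.2, 4.5, 4.1])
    achieved = effective_tops(steady_ms)
    print("rated peak ratio (Edge TPU / NPU): %.2f" % (EDGETPU_PEAK_TOPS / NPU_PEAK_TOPS))
    print("effective throughput: %.3f TOPS" % achieved)
    print("share of Edge TPU peak: %.1f%%" % (100.0 * achieved / EDGETPU_PEAK_TOPS))
```

Under these assumptions, the model uses only a small fraction of the rated peak, which is consistent with rated TOPS being a poor predictor of measured inference time on a standard model.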

In any case, we invite you to use these benchmarking techniques to select the right product for your project.

As always, feel free to send your feedback about this article to support@boundarydevices.com.