The SmartFace Embedded solvers are binaries that process the NN model inference and the model-related pre-processing and post-processing operations. For each NN model, there is a specific solver.

Our NN models can be accelerated on a variety of HW, including CPUs, GPUs and NPUs. Depending on the target platform and inference engine, we distinguish different solvers.

Currently, SmartFace Embedded supports NN acceleration using the inference engines described below.

In SFE Toolkit, use the function sfeSolverCreate to load the correct solver.


std::string solver_face_detect = SOLVER_FACE_DETECT;
SFESolver detector_solver{};
// Load detection solver
SFEError error =
    sfeSolverCreate(solver_face_detect.c_str(), &detector_solver);

To configure solver-specific parameters, use the function sfeSolverCreateWithParameters instead.

In SFE Stream Processor, you set the paths to the solvers to be loaded and used in the solvers section of settings.yaml.

ONNX Runtime

The ONNX Runtime inference engine enables inference of NN models on a variety of HW using different so-called execution providers. The ONNX Runtime solver comes with the suffix onnxrt.solver in its name.

ONNX Runtime solvers are currently supported for Windows x86, Linux x86, Jetson ARM64 and Android architectures.

SFE Toolkit currently uses the following ONNX Runtime versions:

  • 1.13.1 - Linux, Windows and NVIDIA Jetson with Jetpack 5.0 and higher
  • 1.12.0 - NVIDIA Jetson with Jetpack 4.6
  • 1.7.2 - NVIDIA Jetson with Jetpack 4.4

The ONNX Runtime execution provider can be configured using a solver parameter “runtime_provider” or environment variable ONNXRUNTIME_SOLVER_RUNTIME_PROVIDER. Supported runtime/execution providers are:

  • “cpu” - Linux, Windows, Jetson, Android
  • “cuda” - Linux, Windows, Jetson
  • “tensorrt” - Linux, Windows, Jetson

Note: Environment variable overrides solver parameter.
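The precedence rule for this setting can be illustrated with a small sketch. The helper resolve_setting is hypothetical and not part of the SFE API; it only models the documented behavior that a set environment variable wins over the solver parameter. The fallback default is supplied by the caller, because this document does not state a default provider.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not an SFE function): returns the value that would be
// used, assuming a set environment variable always overrides the parameter.
std::string resolve_setting(const char* env_name,
                            const std::optional<std::string>& solver_parameter,
                            const std::string& default_value) {
    if (const char* env_value = std::getenv(env_name)) {
        return env_value;  // environment variable has the highest priority
    }
    if (solver_parameter) {
        return *solver_parameter;  // explicitly passed solver parameter
    }
    return default_value;  // neither was provided
}
```

For example, with ONNXRUNTIME_SOLVER_RUNTIME_PROVIDER set to tensorrt in the environment, passing “cuda” as the solver parameter would still resolve to tensorrt.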

ONNX Runtime solvers depend on the onnxruntime library. On Windows, the Microsoft C and C++ (MSVC) runtime libraries must also be installed; see this page.


CPU Execution Provider

The default ONNX Runtime CPU Execution Provider is MLAS.

The performance and CPU utilization can be configured using the following solver parameters and environment variables:

  • solver parameter “inter_threads” or env variable ONNXRUNTIME_SOLVER_INTER_THREADS
    • Sets the number of threads used to parallelize the execution of the graph (across nodes).
    • Default value is 0, which means the default number of threads will be used.
    • If “parallel” execution mode is turned on, this sets the maximum number of threads to use to run them in parallel.
    • If “sequential” execution mode is enabled, this value is ignored; it acts as if it was set to 1.
  • solver parameter “intra_threads” or env variable ONNXRUNTIME_SOLVER_INTRA_THREADS
    • Sets the number of threads used to parallelize the execution within nodes.
    • Default value is 0, which means the default number of threads will be used.
  • solver parameter “execution_mode” or env variable ONNXRUNTIME_SOLVER_EXECUTION_MODE
    • Controls whether the operators are executed in parallel or sequentially.
    • “parallel” - execute operators in the graph in parallel.
    • “sequential” - execute operators in the graph sequentially.
    • Default value is “parallel”.

Note: Environment variable overrides solver parameter.
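The interaction between “execution_mode” and “inter_threads” can be summarized in a few lines. The function below is a hypothetical illustration of the rules listed above, not solver code: it returns the inter-node thread count that would effectively apply, where 0 stands for the default number of threads.

```cpp
#include <string>

// Hypothetical sketch of the documented rule. In "sequential" mode the
// inter_threads value is ignored and behaves as 1. Otherwise the value is
// used as given, with 0 meaning "use the default number of threads".
int effective_inter_threads(const std::string& execution_mode, int inter_threads) {
    if (execution_mode == "sequential") {
        return 1;
    }
    return inter_threads;
}
```

So a configuration with execution_mode “sequential” and inter_threads 8 would behave like inter_threads 1, while intra_threads is unaffected.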


CUDA Execution Provider

The CUDA Execution Provider enables hardware-accelerated computation on NVIDIA CUDA-enabled GPUs and NVIDIA Jetson platforms.

The supported CUDA and cuDNN version requirements are documented here.

You can specify the ID of a CUDA device where the NN model inference will be executed by setting a solver parameter “device_id” or env variable ONNXRUNTIME_SOLVER_DEVICE_ID:

  • default value is 0

Note: Environment variable overrides solver parameter.


TensorRT Execution Provider

With the TensorRT execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic GPU acceleration.

The TensorRT execution provider in the ONNX Runtime makes use of NVIDIA’s TensorRT Deep Learning inferencing engine to accelerate ONNX models on NVIDIA GPUs.

The supported TensorRT and CUDA versions are documented here.

You can configure TensorRT settings using environment variables. Find more details here.

To decrease the ONNX model load time, you can enable the TensorRT engine caching with the following environment variables:

  • ORT_TENSORRT_ENGINE_CACHE_ENABLE: Enable TensorRT engine caching. The default value is 0 (disabled), value 1 means caching is enabled.
  • ORT_TENSORRT_CACHE_PATH: Specify the path for the TensorRT engine and profile files if ORT_TENSORRT_ENGINE_CACHE_ENABLE is 1.
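If the variables cannot be exported in the shell that launches the application, they can also be set from within the process. The following is a minimal sketch that assumes a POSIX system (setenv; on Windows, _putenv_s would be the counterpart) and assumes the variables are read when the solver is created, so the call has to happen before sfeSolverCreate. The function name and the cache directory in the usage note are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Enables TensorRT engine caching for the current process via POSIX setenv.
// Assumption: must be called before the ONNX Runtime solver is created.
// Returns true when both variables were set successfully.
bool enable_tensorrt_engine_cache(const std::string& cache_dir) {
    const bool enabled =
        setenv("ORT_TENSORRT_ENGINE_CACHE_ENABLE", "1", /*overwrite=*/1) == 0;
    const bool path_set =
        setenv("ORT_TENSORRT_CACHE_PATH", cache_dir.c_str(), /*overwrite=*/1) == 0;
    return enabled && path_set;
}
```

A call such as enable_tensorrt_engine_cache("/tmp/trt_cache") at program start would then let later solver loads reuse the cached engine files from that directory.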

Rockchip NPU

Rockchip NPU is used for NN model acceleration on Rockchip RV1109 and RV1126.

Rockchip solver comes with the suffix rockchip.solver in its name.

Ambarella CVFlow

Ambarella CVFlow is used for NN model acceleration on Ambarella CV25, CV22 and CV2 chips.

Ambarella solver comes with the suffix ambarella.solver in its name.

SmartFace Embedded currently supports the following versions of the Ambarella SDK:

  • Ambarella SDK 3.0 - dependency on EazyAI library (
  • Ambarella SDK 2.5.8 - dependency on nnctrl (v0.3.0) and cavalry_mem (v0.0.6) libraries

Please note that different packages with different solvers are required for the various combinations of CV architecture and Ambarella SDK version.


HailoRT

HailoRT is used for NN model acceleration on the Axiomtek RSC101 AI box, as well as on other edge devices that use the Hailo-8 chip.

The Hailo solver comes with the suffix hailo.solver in its name.

Follow the instructions to install HailoRT.

TensorFlow Lite

The TensorFlow Lite inference engine is used for NN model acceleration on the NPU hardware of NXP’s i.MX 8 series.

TensorFlow Lite solver comes with the suffix tflite.solver in its name.
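Because every solver type named in this document carries its own suffix, the inference engine can be inferred from a solver file name. The helper below is a hypothetical sketch (not part of the SFE Toolkit) that maps the documented suffixes to engine names; the file names in the usage note are made up for illustration.

```cpp
#include <array>
#include <string>
#include <string_view>
#include <utility>

// Returns the inference engine implied by a solver file name, or an empty
// string when the name ends with none of the suffixes listed in this document.
std::string engine_for_solver(std::string_view solver_name) {
    static constexpr std::array<std::pair<std::string_view, std::string_view>, 5> kSuffixes{{
        {"onnxrt.solver", "ONNX Runtime"},
        {"rockchip.solver", "Rockchip NPU"},
        {"ambarella.solver", "Ambarella CVFlow"},
        {"hailo.solver", "HailoRT"},
        {"tflite.solver", "TensorFlow Lite"},
    }};
    for (const auto& [suffix, engine] : kSuffixes) {
        const bool ends_with_suffix =
            solver_name.size() >= suffix.size() &&
            solver_name.compare(solver_name.size() - suffix.size(),
                                suffix.size(), suffix) == 0;
        if (ends_with_suffix) {
            return std::string(engine);
        }
    }
    return {};
}
```

For instance, engine_for_solver("face_detect.tflite.solver") yields "TensorFlow Lite", and a name without a known suffix yields an empty string. The sketch requires C++17.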

Acceleration on the NPU is handled by the VX delegate plugin. The VX Delegate enables accelerating the inference on the on-chip hardware accelerator of the i.MX 8 series. It uses the hardware accelerator driver (OpenVX with extension) directly to fully utilize the accelerator capabilities.

SmartFace Embedded currently uses TensorFlow Lite 2.9.1.

Supported TFLite solver parameters

  • tflite_solver.delegate
  • tflite_solver.num_threads
  • tflite_solver.vx_delegate.device_id
  • tflite_solver.vx_delegate.cache
  • tflite_solver.vx_delegate.cache_file_path

Supported TFLite solver environment variables


Note: Environment variable overrides solver parameter.

Delegate parameter

Enables users to specify a delegate that should be used for inference. For example, GPU delegate or VX delegate can be specified. Possible values of this parameter, whether it is set via generic solver API or using an environment variable, are:

  • “cpu”,
  • “gpu”,
  • “nnapi”,
  • “vx”.

The default value is “cpu”.

Num Threads parameter

Allows users to specify the number of threads to be used for model inference. Currently, this parameter only affects the CPU delegate and does nothing if specified along with a different delegate. Possible values are integers greater than or equal to -1. Special behavior is triggered when the following values are provided:

  • -1 - let the tflite engine choose the most suitable number of threads.
  • 0 - Multithreading disabled.

The default value is -1, which means that the TensorFlow Lite engine will decide how many threads to use.
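The value rules for tflite_solver.num_threads can be captured in a short validation sketch. The function is hypothetical and only restates the rules above; it is not how the solver itself parses the parameter.

```cpp
#include <optional>
#include <string>

// Describes how a tflite_solver.num_threads value would be interpreted
// according to the rules above. Returns std::nullopt for values below -1,
// which are outside the documented range.
std::optional<std::string> describe_num_threads(int num_threads) {
    if (num_threads < -1) {
        return std::nullopt;
    }
    if (num_threads == -1) {
        return "engine chooses the number of threads";
    }
    if (num_threads == 0) {
        return "multithreading disabled";
    }
    return std::to_string(num_threads) + " thread(s)";
}
```

Keep in mind that, as stated above, the value only has an effect when the CPU delegate is used.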

VX Delegate device ID

In an environment with multiple VX NPUs, this parameter allows you to specify the device ID that a solver should use for inference.

The default value is 0.

VX Delegate caching enabled and cache file path

Enables you to turn on VX model caching. This is useful when the first VX model inference takes too long: the VX delegate can cache its models into files and reuse them in another instance of the process. You can turn this feature on by setting tflite_solver.vx_delegate.cache to “true”. By default, the model is cached in the file /tmp/tflite_solver.vxcache.

Default behavior: caching is disabled.

SmartFace Embedded Input Solvers

SmartFace Embedded also supports other solvers for processing various inputs, like Camera input, Imager input, and GStreamer.

These solvers are not available for all supported platforms.

Camera input solver

This solver opens a camera device with a given index and outputs the image as an output tensor.


  • camera_index: u32 (index of Linux camera device) Default: 0
  • camera_width: u32 (width of the camera resolution) Default: 640
  • camera_height: u32 (height of the camera resolution) Default: 480
  • camera_fps: u32 (framerate of the camera) Default: 5
  • camera_format: string (frame format of the camera) Default: YUYV
    • “MJPEG”
    • “YUYV”
    • “GRAY”
    • “RAWRGB”
    • “NV12”
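For application code that assembles these parameters, it can help to keep the documented defaults in one place. The struct and helper below are hypothetical (they are not SFE types) and simply mirror the parameter list above. Treating the format names as case-sensitive is an assumption of this sketch.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical container mirroring the documented camera solver parameters
// and their default values.
struct CameraSolverParams {
    std::uint32_t camera_index = 0;
    std::uint32_t camera_width = 640;
    std::uint32_t camera_height = 480;
    std::uint32_t camera_fps = 5;
    std::string camera_format = "YUYV";
};

// Checks a format name against the documented list (case-sensitive here,
// which is an assumption).
bool is_supported_camera_format(std::string_view format) {
    constexpr std::array<std::string_view, 5> kFormats{
        "MJPEG", "YUYV", "GRAY", "RAWRGB", "NV12"};
    for (std::string_view candidate : kFormats) {
        if (candidate == format) {
            return true;
        }
    }
    return false;
}
```

A default-constructed CameraSolverParams then describes camera 0 at 640x480, 5 fps, in YUYV format.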

You can check the available cameras using the command gst-device-monitor-1.0.

GStreamer input solver

This solver runs a GStreamer pipeline, consumes its output and creates a tensor.

The default gst pipeline is:

v4l2src device={gst_video_device} ! video/x-raw,width={gst_width},height={gst_height},framerate=10/1 ! videoconvert ! video/x-raw,format=BGR ! appsink name={app_sink} drop=True max-buffers=1 emit-signals=True async=false sync=false


  • gst_pipeline: string (complete gst pipeline that overrides every other parameter)
  • gst_width: u32 (width of the camera resolution)
  • gst_height: u32 (height of the camera resolution)
  • gst_app_sink_name: string (app sink name)
  • gst_video_device: string (Linux video device, e.g. /dev/video1)
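To see how these parameters end up in the default pipeline, the following sketch fills the placeholders of the default pipeline template with concrete values. The function is hypothetical and only reproduces the template text; it links every element with !, the GStreamer element-link separator. How the solver builds the string internally is not documented here.

```cpp
#include <cstdint>
#include <string>

// Fills the placeholders of the default pipeline template with the given
// values. Illustration only; this is not the solver's own code.
std::string build_default_pipeline(const std::string& gst_video_device,
                                   std::uint32_t gst_width,
                                   std::uint32_t gst_height,
                                   const std::string& app_sink) {
    return "v4l2src device=" + gst_video_device +
           " ! video/x-raw,width=" + std::to_string(gst_width) +
           ",height=" + std::to_string(gst_height) + ",framerate=10/1" +
           " ! videoconvert ! video/x-raw,format=BGR" +
           " ! appsink name=" + app_sink +
           " drop=True max-buffers=1 emit-signals=True async=false sync=false";
}
```

Calling build_default_pipeline("/dev/video1", 640, 480, "sink") produces a pipeline that reads 640x480 frames from /dev/video1 and converts them to BGR.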

A GStreamer pipeline can be used to obtain BGR frames from a USB camera, a video file or an RTSP stream.

Rules to follow when providing a custom GStreamer pipeline:

  • Don’t specify any *sink for a pipeline, e.g. appsink, autovideosink, …
  • Always make sure the pipeline outputs a video/x-raw,format=BGR frame
  • Check if width/height properties are correctly set
  • Check if your detector model accepts the same input shape as the pipeline provides
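The first two rules can be checked textually before a pipeline string is handed to the solver. The sketch below is a heuristic written for illustration, with hypothetical function names: it splits the description on !, rejects any element whose name ends in "sink", and requires the last segment to be video/x-raw caps with format=BGR. It does not parse full GStreamer syntax and cannot verify the width, height or model input shape.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Splits a pipeline description on '!' and trims surrounding whitespace.
std::vector<std::string> split_elements(const std::string& pipeline) {
    std::vector<std::string> elements;
    std::istringstream stream(pipeline);
    std::string part;
    while (std::getline(stream, part, '!')) {
        const auto first = part.find_first_not_of(" \t");
        const auto last = part.find_last_not_of(" \t");
        if (first != std::string::npos) {
            elements.push_back(part.substr(first, last - first + 1));
        }
    }
    return elements;
}

// Heuristic check of the "no sink" and "BGR output" rules.
bool looks_like_valid_custom_pipeline(const std::string& pipeline) {
    const std::vector<std::string> elements = split_elements(pipeline);
    if (elements.empty()) {
        return false;
    }
    for (const std::string& element : elements) {
        // The element name is the text before the first space.
        const std::string name = element.substr(0, element.find(' '));
        if (name.size() >= 4 && name.compare(name.size() - 4, 4, "sink") == 0) {
            return false;  // rule: do not specify any *sink
        }
    }
    const std::string& tail = elements.back();
    const std::string key = "format=BGR";
    const auto pos = tail.find(key);
    if (tail.find("video/x-raw") == std::string::npos || pos == std::string::npos) {
        return false;  // rule: pipeline must output video/x-raw,format=BGR
    }
    // Reject longer format names such as BGRx or BGRA.
    const auto next = pos + key.size();
    return next == tail.size() ||
           !std::isalnum(static_cast<unsigned char>(tail[next]));
}
```

For example, "videotestsrc ! videoconvert ! video/x-raw,format=BGR" passes the check, while the same pipeline with "! autovideosink" appended does not.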

See also GStreamer pipelines for more information on how to configure the pipeline properly.