Rapid video processing

In addition to processing live RTSP streams, SmartFace Platform can also process video files. Video files are processed faster than real time, i.e. processing takes less time than the duration of the uploaded video. You can easily upload a video file and SmartFace Platform processes it in the same way as a real-time video stream: it detects faces, extracts biometric data, identifies detected persons against watchlists, stores the information in the database (based on the data storage configuration) and provides ZeroMQ notifications. Multiple videos can be uploaded at the same time and processed in parallel. Pedestrian detection is currently not supported in Rapid video processing.

The processing speed depends on several variables:

  • length of the video

  • number of faces occurring in the video

  • computing power of your server

  • configuration of the processing

  • configuration of SmartFace Platform services

Rapid video processing explained

To process a video file, you need to upload it to SmartFace Platform. To do so, create a Video record via an API call. Here is an example of such a call:

Create Video record API Call (POST)

{
    "name": "name_of_the_video",
    "source": "C:\\folder\\my video.mp4",
    "enabled": true,
    "faceDetectorConfig": {
        "minFaceSize": 20,
        "maxFaceSize": 200,
        "maxFaces": 20,
        "confidenceThreshold": 450
    },
    "faceDetectorResourceId": "cpu",
    "templateGeneratorResourceId": "cpu",
    "redetectionTime": 500,
    "templateGenerationTime": 250,
    "faceSaveStrategy": "All",
    "saveFrameImageData": true,
    "maskImagePath": null,
    "imageQuality": 90,
    "matchingConfig": {
        "matchDetectedFaces": true,
        "maxResultsCount": 1
    }
}

The source of the video file can be either an absolute file path (the video file must be available locally on the same machine as SmartFace Platform) or an HTTP URL (e.g. a file hosted in MinIO). Once the Video record is successfully created and the video file is available at the defined source, SmartFace Platform automatically starts processing the video file.
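For illustration, the call above can be sent from Python using only the standard library. This is a minimal sketch, not an official client. The endpoint path POST /api/v1/VideoRecords is an assumption inferred from the /api/v1/VideoRecords endpoints mentioned in the Clearing Video record section. BASE_URL is a hypothetical address of your SmartFace REST API. The is_supported_source helper mirrors the two source forms described above (an absolute local path or an HTTP URL); accepting https as well is an additional assumption.

```python
import json
import ntpath
import posixpath
import urllib.request
from urllib.parse import urlparse

BASE_URL = "http://localhost:8098"  # hypothetical SmartFace REST API address


def is_supported_source(source: str) -> bool:
    """True for an HTTP(S) URL or an absolute Windows/POSIX file path."""
    if urlparse(source).scheme in ("http", "https"):
        return True
    return ntpath.isabs(source) or posixpath.isabs(source)


def build_create_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a Video record."""
    if not is_supported_source(payload.get("source", "")):
        raise ValueError("source must be an absolute file path or an HTTP(S) URL")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/VideoRecords",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_video_record(payload: dict) -> dict:
    """Send the request; urlopen raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(build_create_request(payload)) as response:
        return json.load(response)
```

Building the request separately from sending it lets you inspect the payload before anything reaches the server.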

SmartFace Platform processes the video in the following steps:

  • The video file is read and sliced into multiple chunks
    • These chunks are processed in parallel
    • The number of chunks processed at the same time depends on the available resources and the configuration of services
  • Faces in the processed video chunks are detected at configured time intervals
    • The detection interval is configured by the redetectionTime property in the API call
    • This detection interval is the so-called coarse grained interval (less frequent)
    • This allows SmartFace Platform to process the video faster while using fewer resources
  • When a face is detected in the processed video, SmartFace Platform starts to perform more frequent detections around the time when the face was detected
    • This detection interval is the so-called fine grained interval (more frequent) and ensures that SmartFace Platform does not miss any important detections in the video scene at that time
    • This detection interval is configured by the templateGenerationTime property
  • For each detected face, SmartFace Platform performs extraction
    • It generates a biometric template and extracts further information such as age, gender or whether the person is wearing a face mask
  • All detected faces are matched against watchlist members stored in SmartFace Platform’s watchlists (unless matching is disabled in matchingConfig)
  • When processing of the video is finished, all detected faces are aggregated into tracklets
    • A tracklet represents a person’s movement in a particular video scene (time window)
  • All information from the processing is also provided to you via ZeroMQ messages
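The following simplified Python sketch shows how chunks are sliced and processed in parallel. It is not SmartFace Platform's implementation. process_chunk is a hypothetical placeholder for detection, extraction and matching on one chunk. max_workers stands in for the resources and service configuration that limit how many chunks are processed at the same time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Chunk = Tuple[float, float]  # (start_s, end_s) of one chunk, in seconds


def slice_into_chunks(duration_s: float, chunk_length_s: float) -> List[Chunk]:
    """Split [0, duration_s) into consecutive chunks; the last one may be shorter."""
    if chunk_length_s <= 0:
        raise ValueError("chunk_length_s must be positive")
    chunks: List[Chunk] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def process_chunk(chunk: Chunk) -> str:
    """Hypothetical placeholder for detection, extraction and matching on one chunk."""
    start, end = chunk
    return f"processed {start:.0f}-{end:.0f} s"


def process_video(duration_s: float, chunk_length_s: float, max_workers: int) -> List[str]:
    """Process all chunks, at most max_workers at the same time."""
    chunks = slice_into_chunks(duration_s, chunk_length_s)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in chunk order, whatever order they finish in.
        return list(pool.map(process_chunk, chunks))
```

For example, process_video(25 * 60, 10 * 60, max_workers=2) slices a 25-minute video into three chunks (two 10-minute chunks and one 5-minute chunk) and processes at most two of them at once.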

Processing speed

Several factors can affect the performance of the processing:

  • Video length

  • Number of faces occurring in the video

  • Configuration of processing

  • Computing power available

  • Configuration of SmartFace Platform services

Video length

The longer the video, the more time SmartFace Platform needs to process it.

Number of faces in the video

Face detection and extraction are among the most demanding processes in terms of computing power and time. Therefore, the more faces there are in the video, the more time processing takes. However, the processing time cannot be calculated exactly from an approximate number of faces in the video alone, because multiple variables affect the actual speed.

This documentation does not provide a tool for such an estimation. Instead, it should help you understand that processing videos of different scenes takes different amounts of time. For example, security footage of an empty street with only a few people in the scene takes less time to process than footage of a crowded airport. This holds even if both videos have the same length and are processed with the same configuration on the same machine.

Configuration of processing

Several configuration parameters directly affect the duration of video processing.

ChunkLength is the predefined length of the video chunks that SmartFace Platform slices from the processed video. The default value is 10 minutes. We do not recommend changing this value. The only case where lowering it can speed up processing is when you need to process very short videos. You can configure this property in the SmartFace.appsettings.json file located next to the binary files.

"VideoRecordSlicing": {
    "ChunkLength": "00:10:00"
}
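The following Python sketch shows how ChunkLength relates to the number of chunks. It parses the hh:mm:ss value shown above and counts the chunks for a given video duration. It assumes that chunks are consecutive and non-overlapping, so the last chunk may be shorter. parse_chunk_length and chunk_count are hypothetical helper names, and they only handle the plain hh:mm:ss form, not day or fractional-second parts.

```python
import math


def parse_chunk_length(value: str) -> int:
    """Convert an "hh:mm:ss" value such as "00:10:00" to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def chunk_count(video_duration_s: float, chunk_length: str = "00:10:00") -> int:
    """Number of consecutive, non-overlapping chunks covering the whole video."""
    return math.ceil(video_duration_s / parse_chunk_length(chunk_length))
```

With the default value, a one-hour video is sliced into chunk_count(3600) == 6 chunks.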

The coarse grained (less frequent) and fine grained (more frequent) detection intervals can be set with the following properties when creating a Video record via the API call:

redetectionTime (milliseconds): coarse grained interval

Defines the interval at which SmartFace Platform runs detection on a frame of the video. This so-called coarse grained interval ensures that the video is processed faster and with fewer resources.

The lower the value, the more detections are performed. As a result, processing needs more resources and time.

The higher the value, the fewer detections are performed. As a result, processing can miss faces that appear only between two detections.

We recommend setting this value between 500 and 2000 ms. The right value depends on the use case, the available resources and how often you expect the scene to change and new people to appear.

templateGenerationTime (milliseconds): fine grained interval

When a face is detected in the processed video, SmartFace Platform starts to perform more frequent detections around the time the face was detected. This so-called fine grained interval ensures that SmartFace Platform does not miss any important detections in the video scene at that time.

We recommend setting this value between 100 and 500 ms. The right value depends on the use case, the available resources and the level of detail required for detected people.
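The following simplified Python model shows how the two intervals interact when deciding when detections run. It is not SmartFace Platform's scheduler, and the exact window of fine grained detections is not documented here. The sketch therefore assumes the following: detections use templateGenerationTime while the last detection found a face, and redetectionTime otherwise. The appearances list of (start_ms, end_ms) intervals is a hypothetical input that describes when a face is visible.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) while a face is visible, hypothetical input


def face_visible(t_ms: int, appearances: List[Interval]) -> bool:
    """True if any face is visible at time t_ms (end is exclusive)."""
    return any(start <= t_ms < end for start, end in appearances)


def detection_times(
    duration_ms: int,
    appearances: List[Interval],
    redetection_time: int = 500,
    template_generation_time: int = 250,
) -> List[int]:
    """Timestamps at which a detection runs in this simplified model.

    Steps by redetection_time (coarse) while no face is found and by
    template_generation_time (fine) while the last detection found a face.
    """
    times = []
    t = 0
    while t < duration_ms:
        times.append(t)
        found = face_visible(t, appearances)
        t += template_generation_time if found else redetection_time
    return times
```

With the defaults, a face visible from 1000 ms to 2000 ms in a 3-second video triggers detections at 0, 500, 1000, 1250, 1500, 1750, 2000 and 2500 ms. A face visible only from 1100 ms to 1400 ms falls between the coarse grained detections at 1000 ms and 1500 ms and is missed. The same face is found with redetection_time=250.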

Aggregation of faces into tracklets (done in post processing) can be configured with the VideoDataAggregator configuration parameters. The values can be defined in the SmartFace.VideoDataAggregator.appsettings.json file located next to the binary files.

"Aggregation": {
    "AggregationSimilarityThreshold": 45,
    "TrackletAggregationPeriodMs": 3000,
    "MatchRequestParallelism": 3,
    "Export": "Standard",
    "EntityNotificationsEnabled": false
}

AggregationSimilarityThreshold is the face similarity threshold that determines whether detected faces are joined into one tracklet. If some tracklets contain faces that clearly do not belong to them, try increasing this threshold. However, increasing this value may result in more tracklets being created, with faces of one person split across several of them. We recommend setting this value to at least 45.

TrackletAggregationPeriodMs defines the maximum time window between face appearances for them to be considered one tracklet. For example, if a person reappears in the scene after a period longer than this value, the new appearance is treated as a new tracklet.

MatchRequestParallelism defines the parallelism the aggregator uses when matching detected faces.

Export defines whether data is exported to the SmartFace Platform database (Standard) or published to RabbitMQ (RabbitMq).

EntityNotificationsEnabled controls sending of ZeroMQ notifications based on data created by video processing. If false (default), only notifications about video record state changes are sent. If true, notifications are also sent for newly created entities such as Faces, Tracklets and Matches.
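The following greedy Python sketch illustrates how AggregationSimilarityThreshold and TrackletAggregationPeriodMs interact. It is not the VideoDataAggregator algorithm. Several parts are assumptions made for this example: the similarity callable, the 0 to 100 score scale, the >= comparison, and comparing each face only with the last face of a tracklet.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Face:
    timestamp_ms: int
    template: tuple  # stand-in for a biometric template (hypothetical)


@dataclass
class Tracklet:
    faces: List[Face] = field(default_factory=list)

    @property
    def last_seen_ms(self) -> int:
        return self.faces[-1].timestamp_ms


def aggregate(
    faces: List[Face],
    similarity: Callable[[tuple, tuple], float],
    similarity_threshold: float = 45,  # plays the role of AggregationSimilarityThreshold
    period_ms: int = 3000,  # plays the role of TrackletAggregationPeriodMs
) -> List[Tracklet]:
    tracklets: List[Tracklet] = []
    for face in sorted(faces, key=lambda f: f.timestamp_ms):
        best, best_score = None, float("-inf")
        for tracklet in tracklets:
            if face.timestamp_ms - tracklet.last_seen_ms > period_ms:
                continue  # gap longer than the period: cannot extend this tracklet
            score = similarity(face.template, tracklet.faces[-1].template)
            if score >= similarity_threshold and score > best_score:
                best, best_score = tracklet, score
        if best is None:
            tracklets.append(Tracklet([face]))  # start a new tracklet
        else:
            best.faces.append(face)
    return tracklets
```

Raising similarity_threshold makes it harder for a face to join an existing tracklet, which yields more, smaller tracklets. Raising period_ms lets a person who briefly leaves the scene stay in the same tracklet.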

Computing power available

The available computing power can significantly affect performance and processing speed. More resources allow SmartFace Platform to better scale the processes responsible for video processing. If possible, use CPUs with a higher clock frequency (3+ GHz). In the current version, offline video processing in SmartFace Platform does not use the GPU to its full potential. If possible, we therefore recommend having more CPU cores rather than a GPU on your server.

Scaling of SmartFace Platform Services for offline video processing

The following table lists the main processes that handle offline video processing. To achieve higher parallelism and faster processing, you can spawn multiple instances of the processes marked as Scalable. This requires your hardware to provide enough CPU cores, RAM and GPU resources.

Binary name | Service name | Scalable | RAM usage | GPU acceleration
VideoReader | SFVideoReader | Yes | 100 MB (may vary based on the resolution of the video) | N/A
VideoDataCollector | SFVideoDataCollector | No | ~50 MB | N/A
VideoDataAggregator | SFVideoDataAggregator | Yes | ~100 MB (may vary with the number of detected faces in the video) | N/A
RpcDetector | SFDetect(Cpu/Gpu) | Yes (Cpu only) | ~1.5 GB | Yes (only one GPU-accelerated process per machine is supported)
RpcExtractor | SFExtract(Cpu/Gpu) | Yes (Cpu only) | ~1.7 GB | Yes (only one GPU-accelerated process per machine is supported)

Default scaling recommendations

For each physical machine we recommend using this default configuration:

  • 2x instances of SFVideoReader service

  • 1x instance of SFDetectGpu service (if GPU available)

  • N instances of SFDetectCpu service - N depends on the number of available cores (N may need to be lower if the machine does not have enough RAM during processing)

  • N instances of SFExtractCpu service - N depends on the number of available cores (N may need to be lower if the machine does not have enough RAM during processing)

  • 1x instance of SFVideoDataCollector service

  • 1x instance of SFVideoDataAggregator service

⚠️ Each service instance needs 1 CPU core.
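The default recommendations and the one-core-per-instance rule can be turned into a rough capacity estimate. The Python sketch below is a hypothetical helper that uses the approximate RAM figures from the table above. It pairs each SFDetectCpu instance with one SFExtractCpu instance. This pairing is an assumption, since the optimal ratio depends on your videos. The helper does not reserve any cores or RAM for the operating system or other services.

```python
# Approximate RAM per instance in GB, from the table above.
RAM_GB = {
    "SFVideoReader": 0.1,
    "SFVideoDataCollector": 0.05,
    "SFVideoDataAggregator": 0.1,
    "SFDetectGpu": 1.5,
    "SFDetectCpu": 1.5,
    "SFExtractCpu": 1.7,
}


def default_plan(cpu_cores: int, ram_gb: float, has_gpu: bool = False) -> dict:
    """Instance counts per service for one machine, following the defaults above."""
    plan = {"SFVideoReader": 2, "SFVideoDataCollector": 1, "SFVideoDataAggregator": 1}
    if has_gpu:
        plan["SFDetectGpu"] = 1  # only one GPU-accelerated process per machine
    # Each instance needs 1 CPU core.
    free_cores = max(cpu_cores - sum(plan.values()), 0)
    free_ram = max(ram_gb - sum(RAM_GB[s] * n for s, n in plan.items()), 0.0)
    # Assumption: pair each SFDetectCpu with one SFExtractCpu instance.
    pair_ram = RAM_GB["SFDetectCpu"] + RAM_GB["SFExtractCpu"]
    n = min(free_cores // 2, int(free_ram // pair_ram))
    plan["SFDetectCpu"] = n
    plan["SFExtractCpu"] = n
    return plan
```

For example, default_plan(16, 32) yields 6 SFDetectCpu and 6 SFExtractCpu instances. default_plan(16, 8) is limited by RAM to 2 of each.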

Scaling recommendations

To gain the best performance (and thus the lowest processing time), you need an optimal scaling configuration. The right configuration depends highly on the input videos that will be processed.

The table below should guide you to the best configuration based on key parameters of the video.

Expected video | Recommendation
Long video with few faces | Spawn additional VideoReader services to decode and process the video faster. There is no need for extra detect/extract services because very few faces are expected.
Long video with a lot of faces (crowd analytics) | Spawn additional extractor and detector services, because a high volume of faces is expected.
Many short videos (<= ~15 min) | Spawn additional VideoDataAggregators to parallelize the post processing of each video record. Consider setting the ChunkLength of the VideoSlicer to around 1/2 of the video duration. Also spawn additional VideoReaders to parallelize the processing of multiple videos. Depending on the number of expected faces, spawn additional detectors/extractors.
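For the many-short-videos case, the suggested ChunkLength can be computed directly. ChunkLength is set once in SmartFace.appsettings.json rather than per video. The following Python sketch therefore assumes you pick a typical duration for the batch of videos. suggested_chunk_length is a hypothetical helper name.

```python
import math


def suggested_chunk_length(typical_duration_s: int) -> str:
    """About half of a typical video duration, formatted as "hh:mm:ss" for ChunkLength."""
    half = max(math.ceil(typical_duration_s / 2), 1)  # round up, at least 1 s
    hours, rest = divmod(half, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For a batch of roughly 12-minute videos, this suggests "00:06:00".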

Retrieving information about processing

After the video is processed, the available data depends on the Export parameter configured for Aggregation.

When Export is set to Standard:

  • Aggregated tracklets with faces and match results: tracklets, faces and match results are created in the database.

  • Notification about the updated state of the VideoRecord: the VideoState in the notification is either Processed or Error, depending on whether the aggregation succeeded.

When Export is set to RabbitMq:

  • Aggregated tracklets with faces: tracklets and their faces are published as messages to the durable OfflineVideo_DataExport queue in RabbitMQ.

  • Notification about the updated state of the VideoRecord: the VideoState in the notification is either Processed or Error, depending on whether the aggregation succeeded.

Entity notifications

By default, no notifications are sent about created entities. They can be enabled by setting the EntityNotificationsEnabled parameter configured for Aggregation to true.

When enabled, the following notifications are produced:

  • Notifications about saved database entities: these are sent for every created tracklet, face and successful match result. They are produced only when the entities are actually stored in a database, so they are not produced when Export is set to RabbitMq.

  • Fast watchlist notifications: notifications about both successful and unsuccessful matches. Unsuccessful ("no match") results are not persisted in the database.
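The rules above, together with the Export parameter, determine which notifications you can expect. The following Python sketch summarizes them as a lookup. The returned labels are descriptive names, not SmartFace message types. The page does not state how fast watchlist notifications behave with RabbitMq export or with matching disabled, so the sketch's handling of those cases is an assumption.

```python
def notifications_produced(
    export: str,
    entity_notifications_enabled: bool = False,
    match_detected_faces: bool = True,
) -> set:
    """Which ZeroMQ notification kinds to expect, per the rules described above."""
    if export not in ("Standard", "RabbitMq"):
        raise ValueError(f"unknown Export value: {export}")
    # Always sent: VideoRecord state change with VideoState Processed or Error.
    produced = {"video record state"}
    if entity_notifications_enabled:
        if export == "Standard":
            # Only when tracklets, faces and match results are stored in the database.
            produced.add("saved database entities")
        if match_detected_faces:
            # Assumption: requires matching to be enabled; independent of Export.
            produced.add("fast watchlist")
    return produced
```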

Watchlist matching

By specifying matchingConfig in the API request, you can configure how faces detected in Rapid video processing are matched against the watchlists in SmartFace Platform. The available properties are described below.

matchDetectedFaces

Specifies whether detected faces should be matched. When true, all detected faces (regardless of FaceSaveStrategy) are matched against watchlists. When false, no matching is performed.

Default value is true

maxResultsCount

Specifies how many MatchResults to generate per face. If more than one watchlist member matches with a score higher than the threshold specified on the watchlist, up to N such MatchResults are generated, where N is the value of the maxResultsCount property.

Note that this behavior is applied per watchlist. Even when maxResultsCount is set to 1, a face from the processed video that matches members from multiple watchlists produces multiple MatchResults.

Default value is 1
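The following Python sketch illustrates the per-watchlist behavior of maxResultsCount. The candidate tuple shape and the thresholds dictionary are hypothetical inputs. The strict > comparison against the watchlist threshold follows the wording above but may differ from the platform's actual comparison.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One candidate match for a detected face: (watchlist_id, member_id, score).
Candidate = Tuple[str, str, float]


def select_match_results(
    candidates: List[Candidate],
    thresholds: Dict[str, float],
    max_results_count: int = 1,
) -> List[Candidate]:
    """Keep the best max_results_count candidates per watchlist above its threshold."""
    per_watchlist: Dict[str, List[Candidate]] = defaultdict(list)
    for candidate in candidates:
        watchlist_id, _member_id, score = candidate
        if score > thresholds[watchlist_id]:
            per_watchlist[watchlist_id].append(candidate)
    results: List[Candidate] = []
    for watchlist_id in sorted(per_watchlist):
        ranked = sorted(per_watchlist[watchlist_id], key=lambda c: c[2], reverse=True)
        results.extend(ranked[:max_results_count])  # limit applies per watchlist
    return results
```

With maxResultsCount set to 1, a face that matches members of two watchlists still yields two MatchResults, one per watchlist.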

Clearing Video record

⚠️ Use with caution as this action is destructive!

If processing of the video record ends in the Error state, or the processing results are not satisfactory (e.g. too many or too few faces were detected), the video record can be reprocessed. To reprocess it, all previously created data must be cleared first.

To clear all data produced from the Video record, use the POST /api/v1/VideoRecords/{id}/Clear endpoint on the REST API. This call will do the following:

  • Delete all data produced by the video record processing, such as

    • Detected faces

    • Tracklets

    • MatchResults

    • Frames

  • Set the video record back to the Ready state, from which its settings can be modified and processing started again using the PUT /api/v1/VideoRecords REST API endpoint.

Without any change in the configuration, reprocessing the video yields exactly the same results, except for MatchResults, which depend on the configuration of Watchlists and WatchlistMembers at the time of processing.
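Here is a minimal Python sketch of calling the Clear endpoint with the standard library. BASE_URL is a hypothetical placeholder for your SmartFace REST API address, and sending an empty request body is an assumption. After clearing, the record's settings can be updated through PUT /api/v1/VideoRecords as described above.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8098"  # hypothetical SmartFace REST API address


def build_clear_request(video_record_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) POST /api/v1/VideoRecords/{id}/Clear."""
    path = f"/api/v1/VideoRecords/{quote(video_record_id, safe='')}/Clear"
    # Assumption: the endpoint needs no request body, so an empty one is sent.
    return urllib.request.Request(url=base_url + path, data=b"", method="POST")


def clear_video_record(video_record_id: str) -> int:
    """Send the request; urlopen raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(build_clear_request(video_record_id)) as response:
        return response.status
```

Because this call is destructive, building the request separately lets you double-check the record ID before sending it.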