Rapid Video Processing

In addition to processing live RTSP streams, SmartFace Platform is also capable of processing video files. Video files are processed faster than their actual length - so-called faster-than-real-time processing.

You can easily upload the video file and SmartFace Platform will process it in the same way as a real-time video stream, i.e. it detects faces, extracts biometric data, identifies detected persons against watchlists, stores the information in databases (based on the data storage configuration) and provides notifications either via the GraphQL API or ZeroMQ. It is also possible to upload multiple videos at the same time and process them in parallel.

Pedestrian and object detection is currently not supported in Rapid Video Processing.

Initiating Rapid Video Processing

To process a video file you need to upload it to SmartFace Platform. You can do this either using SmartFace Station or by creating a Video record via a REST API call. Here is an example of such a call using the   POST   /api/v1/VideoRecords endpoint:

{
    "name": "name_of_the_video",
    "source": "C:\\folder\\my video.mp4",
    "enabled": true,
    "faceDetectorConfig": {
        "minFaceSize": 20,
        "maxFaceSize": 200,
        "maxFaces": 20,
        "confidenceThreshold": 450
    },
    "faceDetectorResourceId": "cpu",
    "templateGeneratorResourceId": "cpu",
    "redetectionTime": 500,
    "templateGenerationTime": 250,
    "faceSaveStrategy": "All",
    "saveFrameImageData": true,
    "maskImagePath": string,
    "imageQuality": 90,
    "matchingConfig": {
        "matchDetectedFaces": true,
        "maxResultsCount": 1
    }
}

The source of the video file can be either an absolute file path (the video file must be available locally on the same machine as SmartFace Platform) or an HTTP URL (e.g. a file hosted by Min.io).
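For illustration, a minimal sketch of such a call using Python and the requests library could look as follows. The base URL is an assumption and must be replaced with the address of your SmartFace Platform REST API; the optional maskImagePath property is left out of this sketch.

import requests

# A minimal sketch of creating a Video record via the REST API.
# The base URL is an assumption - replace it with the address of your
# SmartFace Platform REST API.
SMARTFACE_API = "http://localhost:8098"

video_record = {
    "name": "name_of_the_video",
    "source": "C:\\folder\\my video.mp4",  # absolute file path or HTTP URL
    "enabled": True,
    "faceDetectorConfig": {
        "minFaceSize": 20,
        "maxFaceSize": 200,
        "maxFaces": 20,
        "confidenceThreshold": 450
    },
    "faceDetectorResourceId": "cpu",
    "templateGeneratorResourceId": "cpu",
    "redetectionTime": 500,
    "templateGenerationTime": 250,
    "faceSaveStrategy": "All",
    "saveFrameImageData": True,
    "imageQuality": 90,
    "matchingConfig": {
        "matchDetectedFaces": True,
        "maxResultsCount": 1
    }
}

response = requests.post(f"{SMARTFACE_API}/api/v1/VideoRecords", json=video_record)
response.raise_for_status()
print(response.json())  # the created Video record returned by the API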

Once the Video record is successfully created and the video file is available at the defined source, SmartFace Platform automatically starts processing the video file.

Processing Steps

1. Reading and slicing the video file(s) into chunks

  • These chunks are processed in parallel
  • The number of chunks processed at the same time depends on the resources available and the configuration of services

2. Face detection in processed video chunks at configured time intervals

  • The detection interval can be configured by the redetectionTime property defined in the REST API call
  • This detection interval is a so-called coarse-grained interval (less frequent)
  • This feature allows SmartFace Platform to process the video faster while using fewer resources

3. More frequent detections are performed around the time when a face was detected

  • This detection interval is a so-called fine-grained interval (more frequent) and ensures that SmartFace Platform will not miss any important detections in the video scene at that time
  • This detection interval can be configured by the templateGenerationTime attribute
  • For each detected face, SmartFace Platform performs an extraction: it generates a biometric template and extracts further information such as age, gender, or whether the person is wearing a face mask

4. All detected faces are matched against watchlist members stored in SmartFace Platform’s watchlists

  • When the processing of the video is finished, all detected faces are aggregated into tracklets
  • A tracklet represents a person’s movement in a particular video scene (time window)

For more information about the matching please read here.

5. All the data from the processing is stored in the database, based on your setup
For more information about the resulting data and how to retrieve it, please read here.

6. All the information from the processing is also provided to you via APIs - GraphQL Subscriptions and ZeroMQ messages
For more information about the notifications please read here.
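As a rough, purely illustrative calculation of how the coarse-grained interval drives the amount of detection work, consider the following sketch (the numbers are examples, not benchmarks):

video_length_ms = 10 * 60 * 1000   # a 10-minute video
redetection_time_ms = 500          # coarse-grained interval (redetectionTime)

# Upper bound on coarse-grained detection points scheduled for the video;
# additional fine-grained detections are performed only around the times
# where faces were actually found.
coarse_detections = video_length_ms // redetection_time_ms
print(coarse_detections)  # 1200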

Processing speed

The processing speed of Rapid Video Processing depends on several factors:

Video length

The longer the video, the more time SmartFace Platform will need for the processing.

Number of occurring faces in the video

Face detection and extraction are among the most demanding processes in terms of computing power and time. Therefore, the more faces appear in the video, the more time is needed for the processing. It is impossible to calculate the processing speed only from the approximate number of faces in the video, because multiple variables affect the actual speed.

This documentation is not meant to provide a tool for such an estimation; it should help you understand that processing videos of different scenes takes different amounts of time. For example, security footage of an empty street with only a few people in the scene will take less time to process than footage from a crowded airport, even when both videos have the same length and you use the same configuration and machine.

Configuration of processing

There are multiple parameters of the configuration that directly affect the duration of the video processing.

ChunkLength is the predefined length of the video chunks that SmartFace Platform slices the processed video into. The default value is 10 minutes. We do not recommend changing this value. The only case where lowering this value can speed up the processing is when you need to process a very short video.

For a Windows installation, you can configure this property in the SmartFace.appsettings.json file located next to the binary files.

"VideoRecordSlicing": {
   "ChunkLength": "00:10:00"
}

For a Docker installation, add the following line to the .env file in your installation directory.

VideoRecordSlicing__ChunkLength=00:10:00
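As an illustration of what ChunkLength means for parallelism, the following sketch computes how many chunks a video is sliced into (illustrative numbers only):

import math

video_length_s = 45 * 60   # a 45-minute video
chunk_length_s = 10 * 60   # default ChunkLength of 00:10:00

# The video is sliced into this many chunks; how many of them are processed
# in parallel depends on the available resources and the number of service
# instances (e.g. VideoReader).
chunks = math.ceil(video_length_s / chunk_length_s)
print(chunks)  # 5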

The less frequent (coarse-grained) and more frequent (fine-grained) detection intervals can be set by the following properties when creating a VideoRecord via the REST API call:

redetectionTime (milliseconds) - coarse-grained interval
Defines the period in which SmartFace makes a detection on a frame from the video. This so-called coarse-grained interval ensures that the video is processed faster and fewer resources are used. The lower the time, the more detections will be done; as a result, more resources and time are needed for the processing. The higher the time, the fewer detections will be done; as a result, the processing can miss faces that appear between the detection intervals.

It is recommended to set this value between 500 - 2000 ms, depending on the use case, available resources and your expectations of how often the scene in the video will change and a person may appear.

templateGenerationTime (milliseconds) - fine-grained interval
When a face is detected in the processed video, SmartFace Platform starts to perform more frequent detections around the time of the detected face. This so-called fine-grained interval ensures that SmartFace Platform will not miss any important detections in the video scene at that time.

It is recommended to set this value between 100 - 500 ms, depending on the use case, available resources and the required detail of detected people.

Aggregation of faces into tracklets (done in post-processing) can be configured by the VideoDataAggregator configuration parameters.

For a Windows installation, the values can be defined in the SmartFace.VideoDataAggregator.appsettings.json file located next to the binary files.

"Aggregation" : { 
   "AggregationSimilarityThreshold" : 45, 
   "TrackletAggregationPeriodMs" : 3000, 
   "MatchRequestParallelism" : 3, 
   "Export" : "Standard",
   "EntityNotificationsEnabled": false
}

For a Docker installation, add the following lines to the .env file in your installation directory.

Aggregation__AggregationSimilarityThreshold=45 
Aggregation__TrackletAggregationPeriodMs=3000
Aggregation__MatchRequestParallelism=3
Aggregation__Export=Standard
Aggregation__EntityNotificationsEnabled=false

AggregationSimilarityThreshold is a face similarity threshold that defines whether detected faces are joined into one tracklet. If some tracklets contain faces that clearly do not belong to them, try increasing this threshold. However, increasing this value may result in a single person being split into multiple tracklets. We recommend setting this value to at least 45.

TrackletAggregationPeriodMs defines the maximum time window between face appearances that are still considered part of one tracklet. For example, if a person reappears in the scene after a time greater than this period, a new tracklet is created.

MatchRequestParallelism defines the parallelism of the match requests the aggregator issues for detected faces.

Export defines whether the data is exported to the SmartFace Platform database (Standard) or published to RabbitMQ (RabbitMq).

EntityNotificationsEnabled controls the sending of ZeroMQ notifications based on the data created by video processing. If false (the default), only notifications about video record state changes are sent. If true, notifications are also sent for newly created entities such as Faces, Tracklets and Matches.
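To make the role of TrackletAggregationPeriodMs more concrete, here is a simplified sketch of the time-window rule for a single person, assuming the appearance timestamps are already known; the real aggregator additionally compares faces using AggregationSimilarityThreshold, which is not modeled here.

# Simplified sketch of the TrackletAggregationPeriodMs rule: appearances are
# joined into one tracklet as long as the gap between them does not exceed
# the aggregation period. Face similarity checks are intentionally omitted.
def group_into_tracklets(appearance_times_ms, aggregation_period_ms=3000):
    tracklets = []
    for t in sorted(appearance_times_ms):
        if tracklets and t - tracklets[-1][-1] <= aggregation_period_ms:
            tracklets[-1].append(t)   # small gap -> same tracklet
        else:
            tracklets.append([t])     # gap above the period -> new tracklet
    return tracklets

print(group_into_tracklets([0, 1000, 2500, 9000, 9500]))
# [[0, 1000, 2500], [9000, 9500]] - the reappearance after 6500 ms starts a new tracklet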

Computing power available

Available computing power can significantly affect the performance and processing speed. More resources allow SmartFace Platform to better scale the services responsible for processing the video. Due to the nature of the processes involved, it is recommended to use a server with higher CPU performance (more cores, CPU frequency of 3+ GHz), as GPU utilization has a lower impact on the final results.

Configuration of SmartFace Platform services

The following services carry out the offline video processing. You can spawn multiple instances of the services marked as Scalable (in the table below) to achieve higher parallelism and faster processing, provided your hardware offers enough resources in terms of CPU cores, RAM and GPU.

Binary name | Service name | Scalable | RAM usage | GPU acceleration
VideoReader | SFVideoReader | Yes | 100 MB (may vary based on the resolution of the video) | N/A
VideoDataCollector | SFVideoDataCollector | No | ~50 MB | N/A
VideoDataAggregator | SFVideoDataAggregator | Yes | ~100 MB (may vary by the number of detected faces in the video) | N/A
RpcDetector | SFDetect(Cpu/Gpu) | Yes (Cpu only) | ~1.5 GB | Yes (only one GPU accelerated process per machine is supported)
RpcExtractor | SFExtract(Cpu/Gpu) | Yes (Cpu only) | ~1.7 GB | Yes (only one GPU accelerated process per machine is supported)

Default scaling recommendations

For each physical machine we recommend using this default configuration:

  • 2x instances of SFVideoReader service
  • 1x instance of SFDetectGpu service (if GPU available)
  • N instances of SFDetectCpu service - N depends on the available cores (N may be lower in case the machine does not have enough RAM during the processing)
  • N instances of SFExtractCpu service - N depends on the available cores (N may be lower in case the machine does not have enough RAM during the processing)
  • 1x instance of SFVideoDataCollector service
  • 1x instance of SFVideoDataAggregator service
⚠️ 1 CPU core is needed per instance of a service

Scaling recommendations

It is important to understand that to gain the best performance (and thus the lowest processing time) you need to create an optimal scaling configuration. Finding the correct configuration depends highly on the input videos that will be processed.

The table below should guide you in finding the best configuration based on the key parameters of the video.

Long video with few faces
Spawn additional VideoReader services to be able to decode and process the video faster. There is no need for extra detect/extract services because very few faces are expected.

Long video with a lot of faces (crowd analytics)
Spawn additional extractor and detector services because a high traffic of faces is expected.

Many short videos (<= ~15 min)
Spawn additional VideoDataAggregators (to parallelize the post-processing of each video record). Consider setting the ChunkLength of the VideoSlicer to around half of the video duration. Spawn additional VideoReaders to parallelize the processing of multiple videos. Depending on the number of expected faces, spawn additional detectors/extractors.

Watchlist matching

By specifying matchingConfig in the API request, it is possible to configure how faces detected in Rapid Video Processing are matched against watchlists present in SmartFace Platform. See the following table for more information.

matchDetectedFaces
Specifies whether detected faces should be matched. When true, all detected faces (regardless of FaceSaveStrategy) are matched against watchlists. When false, no matching is performed.

Default value is true.

maxResultsCount
Specifies how many MatchResults to generate per face. If more than one watchlist member matches with a higher score than specified on the watchlist, then up to N such MatchResults will be generated, where N is the value of the maxResultsCount property.

Note that this behavior is applied per watchlist, so even when the value of maxResultsCount is set to 1 and a face from the processed video matches members from multiple watchlists, multiple MatchResults will be produced for a single face.

Default value is 1.
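As a small illustration of the per-watchlist behavior described above, the upper bound on MatchResults produced for a single face can be computed as follows (the numbers are only an example):

max_results_count = 1      # maxResultsCount from matchingConfig
matching_watchlists = 3    # watchlists in which the face matched above the threshold

# maxResultsCount applies per watchlist, so a single face can produce up to
# this many MatchResults in total.
max_match_results_per_face = max_results_count * matching_watchlists
print(max_match_results_per_face)  # 3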

Retrieving information about processing

After the video is processed, the following data is available, depending on the Export parameter configured for Aggregation.

When Export is configured to Standard:

  • Aggregated tracklets with faces and match results - Tracklets, faces, and match results are created in the database.
  • Notification about the updated state of the VideoRecord (videoRecords.update) - The VideoState in the notification is either Processed or Error, depending on whether the aggregation was successful.

When Export is configured to RabbitMq:

  • Aggregated tracklets with faces - Tracklets with their faces are published as messages to the durable OfflineVideo_DataExport queue in RabbitMQ.
  • Notification about the updated state of the VideoRecord (videoRecords.update) - The VideoState in the notification is either Processed or Error, depending on whether the aggregation was successful.
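When Export is set to RabbitMq, the exported data can be consumed directly from the queue. Below is a minimal consumer sketch using the Python pika library; the connection parameters are assumptions and the exact message schema depends on your SmartFace Platform version.

import json
import pika

# Minimal sketch of consuming the export messages produced when
# Aggregation__Export=RabbitMq. Connection parameters are assumptions -
# adjust them to your RabbitMQ installation.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# The OfflineVideo_DataExport queue is durable; declaring it with the same
# flag is idempotent.
channel.queue_declare(queue="OfflineVideo_DataExport", durable=True)

def on_message(ch, method, properties, body):
    tracklet_export = json.loads(body)  # assumed to be a JSON payload
    print("Received exported tracklet data:", tracklet_export)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="OfflineVideo_DataExport", on_message_callback=on_message)
channel.start_consuming()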

Entity notifications

By default, no notifications are sent about entities created by video processing. They can be enabled by setting the EntityNotificationsEnabled parameter of the Aggregation configuration to true.

When enabled, the following notifications are produced:

  • Notifications about saved database entities:

    • faces.insert
    • faces.extracted
    • tracklets.completed
    • matchResults.match.insert

    These notifications are sent for every created face, tracklet and successful match result. Note that they are produced only when the entities are actually stored in a database; this means that when Export is configured to RabbitMq, these notifications are not produced.

  • Fast watchlist notifications:

    • matchResults.match
    • matchResults.nomatch

    Notifications about successful/unsuccessful matches. No-match results are not persisted in the database.
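These ZeroMQ notifications can be consumed with a plain SUB socket. Below is a minimal sketch using the Python pyzmq library; the endpoint address and the multipart message layout (topic frame followed by a JSON payload) are assumptions to be checked against your installation.

import zmq

# Minimal sketch of subscribing to the ZeroMQ notifications listed above.
# The endpoint is an assumption - replace it with the ZeroMQ notification
# address of your SmartFace Platform installation.
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:2406")

# Subscribe to the topics of interest; an empty prefix would subscribe to all.
for topic in ("videoRecords.update", "faces.insert", "faces.extracted",
              "tracklets.completed", "matchResults.match", "matchResults.nomatch"):
    socket.setsockopt_string(zmq.SUBSCRIBE, topic)

while True:
    frames = socket.recv_multipart()  # assumed layout: [topic, JSON payload]
    print(b" ".join(frames).decode())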

Clearing Video record

⚠️ Use with caution as this action is destructive!

If the video record processing ends in the Error state or the processing results are not satisfactory (e.g. too many or too few faces were detected), the video record can be reprocessed. Before reprocessing, all previously created data must be cleared.

To clear all data produced from the Video record, use the   POST   /api/v1/VideoRecords/{id}/Clear endpoint on the REST API. This call will do the following:

  • Delete all data produced by the video record processing, such as

    • Detected faces
    • Tracklets
    • MatchResults
    • Frames
  • Set the video back to the Ready state, from which the settings can be modified and the processing started again using the   PUT   /api/v1/VideoRecords REST API endpoint.

Without any change in the configuration, the video processing will yield the exact same result, except for MatchResults, as they depend on the configuration of Watchlists and WatchlistMembers at the time of processing.
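A minimal sketch of clearing a Video record and starting the processing again over the REST API, using Python and the requests library, could look as follows; the base URL, the example id and the exact fields of the PUT body are assumptions.

import requests

# Sketch of clearing a Video record and restarting its processing.
# The base URL and the record id are placeholders - replace them with your
# SmartFace Platform REST API address and the id of your Video record.
SMARTFACE_API = "http://localhost:8098"
video_record_id = "00000000-0000-0000-0000-000000000000"

# 1. Delete all data produced by the previous processing run (destructive!).
clear = requests.post(f"{SMARTFACE_API}/api/v1/VideoRecords/{video_record_id}/Clear")
clear.raise_for_status()

# 2. Update the record (optionally with changed settings) and re-enable it so
#    that processing starts again from the Ready state. The fields shown here
#    are illustrative - send the configuration you actually want to change.
update = {
    "id": video_record_id,
    "enabled": True,
    "redetectionTime": 500,
    "templateGenerationTime": 250
}
requests.put(f"{SMARTFACE_API}/api/v1/VideoRecords", json=update).raise_for_status()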