Rapid video processing
In addition to processing live RTSP streams, SmartFace Platform is also capable of processing video files. The processing of video files is faster than the actual length of the uploaded videos - so called faster than real time processing. You can easily upload a video file and SmartFace Platform will process it in the same way as a real time video stream, i.e. it detects faces, extracts biometric data, identifies detected persons against watchlists, stores the information in the database (based on the data storage configuration) and provides ZeroMQ notifications. It is possible to upload multiple videos at the same time and process them in parallel. Pedestrian detection is currently not supported in Rapid video processing.
The processing speed depends on several variables:
length of the video
number of faces occurring in the video
computing power of your server
configuration of the processing
configuration of SmartFace Platform services
Rapid video processing explained
To process a video file you need to upload it to SmartFace Platform. To do so, create a Video record via an API call. Here is an example of such a call:
Create Video record API Call (POST)
/api/v1/VideoRecords
{
"name": "name_of_the_video",
"source": "C:\\folder\\my video.mp4",
"enabled": true,
"faceDetectorConfig": {
"minFaceSize": 20,
"maxFaceSize": 200,
"maxFaces": 20,
"confidenceThreshold": 450
},
"faceDetectorResourceId": "cpu",
"templateGeneratorResourceId": "cpu",
"redetectionTime": 500,
"templateGenerationTime": 250,
"faceSaveStrategy": "All",
"saveFrameImageData": true,
"maskImagePath": string,
"imageQuality": 90,
"matchingConfig": {
"matchDetectedFaces": true,
"maxResultsCount": 1
}
}
The source of the video file can be either an absolute file path (the video file must be available locally on the same machine as SmartFace Platform) or an HTTP URL (e.g. a file hosted by Minio). Once the Video record is successfully created and the video file is available at the defined source, SmartFace Platform automatically starts to process the video file.
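For illustration, a minimal Python sketch of the same call is shown below. It uses the requests library; the base URL is a placeholder and any authentication required by your deployment is omitted.

```python
# Minimal sketch: create a Video record via the SmartFace Platform REST API.
# The base URL is a placeholder - adjust it to your deployment.
import requests

API_BASE = "http://localhost:8098"  # replace with your SmartFace Platform REST API address

video_record = {
    "name": "name_of_the_video",
    "source": "C:\\folder\\my video.mp4",  # local file path or HTTP URL
    "enabled": True,
    "faceDetectorConfig": {
        "minFaceSize": 20,
        "maxFaceSize": 200,
        "maxFaces": 20,
        "confidenceThreshold": 450,
    },
    "faceDetectorResourceId": "cpu",
    "templateGeneratorResourceId": "cpu",
    "redetectionTime": 500,         # coarse grained detection interval [ms]
    "templateGenerationTime": 250,  # fine grained detection interval [ms]
    "faceSaveStrategy": "All",
    "saveFrameImageData": True,
    "imageQuality": 90,
    "matchingConfig": {
        "matchDetectedFaces": True,
        "maxResultsCount": 1,
    },
}

response = requests.post(f"{API_BASE}/api/v1/VideoRecords", json=video_record)
response.raise_for_status()
print("Created Video record:", response.json())
```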
SmartFace Platform processes the video in the following steps:
- Video file is read and sliced to multiple chunks
- These chunks are processed in parallel
- The number of chunks processed at the same time depends on the available resources and the configuration of services
- The detection of faces in the processed video chunks is done in configured time intervals
- The detection interval can be configured by the redetectionTime property defined in the API call - this detection interval is the so called coarse grained interval (less frequent)
- This feature allows SmartFace Platform to process the video faster while using less resources
- When a face is detected in the processed video, SmartFace Platform starts to perform more frequent detections around the time when the face was detected
- This detection interval is the so called fine grained interval (more frequent) and ensures that SmartFace Platform will not miss any important detections in the video scene at that time
- This detection interval can be configured by the templateGenerationTime attribute
- For each detected face SmartFace Platform performs an extraction
- It generates a biometric template and extracts further information like age, gender or whether a person is wearing a face mask
- All detected faces are matched against watchlist members stored in SmartFace Platform’s watchlists
- When the processing of the video is finished, all detected faces are aggregated into tracklets
- A tracklet represents a person’s movement in a particular video scene (time window)
- All the data from the processing is stored in the DB based on your data storage configuration
- All the information from the processing is also provided to you via ZeroMQ messages
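If you consume the ZeroMQ notifications programmatically, a minimal subscriber sketch using the pyzmq library could look as follows. The endpoint address and the message layout shown here are assumptions, not values taken from this documentation - use the notification settings of your deployment.

```python
# Minimal sketch: subscribe to ZeroMQ notifications produced during video processing.
# The endpoint address and the assumed (topic, payload) multipart layout are placeholders.
import zmq

NOTIFICATIONS_ENDPOINT = "tcp://localhost:2406"  # placeholder - use your deployment's address

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect(NOTIFICATIONS_ENDPOINT)
socket.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to all topics

while True:
    parts = socket.recv_multipart()
    topic = parts[0].decode("utf-8", errors="replace")
    payload = parts[1].decode("utf-8", errors="replace") if len(parts) > 1 else ""
    print(topic, payload)
```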
Processing speed
Several factors can affect the performance of the processing:
Video length
Number of faces occurring in the video
Configuration of processing
Computing power available
Configuration of SmartFace Platform services
Video length
The longer the video, the more time SmartFace Platform needs for the processing.
Number of faces in the video
The detection of faces and the extraction are among the most demanding processes in terms of computing power and time required. Therefore, the more faces within the video, the more time is needed for the processing. It is impossible to calculate the exact processing speed only from the approximate number of faces within the video, because multiple variables affect the actual speed.
This documentation is not meant to provide a tool for such an estimation; it should help you understand that processing videos from various scenes will take a different amount of time. For example, security footage from an empty street with only a few people in the scene will take less time to process than footage from a crowded airport, even when both videos have the same length and you use the same configuration and machine.
Configuration of processing
There are multiple configuration parameters that directly affect the duration of the video processing.
ChunkLength is the predefined length of the video chunks that SmartFace Platform slices from the processed video. The default value is 10 minutes. We do not recommend changing this value; the only use case where lowering it can speed up the processing is when you need to process a very short video. You can configure this property in the SmartFace.appsettings.json file located next to the binary files.
"VideoRecordSlicing": {
"ChunkLength": "00:10:00"
}
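As a rough illustration (assuming the video is sliced into consecutive fixed-length chunks), the following Python snippet estimates how many chunks a video yields for a given ChunkLength, which also bounds how many chunks can be processed in parallel:

```python
# Illustrative arithmetic only: how ChunkLength bounds the number of chunks
# (and therefore the achievable chunk-level parallelism).
import math

def chunk_count(video_length_s: float, chunk_length_s: float = 600.0) -> int:
    return math.ceil(video_length_s / chunk_length_s)

print(chunk_count(45 * 60))         # 45 min video, default 10 min chunks -> 5 chunks
print(chunk_count(4 * 60, 2 * 60))  # 4 min video with ChunkLength "00:02:00" -> 2 chunks
```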
The less frequent and the more frequent detection intervals can be set by the following properties when creating a VideoRecord via the API call:
Name | Value | Interval | Description |
---|---|---|---|
redetectionTime | milliseconds | coarse grained interval | Defines the period in which SmartFace makes a detection on a frame from the video. The so called coarse grained interval ensures that the video is processed faster and fewer resources are used. The lower the time, the more detections will be done; as a result, more resources and time are needed for the processing. The higher the time, the fewer detections will be done; as a result, the processing can miss faces that appear between detections. It is recommended to set this value between 500-2000 ms, depending on the use case, available resources and your expectations of how often the scene in the video changes and a person may appear. |
templateGenerationTime | milliseconds | fine grained interval | When a face is detected in the processed video, SmartFace Platform starts to perform more frequent detections around the time when the face was detected. This detection interval is the so called fine grained interval and ensures that SmartFace Platform will not miss any important detections in the video scene at that time. It is recommended to set this value between 100-500 ms, depending on the use case, available resources and the required level of detail about detected people. |
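As a rough illustration of the trade-off, the following back-of-the-envelope Python calculation (illustrative only, not an official sizing formula) estimates how many coarse grained detection passes a video triggers for different redetectionTime values:

```python
# Back-of-the-envelope estimate (illustrative only): number of coarse grained
# detection passes triggered by a video of a given length.
def coarse_detection_count(video_length_s: float, redetection_time_ms: int) -> int:
    return int(video_length_s * 1000 / redetection_time_ms)

# A 10 minute video with redetectionTime = 500 ms -> 1200 coarse grained detections,
# with redetectionTime = 2000 ms -> only 300, at the risk of missing short appearances.
for redetection_time in (500, 1000, 2000):
    print(redetection_time, coarse_detection_count(600, redetection_time))
```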
Aggregation of faces into tracklets (done in post processing) can be configured by VideoDataAggregator
configuration parameters. Values can be defined in SmartFace.VideoDataAggregator.appsettings.json
file located next to the binary files.
"Aggregation" : {
"AggregationSimilarityThreshold" : 45,
"TrackletAggregationPeriodMs" : 3000,
"MatchRequestParallelism" : 3,
"Export" : "Standard",
"EntityNotificationsEnabled": false
}
AggregationSimilarityThreshold is a face similarity threshold that defines whether detected faces are joined into one tracklet. If you are experiencing that some tracklets contain faces that clearly do not belong to them, try increasing this threshold to prevent this from happening. However, increasing this value may result in creating multiple tracklets for the same person. We recommend setting this value to at least 45.
The TrackletAggregationPeriodMs property defines the maximum time window between face appearances that are still considered one tracklet. E.g. if a person reappears in the scene after a time greater than this period, a new tracklet is created.
MatchRequestParallelism defines the parallelism that the aggregator uses for match requests of detected faces.
Export defines whether the data is exported to the SmartFace Platform database (Standard) or published to RabbitMQ (RabbitMq).
EntityNotificationsEnabled controls the sending of ZeroMQ notifications based on data created by the video processing. If false (default), only notifications about video record state changes are sent. If true, notifications are also sent for newly created entities such as Faces, Tracklets and Matches.
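To make the interplay of AggregationSimilarityThreshold and TrackletAggregationPeriodMs easier to picture, here is a purely conceptual Python sketch (not the actual SmartFace Platform implementation); the face structure and similarity function are hypothetical placeholders:

```python
# Conceptual sketch only - not the actual SmartFace Platform aggregation code.
# Shows how the two Aggregation parameters are intended to interact.
AGGREGATION_SIMILARITY_THRESHOLD = 45   # AggregationSimilarityThreshold
TRACKLET_AGGREGATION_PERIOD_MS = 3000   # TrackletAggregationPeriodMs

def face_similarity(face_a: dict, face_b: dict) -> int:
    """Hypothetical placeholder for a biometric template comparison score."""
    raise NotImplementedError

def belongs_to_tracklet(face: dict, tracklet: list[dict]) -> bool:
    last_face = tracklet[-1]
    similar_enough = face_similarity(face, last_face) >= AGGREGATION_SIMILARITY_THRESHOLD
    # If the person reappears after a gap longer than the aggregation period,
    # a new tracklet is started instead of extending the existing one.
    close_in_time = face["timeMs"] - last_face["timeMs"] <= TRACKLET_AGGREGATION_PERIOD_MS
    return similar_enough and close_in_time
```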
Computing power available
Available computing power can significantly affect the performance and processing speed. More resources allow SmartFace Platform to better scale the processes responsible for the video processing. If possible, use CPUs with a higher frequency (3+ GHz). In the current version of the offline video processing, SmartFace Platform does not utilize the GPU to its full potential, so if possible we recommend having more CPU cores rather than a GPU on your server.
Scaling of SmartFace Platform Services for offline video processing
Several main processes ensure the offline video processing. You can spawn multiple instances of the processes marked as Scalable (in the table below) to achieve higher parallelism and faster processing, provided your hardware offers enough resources in terms of CPU cores, RAM and GPU.
Binary name | Service name | Scalable | RAM usage | GPU acceleration |
---|---|---|---|---|
VideoReader | SFVideoReader | Yes | 100 MB (may vary based on resolution of the video) | N/A |
VideoDataCollector | SFVideoDataCollector | No | ~50 MB | N/A |
VideoDataAggregator | SFVideoDataAggregator | Yes | ~100 MB (may vary by number of detected faces in the video) | N/A |
RpcDetector | SFDetect(Cpu/Gpu) | Yes (Cpu only) | ~1.5 GB | Yes (Only one GPU accelerated process on machine supported) |
RpcExtractor | SFExtract(Cpu/Gpu) | Yes (Cpu only) | ~1.7 GB | Yes (Only one GPU accelerated process on machine supported) |
Default scaling recommendations
For each physical machine we recommend using this default configuration:
2x instances of SFVideoReader service
1x instance of SFDetectGpu service (if GPU available)
N instances of SFDetectCpu service - N depends on available cores (N may be lower if the machine does not have enough RAM during the processing)
N instances of SFExtractCpu service - N depends on available cores (N may be lower if the machine does not have enough RAM during the processing)
1x instance of SFVideoDataCollector service
1x instance of SFVideoDataAggregator service
Scaling recommendations
It is important to understand that to gain the best performance (and thus the lowest processing time) you need an optimal scaling configuration. Finding the correct configuration depends highly on the input videos that will be processed.
The table below should guide you in finding the best configuration based on the key parameters of the video.
Expected video | Recommendation |
---|---|
Long video with few faces | Spawn additional VideoReader services to be able to decode and process the video faster. There is no need for extra detect/extract services because very few faces are expected. |
Long video with a lot of faces (crowd analytics) | Spawn additional extractor and detector services, because high traffic of faces is expected. |
Many short videos <= ~15 min | Spawn additional VideoDataAggregators (to parallelize the post processing of each video record). Consider setting the ChunkLength of VideoSlicer to around ~1/2 of the video duration. Spawn also additional VideoReaders to parallelize the processing of multiple videos. Depending on the number of expected faces, spawn additional detectors/extractors. |
Retrieving information about processing
After the video is processed, several datasets are available, depending on the Export parameter configured for Aggregation.
When Export is configured to Standard:
Entity | Description |
---|---|
Aggregated tracklets with faces and match results. | Tracklets, faces and match results are created in the database. |
Notification about updated state of VideoRecord | VideoState from the notification can be either Processed/Error, depending on successful/unsuccessful aggregation. |
When Export is configured to RabbitMq:
Entity | Description |
---|---|
Aggregated tracklets with faces. | Tracklets with their faces are published as messages to durable OfflineVideo_DataExport queue in RabbitMQ. |
Notification about updated state of VideoRecord | VideoState from the notification can be either Processed/Error, depending on successful/unsuccessful aggregation. |
Entity notifications
By default, no entity notifications are sent about created entities. The notifications can be enabled by setting the EntityNotificationsEnabled parameter configured for Aggregation to true.
When enabled, the following notifications are produced:
Entity | Description |
---|---|
Notification about saved database entities | These notifications are sent for every created tracklet/face/successful match result. Note that these notifications are produced only when the entities are actually stored in the database. This means that when Export is configured to RabbitMq, these notifications are not produced. |
Fast watchlist notifications | Notifications about successful/unsuccessful matches. No-match results are not persisted in the database. |
Watchlist matching
By specifying matchingConfig in the API request, it is possible to configure how faces detected in Rapid Video Processing are matched against watchlists present in SmartFace Platform. See the following table for more information.
Property | Description |
---|---|
matchDetectedFaces | Specifies whether detected faces should be matched against watchlists. |
maxResultsCount | Specifies how many MatchResults to generate per face. If more than one watchlist member matches with a score higher than the threshold specified on the watchlist, then up to N such MatchResults will be generated, where N is the value of maxResultsCount. Note that this behavior is applied per each watchlist, so even when the value of maxResultsCount is 1, multiple MatchResults can be created for a face if it matches members in multiple watchlists. |
Clearing Video record
In case the video record processing ends in the Error state or the processing results are not satisfactory (e.g. too many or too few faces were detected), the video record can be reprocessed. To perform the reprocessing, all created data must be cleared first.
To clear all data produced from the Video record, use the POST /api/v1/VideoRecords/{id}/Clear endpoint on the REST API. This call will do the following:
Delete all data produced by the video record processing, such as:
Detected faces
Tracklets
MatchResults
Frames
Set the video back to the Ready state, from which the settings can be modified and processing started again using the PUT /api/v1/VideoRecords REST API endpoint.
Without any change in the configuration, the video processing will yield the exact same result, except for MatchResults, as they depend on the configuration of Watchlists and WatchlistMembers at the time of processing.
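A minimal Python sketch of this clear-and-reprocess flow is shown below. The endpoints are the ones described above; the base URL is a placeholder and the exact shape of the PUT payload is an assumption, shown only schematically.

```python
# Minimal sketch: clear a processed Video record and start processing again.
# The base URL and the PUT payload shape are assumptions; the endpoints are described above.
import requests

API_BASE = "http://localhost:8098"  # replace with your SmartFace Platform REST API address
video_record_id = "00000000-0000-0000-0000-000000000000"  # placeholder id

# Delete faces, tracklets, match results and frames; the record returns to the Ready state.
requests.post(f"{API_BASE}/api/v1/VideoRecords/{video_record_id}/Clear").raise_for_status()

# Optionally adjust the settings, then start the processing again.
update = {
    "id": video_record_id,
    "enabled": True,
    "redetectionTime": 1000,  # example of a changed setting
}
requests.put(f"{API_BASE}/api/v1/VideoRecords", json=update).raise_for_status()
```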