Accuracy
Face Identification
Face Identification accuracy describes how reliably the system can find the correct identity within a gallery of enrolled faces.
Two key metrics define identification performance:
- FPIR (False Positive Identification Rate) – the probability that a search for a person who is not enrolled in the gallery incorrectly returns a match.
- FNIR (False Negative Identification Rate) – the probability that a search for an enrolled person fails to return the correct identity.
These are opposing forces:
- Increasing the identification (matching) threshold reduces FPIR (fewer false matches), but increases FNIR (more missed matches).
- Decreasing the threshold reduces FNIR, but increases FPIR.
Choosing the right threshold depends on your use case:
| Scenario | Typical Priority |
|---|---|
| Border control / secure access | Low FPIR |
| Watchlist / surveillance | Low FNIR |
Two ways of expressing performance are often used:
- FPIR@FNIR = X% – FPIR measured when FNIR is fixed at X%.
- FNIR@FPIR = X% – FNIR measured when FPIR is fixed at X%.
A high threshold is not always better – if it is too strict, even genuine users may fail to match (high FNIR).
Conversely, a low threshold may yield too many false matches (high FPIR).
The goal is to find a balanced operating point for your gallery size and use case.
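To make the trade-off concrete, here is a minimal sketch (plain NumPy, not SmartFace code) of how FPIR and FNIR follow from the matching threshold; the score arrays are assumed to come from your own evaluation run.

```python
# A minimal sketch (plain NumPy, not SmartFace code) of how FPIR and FNIR
# follow from the matching threshold. `nonmated_scores` are the top match
# scores of searches for people who are NOT enrolled; `mated_scores` are
# the correct-identity scores of searches for people who ARE enrolled.
import numpy as np

def fpir(nonmated_scores: np.ndarray, threshold: float) -> float:
    """Fraction of non-mated searches that wrongly return a match."""
    return float(np.mean(nonmated_scores >= threshold))

def fnir(mated_scores: np.ndarray, threshold: float) -> float:
    """Fraction of mated searches that miss the correct identity."""
    return float(np.mean(mated_scores < threshold))

def threshold_for_fpir(nonmated_scores: np.ndarray, target_fpir: float) -> float:
    """Approximate lowest threshold whose FPIR meets the target (e.g. 1/1000)."""
    # The (1 - target) quantile of the non-mated scores is the cut-off.
    return float(np.quantile(nonmated_scores, 1.0 - target_fpir))
```

Raising the threshold lowers FPIR and raises FNIR, and vice versa – exactly the trade-off described above.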
Template Extraction Algorithms
SmartFace provides two algorithms for extracting biometric templates:
| Algorithm | Description | Performance | Recommended Use |
|---|---|---|---|
| Balanced | Default extractor providing a good trade-off between speed and accuracy. | Fast | Video surveillance, small to mid-size galleries |
| Accurate | Enhanced extractor optimized for maximum precision. | Average | Access control systems, large galleries, high-security applications |
The Accurate extractor yields tighter score distributions (better separation of genuine and impostor scores), improving FNIR at low FPIR levels.
However, it is computationally heavier and therefore not the default.
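The separation claim can be checked on your own evaluation data. Below is a minimal sketch (the score arrays are assumed inputs, not a SmartFace API) computing the decidability index d′, where a higher value indicates tighter, better-separated genuine and impostor distributions.

```python
# A minimal sketch (assumed score arrays, not a SmartFace API) that
# quantifies genuine/impostor separation with the decidability index d'.
# A higher d' means tighter, better-separated score distributions, which
# is what improves FNIR at low FPIR levels.
import numpy as np

def d_prime(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Decidability index: distance between score means in pooled std units."""
    mu_g, mu_i = genuine.mean(), impostor.mean()
    var_g, var_i = genuine.var(), impostor.var()
    return float(abs(mu_g - mu_i) / np.sqrt((var_g + var_i) / 2.0))

# Run the same evaluation pairs through both extractors and compare, e.g.:
# d_prime(genuine_accurate, impostor_accurate) vs.
# d_prime(genuine_balanced, impostor_balanced)
```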
Importance of Dataset and Quality
Identification accuracy depends strongly on:
- The dataset (lighting, pose, demographics, sensor type)
- Enrollment image quality (sharp, frontal, ICAO-compliant images)
- Probe image quality (pose, illumination, blur, occlusions)
Our results are based on internal evaluation datasets, but you must always validate performance on your own data.
Real-world conditions vary greatly and can influence optimal thresholds.
The higher the threshold you configure, the higher the image quality you must ensure – both for enrollment and identification.
Poor-quality or non-frontal captures will lead to degraded results even at the same threshold.
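As one concrete example of a quality gate, the sketch below rejects blurry captures using the variance-of-Laplacian sharpness proxy; the OpenCV approach and the cut-off value are illustrative assumptions, not SmartFace settings.

```python
# A minimal sketch of one quality gate mentioned above: rejecting blurry
# enrollment images. Uses the variance of the Laplacian as a sharpness
# proxy; the cut-off (100.0) is an illustrative assumption that you should
# calibrate on your own imagery, not a SmartFace setting.
import cv2

def is_sharp_enough(image_path: str, min_laplacian_var: float = 100.0) -> bool:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= min_laplacian_var
```

In practice you would apply such gates before enrollment and tune them together with the identification threshold.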
Measured Results
The following tables present real measured results for identification accuracy across different gallery sizes.
Each scenario corresponds to a typical deployment scale and complexity.
Gallery ≈5.6 k images
| FPIR Level | FNIR Balanced (%) | Threshold Balanced | FNIR Accurate (%) | Threshold Accurate |
|---|---|---|---|---|
| 1:2 | 48.138 | 20.010 | 35.240 | 20.652 |
| 1:5 | 56.276 | 23.832 | 41.843 | 24.554 |
| 1:10 | 63.724 | 26.442 | 47.908 | 26.840 |
| 1:20 | 70.633 | 29.102 | 54.741 | 29.229 |
| 1:50 | 79.616 | 33.345 | 65.950 | 33.213 |
| 1:100 | 84.031 | 35.770 | 76.161 | 37.260 |
| 1:200 | 91.286 | 41.370 | 87.946 | 43.927 |
| 1:500 | 98.004 | 55.300 | 99.347 | 66.100 |
| 1:1000 | 99.309 | 65.056 | 99.693 | 68.627 |
| 1:2000 | 99.655 | 69.705 | 99.770 | 70.481 |
Gallery ≈10 k images
| FPIR Level | FNIR Balanced (%) | Threshold Balanced | FNIR Accurate (%) | Threshold Accurate |
|---|---|---|---|---|
| 1:2 | 0.000 | 29.318 | 0.000 | 28.568 |
| 1:5 | 0.000 | 33.239 | 0.000 | 32.620 |
| 1:10 | 0.000 | 35.937 | 0.000 | 35.294 |
| 1:20 | 0.000 | 38.682 | 0.000 | 37.212 |
| 1:50 | 0.000 | 41.516 | 0.000 | 40.306 |
| 1:100 | 0.000 | 43.988 | 0.000 | 42.259 |
| 1:200 | 0.000 | 45.749 | 0.000 | 44.394 |
| 1:500 | 0.000 | 49.185 | 0.000 | 45.466 |
| 1:1000 | 0.000 | 49.418 | 0.000 | 46.438 |
Gallery ≈100 k images
| FPIR Level | FNIR Balanced (%) | Threshold Balanced | FNIR Accurate (%) | Threshold Accurate |
|---|---|---|---|---|
| 1:2 | 0.047 | 35.338 | 0.023 | 34.375 |
| 1:5 | 0.090 | 39.921 | 0.027 | 38.223 |
| 1:10 | 0.123 | 42.920 | 0.033 | 40.982 |
| 1:20 | 0.197 | 45.568 | 0.033 | 43.460 |
| 1:50 | 0.317 | 49.471 | 0.047 | 46.495 |
| 1:100 | 0.420 | 52.439 | 0.067 | 49.680 |
| 1:200 | 0.577 | 55.270 | 0.097 | 52.670 |
| 1:500 | 0.823 | 58.942 | 0.150 | 56.472 |
| 1:1000 | 1.107 | 62.328 | 0.277 | 61.368 |
| 1:2000 | 1.510 | 65.709 | 0.437 | 64.943 |
| 1:5000 | 4.057 | 72.195 | 3.453 | 70.649 |
| 1:10000 | 8.180 | 76.240 | 10.007 | 74.262 |
| 1:20000 | 12.317 | 78.844 | 13.927 | 76.611 |
Gallery ≈1.7 M images
| FPIR Level | FNIR Balanced (%) | Threshold Balanced | FNIR Accurate (%) | Threshold Accurate |
|---|---|---|---|---|
| 1:2 | 0.474 | 59.814 | 0.190 | 57.635 |
| 1:5 | 0.669 | 66.713 | 0.244 | 64.896 |
| 1:10 | 0.949 | 69.133 | 0.298 | 66.694 |
| 1:20 | 1.297 | 70.629 | 0.370 | 67.575 |
| 1:50 | 1.784 | 72.350 | 0.614 | 68.608 |
| 1:100 | 2.367 | 73.637 | 0.836 | 69.283 |
| 1:200 | 3.095 | 75.010 | 1.179 | 69.930 |
| 1:500 | 4.924 | 77.072 | 1.956 | 71.111 |
| 1:1000 | 8.543 | 79.657 | 3.899 | 72.491 |
| 1:2000 | 20.071 | 83.716 | 22.935 | 81.938 |
| 1:5000 | 83.674 | 95.280 | 63.914 | 90.745 |
| 1:10000 | 99.968 | 99.278 | 99.937 | 98.386 |
| 1:20000 | 99.968 | 99.700 | 99.968 | 99.412 |
Choosing the Right Threshold
Identification accuracy is a trade-off between FPIR, FNIR, and processing performance. To choose an operating point:
1. Determine the acceptable FPIR for your system.
   - For 1 false match in 1,000 searches → FPIR = 1:1000.
   - For 1 false match in 10,000 searches → FPIR = 1:10000.
2. Select the corresponding threshold from the tables above for your gallery size and algorithm (a lookup sketch follows these steps).
3. Validate on your own data – real operational conditions will influence the final FNIR.
4. Maintain image quality.
   - High thresholds demand high-quality, ICAO-compliant, frontal, well-lit images.
   - Poor images may require lowering the threshold or improving the camera setup.
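A minimal lookup sketch for step 2, encoding a few measured rows of the ≈100 k gallery table above; extend the mapping with the rows that match your own gallery size and target FPIR.

```python
# A minimal sketch of the threshold-selection step, using a few measured
# rows of the ~100 k gallery table above. Keys are target FPIR levels;
# values are the measured thresholds per extractor.
THRESHOLDS_100K = {
    # FPIR level: (Balanced threshold, Accurate threshold)
    "1:1000":  (62.328, 61.368),
    "1:2000":  (65.709, 64.943),
    "1:10000": (76.240, 74.262),
}

def pick_threshold(target_fpir: str, algorithm: str = "Accurate") -> float:
    balanced, accurate = THRESHOLDS_100K[target_fpir]
    return balanced if algorithm == "Balanced" else accurate

print(pick_threshold("1:10000", "Balanced"))  # 76.24 – a starting point to validate
```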
Example
For a gallery of 100,000 identities targeting 1 false match per 10,000 searches (FPIR = 1:10000), you can start with thresholds around ~76 for Balanced or ~74 for Accurate, then fine-tune using your own dataset.
Face Liveness
Passive Liveness
The final decision of whether a face is real or a spoof should be determined by the passive liveness score and threshold. If the score is above the threshold, the face is accepted as real; if the score is below the threshold, it is rejected as a spoof.
Setting the correct threshold depends on the security/convenience balance required for the specific use case.
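A minimal sketch of this decision rule (the score is assumed to come from your SmartFace integration; the threshold is one of the measured operating points from the tables below):

```python
# A minimal sketch of the decision rule described above; the score is an
# assumed input from your integration, the threshold comes from the tables.
def is_live(passive_liveness_score: float, threshold: float) -> bool:
    """Accept as a real face when the score is above the threshold."""
    return passive_liveness_score > threshold
```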
The results below are for the currently used IFace version 5.14.
Thresholds for distant passive liveness:
| False Accept Rate (Level) | False Reject Rate (%) | Threshold |
|---|---|---|
| 1:5 | 0.055 | 67.017 |
| 1:10 | 0.068 | 69.288 |
| 1:50 | 0.806 | 78.488 |
| 1:100 | 2.432 | 83.847 |
| 1:500 | 10.700 | 90.170 |
| 1:1000 | 18.092 | 92.826 |
| 1:10000 | 51.722 | 97.274 |
Thresholds for nearby passive liveness:
| False Accept Rate (Level) | False Reject Rate (%) | Threshold |
|---|---|---|
| 1:5 | 0.305 | 74.634 |
| 1:10 | 1.536 | 81.771 |
| 1:50 | 8.800 | 89.434 |
| 1:100 | 13.523 | 91.354 |
| 1:500 | 26.425 | 94.282 |
| 1:1000 | 33.646 | 95.557 |
| 1:5000 | 54.978 | 97.117 |
| 1:10000 | 59.248 | 97.442 |
Example
Let’s set the threshold for distant passive liveness to 83.847. On a representative set of 10,000 real faces, about 243 will on average be incorrectly marked as spoofs even though they are real (False Reject Rate of 2.432%). On a set of 10,000 spoofs, about 100 will on average be wrongly accepted as real faces (False Accept Rate of 1:100).
To make the liveness check more strict, choose a higher threshold: a spoof is then less likely to be accepted as a real face, while a real face is more likely to be rejected as a spoof. If you set the threshold to 92.826 instead, about 10 out of 10,000 spoofs will be incorrectly accepted (FAR of 1:1000) and about 1,809 out of 10,000 real faces will be rejected as spoofs (FRR of 18.092%).
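The arithmetic of this example can be reproduced directly from the table: the FRR comes from the measured percentage and the FAR from the level. A minimal sketch:

```python
# A minimal sketch reproducing the arithmetic of the example above for the
# distant passive liveness operating points (FRR percentage from the table,
# FAR from the level).
def expected_errors(n: int, frr_percent: float, far_level: int) -> tuple[int, int]:
    """Expected (false rejects out of n real faces, false accepts out of n spoofs)."""
    return round(n * frr_percent / 100.0), round(n / far_level)

print(expected_errors(10_000, 2.432, 100))    # (243, 100)  at threshold ~83.847
print(expected_errors(10_000, 18.092, 1000))  # (1809, 10)  at threshold ~92.826
```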