COMPARATIVE STUDY OF VECTOR DATABASES BASED ON ANN FOR FAST AND ACCURATE FACE RECOGNITION IN DIGITAL FORENSICS

DOI: 10.31673/2409-7292.2025.028437

Authors

  • T. O. Fedynishyn (Т. О. Фединишин), Information Security Department, Lviv Polytechnic National University
  • O. O. Partyka (О. О. Партика), Information Security Department, Lviv Polytechnic National University

Abstract

The rapid growth of biometric data and the increasing need for accurate face verification in digital forensics
call for scalable and efficient facial image search systems. This paper presents a comparative study of
five vector search algorithms (HNSW, Faiss, Annoy, PyNNDescent, and Nearest Neighbors) for face identification
based on vector representations (embeddings). The experiment was designed to approximate real forensic
scenarios and focuses on three key evaluation metrics: Top-1 accuracy, the distribution of similarity scores, and query
processing time. All tested methods achieved high accuracy (over 91%), but they differed significantly
in match confidence and speed. Faiss produced the highest similarity scores, indicating better search
accuracy, although it required significantly more computational resources. In contrast, HNSW and PyNNDescent
provided near-instantaneous query processing with competitive accuracy, but with higher variability in result quality.
Annoy proved to be a compromise solution that combines high accuracy with low latency. The results highlight important
trade-offs between accuracy, confidence, and efficiency, and provide practical guidelines for selecting vector search
technologies for forensic facial recognition systems. In addition, the study proposes a reproducible benchmarking
methodology that can be used to further evaluate biometric search tools in law enforcement and security.
Keywords: vector search, facial recognition, digital forensics, HNSW, Faiss, Annoy, PyNNDescent, biometric identification.
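
The three metrics named in the abstract (Top-1 accuracy, similarity score distribution, and query processing time) can be illustrated with a minimal benchmarking sketch. This is not the authors' code: the data is synthetic, the helper names (make_dataset, exact_top1, benchmark) are hypothetical, and an exact brute-force cosine search stands in for the ANN libraries compared in the study. Any of those libraries could in principle be wrapped behind the same search_fn signature.

```python
import math
import random
import statistics
import time


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def make_dataset(n_ids=50, dim=32, noise=0.1, seed=0):
    """Synthetic stand-in for face embeddings: one gallery and one probe vector per identity."""
    rng = random.Random(seed)
    gallery, probes = [], []
    for _ in range(n_ids):
        base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        gallery.append(normalize(base))
        # The probe is a noisy copy of the enrolled vector (assumed noise model).
        probes.append(normalize([x + rng.gauss(0.0, noise) for x in base]))
    return gallery, probes


def exact_top1(gallery, query):
    """Brute-force baseline: return (index, cosine similarity) of the best match."""
    sims = [sum(g * q for g, q in zip(vec, query)) for vec in gallery]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


def benchmark(search_fn, gallery, probes):
    """Collect Top-1 accuracy, similarity statistics, and mean query time."""
    hits, sims, times = 0, [], []
    for true_id, query in enumerate(probes):
        start = time.perf_counter()
        idx, sim = search_fn(gallery, query)
        times.append(time.perf_counter() - start)
        hits += int(idx == true_id)
        sims.append(sim)
    return {
        "top1_accuracy": hits / len(probes),
        "mean_similarity": statistics.mean(sims),
        "stdev_similarity": statistics.stdev(sims),
        "mean_query_ms": 1000 * statistics.mean(times),
    }


gallery, probes = make_dataset()
report = benchmark(exact_top1, gallery, probes)
print(report)
```

Running the same benchmark function over several search back ends would yield directly comparable accuracy, confidence, and latency figures, which is the kind of trade-off the abstract describes.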

Published

2025-06-28

Section

Articles