Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets

Dmitry Pozdnyakov

doi:doi:10.11648/j.ajcst.20210402.12

| Peer-Reviewed

Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets

Dmitry Pozdnyakov

Published in American Journal of Computer Science and Technology (Volume 4, Issue 2)

Received: 25 June 2021 Accepted: 16 July 2021 Published: 24 July 2021

Views: Downloads:

Download PDF

Share This Article

Twitter
Linked In
Facebook

Abstract

The algorithm of effective geometric hashing of the facial feature hyperspace for the accelerated search of the most similar face descriptors by their cosine similarity is described in the present study. The algorithm includes 6 required stages of processing descriptors extracted by a neural network from face images. The first stage is filtration of the descriptor database by selecting the most representative descriptor for each person from the set of descriptors corresponding to his/her different face images. The second stage is evaluation of a number of statistical values for all the components of the selected descriptors. The third stage is intermediate hashing through quantization of every descriptor component value so that almost the same quantity of descriptors corresponds to any quantum number. The fourth stage is statistical processing of the descriptor database to determine the most discriminative descriptor key components and their hierarchy. The fifth stage is calculation of the descriptor hash code for every most representative descriptor from the considered database. The sixth final stage is a special cataloging of data in the form of a multi-tiered directory ordered by the hash codes. The search acceleration is achieved through sparse processing of the whole directory when the hash code obtained for the requested person descriptor acts as a very selective search filter. The developed algorithm always provides the same absolute accuracy as the brute-force search. Through the example of the LFW dataset consideration, the average search acceleration by about 100 times is achieved under conditions that the descriptors have been extracted by a neural network trained on the WiderFace dataset with application of the additive angular margin loss function.

Published in	American Journal of Computer Science and Technology (Volume 4, Issue 2)
DOI	10.11648/j.ajcst.20210402.12
Page(s)	38-45
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Geometric Hashing, Brute Force, Search, Feature Hyperspace, Descriptor, Cosine Similarity

References

[1]	Hou B.-W., Zheng R., Yang G.-Sh. (2014) Quick search algorithms based on ethnic facial image database. 2014 IEEE 5th International Conference on Software Engineering and Service Science, 573-576.
[2]	Wang M., Deng W. (2018) Deep face recognition: a survey. https://arxiv.org/pdf/1804.06655.pdf, 1-24.
[3]	Liu W., Wen Y., Yu Zh., Li M., Raj B., Song L. (2017) SphereFace: deep hypersphere embedding for face recognition. 2017 IEEE Conference on Computer Vision and Pattern Recognition, 6738-6746.
[4]	Wang H., Wang Y., Zhou Zh., Ji X., Gong D., Zhou J., Li Zh., Liu W. (2018) CosFace: large margin cosine loss for deep face recognition. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5265-5274.
[5]	Deng J., Guo J., Xue N., Zafeiriou S. (2020) ArcFace: additive angular margin loss for deep face recognition. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4685-4694.
[6]	Cao Q., Ying Y., Li P. (2013) Similarity metric learning for face recognition. 2013 IEEE International Conference on Computer Vision, 2408-2415.
[7]	Vezzetti E., Marcolin F. (2015, Bentham Books) Similarity measures for face recognition.
[8]	Nguyen H. V., Bai L. (2011) Cosine similarity metric learning for face verification. 2010 Asian Conference on Computer Vision, 709-720.
[9]	https://en.wikipedia.org/wiki/Nearest_neighbor_search.
[10]	Aumüller M. (2020) Algorithm engineering for high- dimensional similarity search problems. 18th International Symposium on Experimental Algorithms, 160, 1: 1-3.
[11]	Johnson J., Douze M., J´egou H. (in press) Billionscale similarity search with GPUs. IEEE Transactions on Big Data. Also in (2017) https://arxiv.org/pdf/1702.08734.pdf, 1-12.
[12]	FAISS (Facebook AI Similarity Search). https://github.com/facebookresearch/faiss/wiki.
[13]	https://en.wikipedia.org/wiki/Voronoi_diagram.
[14]	https://en.wikipedia.org/wiki/K-d_tree.
[15]	https://en.wikipedia.org/wiki/Search_engine_indexing.
[16]	https://en.wikipedia.org/wiki/Geometric_hashing.
[17]	Benchmarking nearest neighbors. https://github.com/erikbern/ann-benchmarks.
[18]	Li W., Zhang Y., Sun Y., Wang W., Li M., Zhang W., Lin X. (2019) Approximate nearest neighbor search on high dimensional data – experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering, 32 (8), 1475-1488.
[19]	Hyvonen V., Pitkanen T., Tasoulis S., Jaasaari E., Tuomainen R., Wang L., Corander J., Roos T. (2016) Fast k-NN search. https://arxiv.org/pdf/1509.06957.pdf, 1-10.
[20]	Aumüller M., Bernhardsson E., Faithfull A. (2020) ANN- benchmarks: a benchmarking tool for approximate nearest neighbor algorithms. Information Systems, 87, 1-13.
[21]	Abu-Aisheh Z., Raveaux R., Ramel J.-Y. (2020) Efficient k-nearest neighbors search in graph space. Pattern Recognition Letters, 134, 77-86.
[22]	Hwang Y., Baek M., Kim S., Han B., Ahn H.-K. (2018) Product quantized translation for fast nearest neighbor search. The Thirty-Second AAAI Conference on Artificial Intelligence, 3295-3301.
[23]	Abdelhadi A. M. S., Bouganis Ch.-S., Constantinides G. A. (2019) Accelerated approximate nearest neighbors search through hierarchical product quantization. 2019 International Conference on Field-Programmable Technology, 90-98.
[24]	Subramanya S. J., Devvrit F., Simhadri H. V., Krishaswamy R., Simhadri H. V. (2019) DiskANN: fast accurate billion-point nearest neighbor search on a single node. 33rd Conference on Neural Information Processing Systems, 13748-13758.
[25]	Guo R., Sun Ph., Lindgren E., Geng Q., Simcha D., Chern F., Kumar S. (2020) Accelerating large-scale inference with anisotropic vector quantization. 37th International Conference on Machine Learning, 119, 3887-3896.
[26]	Ren J., Zhang M., Li D. (2020) HM-ANN: efficient billion-point nearest neighbor search on heterogeneous memory. 34th Conference on Neural Information Processing Systems, 1-13.
[27]	Cariou C., Moan S. L., Chehdi K. (2020) Improving k-nearest neighbor approaches for density-based pixel clustering in hyperspectral remote sensing images. Remote Sensing, 12 (22), 3745-3771.
[28]	Li M., Zhang Y., Sun Y., Wang W., Tsang I. W., Lin X. (2020) I/O efficient approximate nearest neighbor search based on learned functions. 2020 IEEE 36th International Conference on Data Engineering, 289-300.
[29]	Dong Y., Indyk P., Razenshteyn I., Wagner T. (2020) Learning space partitions for nearest neighbor search. Eighth International Conference on Learning Representations, 1-16.
[30]	Aghbari Z. A., Ismail T., Kamel I. (2020) SparkNN: a distributed in-memory data partitioning for KNN queries on big spatial data. Data Science Journal, 19 (1), 35-48.
[31]	Chen H., Chillotti I., Dong Y., Poburinnaya O., Razenshteyn I., Riazi M. S. (2020) SANNS: scaling up secure approximate k-nearest neighbors search. Proceedings of the 29th USENIX Security Symposium, 2111-2128.
[32]	Pan Y., Pan Z., Wang Y., Wang W. (2020) A new fast search algorithm for exact k-nearest neighbors based on optimal triangle-inequality-based check strategy. Knowledge-Based Systems, 189, 105088.
[33]	WIDERFACE: a face detection benchmark. http://shuoyang1213.me/WIDERFACE.
[34]	Labeled faces in the wild. http://vis-www.cs.umass.edu/lfw.
[35]	Pozdnyakov D. (2020) Determination of the most representative descriptor among a set of feature vectors for the same object. https://arxiv.org/ftp/arxiv/papers/2007/2007.03021.pdf, 1-8.
[36]	https://en.wikipedia.org/wiki/Bisection_method.

Cite This Article

Plain Text BibTeX RIS

APA Style

Dmitry Pozdnyakov. (2021). Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets. American Journal of Computer Science and Technology, 4(2), 38-45. https://doi.org/10.11648/j.ajcst.20210402.12

Copy | Download

ACS Style

Dmitry Pozdnyakov. Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets. Am. J. Comput. Sci. Technol. 2021, 4(2), 38-45. doi: 10.11648/j.ajcst.20210402.12

Copy | Download

AMA Style

Dmitry Pozdnyakov. Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets. Am J Comput Sci Technol. 2021;4(2):38-45. doi: 10.11648/j.ajcst.20210402.12

Copy | Download

@article{10.11648/j.ajcst.20210402.12,
  author = {Dmitry Pozdnyakov},
  title = {Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets},
  journal = {American Journal of Computer Science and Technology},
  volume = {4},
  number = {2},
  pages = {38-45},
  doi = {10.11648/j.ajcst.20210402.12},
  url = {https://doi.org/10.11648/j.ajcst.20210402.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20210402.12},
  abstract = {The algorithm of effective geometric hashing of the facial feature hyperspace for the accelerated search of the most similar face descriptors by their cosine similarity is described in the present study. The algorithm includes 6 required stages of processing descriptors extracted by a neural network from face images. The first stage is filtration of the descriptor database by selecting the most representative descriptor for each person from the set of descriptors corresponding to his/her different face images. The second stage is evaluation of a number of statistical values for all the components of the selected descriptors. The third stage is intermediate hashing through quantization of every descriptor component value so that almost the same quantity of descriptors corresponds to any quantum number. The fourth stage is statistical processing of the descriptor database to determine the most discriminative descriptor key components and their hierarchy. The fifth stage is calculation of the descriptor hash code for every most representative descriptor from the considered database. The sixth final stage is a special cataloging of data in the form of a multi-tiered directory ordered by the hash codes. The search acceleration is achieved through sparse processing of the whole directory when the hash code obtained for the requested person descriptor acts as a very selective search filter. The developed algorithm always provides the same absolute accuracy as the brute-force search. Through the example of the LFW dataset consideration, the average search acceleration by about 100 times is achieved under conditions that the descriptors have been extracted by a neural network trained on the WiderFace dataset with application of the additive angular margin loss function.},
 year = {2021}
}

Copy | Download

TY  - JOUR
T1  - Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets
AU  - Dmitry Pozdnyakov
Y1  - 2021/07/24
PY  - 2021
N1  - https://doi.org/10.11648/j.ajcst.20210402.12
DO  - 10.11648/j.ajcst.20210402.12
T2  - American Journal of Computer Science and Technology
JF  - American Journal of Computer Science and Technology
JO  - American Journal of Computer Science and Technology
SP  - 38
EP  - 45
PB  - Science Publishing Group
SN  - 2640-012X
UR  - https://doi.org/10.11648/j.ajcst.20210402.12
AB  - The algorithm of effective geometric hashing of the facial feature hyperspace for the accelerated search of the most similar face descriptors by their cosine similarity is described in the present study. The algorithm includes 6 required stages of processing descriptors extracted by a neural network from face images. The first stage is filtration of the descriptor database by selecting the most representative descriptor for each person from the set of descriptors corresponding to his/her different face images. The second stage is evaluation of a number of statistical values for all the components of the selected descriptors. The third stage is intermediate hashing through quantization of every descriptor component value so that almost the same quantity of descriptors corresponds to any quantum number. The fourth stage is statistical processing of the descriptor database to determine the most discriminative descriptor key components and their hierarchy. The fifth stage is calculation of the descriptor hash code for every most representative descriptor from the considered database. The sixth final stage is a special cataloging of data in the form of a multi-tiered directory ordered by the hash codes. The search acceleration is achieved through sparse processing of the whole directory when the hash code obtained for the requested person descriptor acts as a very selective search filter. The developed algorithm always provides the same absolute accuracy as the brute-force search. Through the example of the LFW dataset consideration, the average search acceleration by about 100 times is achieved under conditions that the descriptors have been extracted by a neural network trained on the WiderFace dataset with application of the additive angular margin loss function.
VL  - 4
IS  - 2
ER  -

Copy | Download

Author Information

Dmitry Pozdnyakov

Deep Learning Department, Closed Joint Stock Company "Oxagile", Minsk, Belarus

Download PDF

Sections

Plain Text BibTeX RIS

APA Style

Dmitry Pozdnyakov. (2021). Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets. American Journal of Computer Science and Technology, 4(2), 38-45. https://doi.org/10.11648/j.ajcst.20210402.12

Copy | Download

ACS Style

Dmitry Pozdnyakov. Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets. Am. J. Comput. Sci. Technol. 2021, 4(2), 38-45. doi: 10.11648/j.ajcst.20210402.12

Copy | Download

AMA Style

Dmitry Pozdnyakov. Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets. Am J Comput Sci Technol. 2021;4(2):38-45. doi: 10.11648/j.ajcst.20210402.12

Copy | Download

@article{10.11648/j.ajcst.20210402.12,
  author = {Dmitry Pozdnyakov},
  title = {Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets},
  journal = {American Journal of Computer Science and Technology},
  volume = {4},
  number = {2},
  pages = {38-45},
  doi = {10.11648/j.ajcst.20210402.12},
  url = {https://doi.org/10.11648/j.ajcst.20210402.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20210402.12},
  abstract = {The algorithm of effective geometric hashing of the facial feature hyperspace for the accelerated search of the most similar face descriptors by their cosine similarity is described in the present study. The algorithm includes 6 required stages of processing descriptors extracted by a neural network from face images. The first stage is filtration of the descriptor database by selecting the most representative descriptor for each person from the set of descriptors corresponding to his/her different face images. The second stage is evaluation of a number of statistical values for all the components of the selected descriptors. The third stage is intermediate hashing through quantization of every descriptor component value so that almost the same quantity of descriptors corresponds to any quantum number. The fourth stage is statistical processing of the descriptor database to determine the most discriminative descriptor key components and their hierarchy. The fifth stage is calculation of the descriptor hash code for every most representative descriptor from the considered database. The sixth final stage is a special cataloging of data in the form of a multi-tiered directory ordered by the hash codes. The search acceleration is achieved through sparse processing of the whole directory when the hash code obtained for the requested person descriptor acts as a very selective search filter. The developed algorithm always provides the same absolute accuracy as the brute-force search. Through the example of the LFW dataset consideration, the average search acceleration by about 100 times is achieved under conditions that the descriptors have been extracted by a neural network trained on the WiderFace dataset with application of the additive angular margin loss function.},
 year = {2021}
}

Copy | Download

TY  - JOUR
T1  - Effective Geometric Hashing of the Feature Hyperspace for a Quick Accurate Search of the Most Similar Descriptors in Large Datasets
AU  - Dmitry Pozdnyakov
Y1  - 2021/07/24
PY  - 2021
N1  - https://doi.org/10.11648/j.ajcst.20210402.12
DO  - 10.11648/j.ajcst.20210402.12
T2  - American Journal of Computer Science and Technology
JF  - American Journal of Computer Science and Technology
JO  - American Journal of Computer Science and Technology
SP  - 38
EP  - 45
PB  - Science Publishing Group
SN  - 2640-012X
UR  - https://doi.org/10.11648/j.ajcst.20210402.12
AB  - The algorithm of effective geometric hashing of the facial feature hyperspace for the accelerated search of the most similar face descriptors by their cosine similarity is described in the present study. The algorithm includes 6 required stages of processing descriptors extracted by a neural network from face images. The first stage is filtration of the descriptor database by selecting the most representative descriptor for each person from the set of descriptors corresponding to his/her different face images. The second stage is evaluation of a number of statistical values for all the components of the selected descriptors. The third stage is intermediate hashing through quantization of every descriptor component value so that almost the same quantity of descriptors corresponds to any quantum number. The fourth stage is statistical processing of the descriptor database to determine the most discriminative descriptor key components and their hierarchy. The fifth stage is calculation of the descriptor hash code for every most representative descriptor from the considered database. The sixth final stage is a special cataloging of data in the form of a multi-tiered directory ordered by the hash codes. The search acceleration is achieved through sparse processing of the whole directory when the hash code obtained for the requested person descriptor acts as a very selective search filter. The developed algorithm always provides the same absolute accuracy as the brute-force search. Through the example of the LFW dataset consideration, the average search acceleration by about 100 times is achieved under conditions that the descriptors have been extracted by a neural network trained on the WiderFace dataset with application of the additive angular margin loss function.
VL  - 4
IS  - 2
ER  -

Copy | Download