Peer-Reviewed

Research and Implementation of Word Detection System Based on Improved DFA in China

Received: 12 January 2023    Accepted: 20 February 2023    Published: 4 March 2023
Abstract

Since the second half of the last century, the intensive use of digital texts and textual databases has created a need for efficient search methods and data structures. Many traditional pattern matching algorithms exist, such as regular-expression matching, the Aho-Corasick (AC) algorithm, and the Wu-Manber (WM) algorithm. In this paper, we instead use a word detection method based on an improved deterministic finite automaton (DFA) algorithm and focus on implementing content matching with it. The word tree of the filtering system is built on ConcurrentSkipListMap, and the approach can handle emoticons, half-width characters, and repeated words. We introduce the system architecture, which consists mainly of middleware, database, and data processing parts. The algorithm matches multiple pattern strings at once, and pattern strings with a common prefix share the same path in the tree, which reduces repeated lookups and saves memory. We use a pre-trained word vector model to expand and improve the sensitive-word lexicon, with good results. The system implements initializing, changing, matching, and highlighting for the word database, and each of these processes is tested and analyzed. In a simulation, we captured relevant word data and imported it into a MySQL database for storage. The method improves the speed and accuracy of sensitive word recognition in messages, as well as the efficiency of word matching. We emphasize that the DFA algorithm is the better approach for this task compared with the AC algorithm and other algorithms. Functional, system, and performance tests yielded valuable results. The system supports a large lexicon and matches long texts efficiently. It can meet the requirements of real-time network transmission, so it can be applied in network settings. In summary, this paper proposes an improved DFA-based multi-pattern matching algorithm for word detection. By optimizing for the characters of the text content, the number of basic words, and the detection efficiency, the algorithm maximizes detection speed and response efficiency and helps purify the network space. Our research also shows that data from different sources in the system can be reused, which reduces repeated construction costs.
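The tree construction and matching described in the abstract can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name `WordFilter`, its methods, the rule that any non-letter, non-digit character is skipped as noise, and the tolerance for a stretched (repeated) character are all assumptions made for illustration. Only `ConcurrentSkipListMap` as the node container comes from the abstract.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Illustrative prefix-tree (DFA) word filter; every name here is hypothetical. */
final class WordFilter {

    /** One DFA state: outgoing transitions plus a flag marking the end of a word. */
    private static final class Node {
        final ConcurrentSkipListMap<Character, Node> next = new ConcurrentSkipListMap<>();
        volatile boolean terminal;
    }

    private final Node root = new Node();

    /** Maps full-width ASCII variants (U+FF01..U+FF5E) to half-width, then lowercases. */
    static char normalize(char c) {
        if (c >= '\uFF01' && c <= '\uFF5E') {
            c = (char) (c - 0xFEE0);
        }
        return Character.toLowerCase(c);
    }

    /** Assumed noise rule: anything that is not a letter or digit is skipped. */
    static boolean isNoise(char c) {
        return !Character.isLetterOrDigit(c);
    }

    /** Adds a word; words with a common prefix share one path from the root. */
    void addWord(String word) {
        Node node = root;
        boolean any = false;
        for (int i = 0; i < word.length(); i++) {
            char c = normalize(word.charAt(i));
            if (isNoise(c)) continue;
            node = node.next.computeIfAbsent(c, k -> new Node());
            any = true;
        }
        if (any) node.terminal = true;
    }

    /** Returns the exclusive end index of the longest match starting at start, or -1. */
    int matchEnd(String text, int start) {
        Node node = root;
        int end = -1;
        char last = 0;
        for (int i = start; i < text.length(); i++) {
            char c = normalize(text.charAt(i));
            if (isNoise(c)) {
                if (node == root) break;   // a match may not begin with noise
                continue;                  // skip separators inside a word
            }
            Node child = node.next.get(c);
            if (child == null) {
                if (c == last) continue;   // tolerate a stretched character, e.g. "scaaam"
                break;
            }
            node = child;
            last = c;
            if (node.terminal) end = i + 1;
        }
        return end;
    }

    /** Wraps every match in the given markers, scanning the text left to right. */
    String highlight(String text, String open, String close) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int end = matchEnd(text, i);
            if (end < 0) {
                out.append(text.charAt(i++));
            } else {
                out.append(open).append(text, i, end).append(close);
                i = end;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        WordFilter filter = new WordFilter();
        filter.addWord("spam");
        filter.addWord("spammer");   // shares the s-p-a-m path with "spam"
        filter.addWord("scam");
        // \uFF33 is the full-width letter S
        System.out.println(filter.highlight("\uFF33-p-a-m and scaaam", "[", "]"));
    }
}
```

In this sketch a concurrent map is used for the transitions so that words can be added while other threads are matching, which is one plausible reason for choosing `ConcurrentSkipListMap`; a plain `HashMap` would suffice if the lexicon were built once and then only read.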

Published in American Journal of Computer Science and Technology (Volume 6, Issue 1)

This article belongs to the Special Issue Advances in Computer Science and Future Technology

DOI 10.11648/j.ajcst.20230601.14
Page(s) 25-32
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

DFA, MySQL, Word Detection System, Word Changing, Word Matching

Cite This Article
  • APA Style

    Feng Kai, Tuyatsetseg Badarch. (2023). Research and Implementation of Word Detection System Based on Improved DFA in China. American Journal of Computer Science and Technology, 6(1), 25-32. https://doi.org/10.11648/j.ajcst.20230601.14


    ACS Style

    Feng Kai; Tuyatsetseg Badarch. Research and Implementation of Word Detection System Based on Improved DFA in China. Am. J. Comput. Sci. Technol. 2023, 6(1), 25-32. doi: 10.11648/j.ajcst.20230601.14


    AMA Style

    Feng Kai, Tuyatsetseg Badarch. Research and Implementation of Word Detection System Based on Improved DFA in China. Am J Comput Sci Technol. 2023;6(1):25-32. doi: 10.11648/j.ajcst.20230601.14


  • @article{10.11648/j.ajcst.20230601.14,
      author = {Feng Kai and Tuyatsetseg Badarch},
      title = {Research and Implementation of Word Detection System Based on Improved DFA in China},
      journal = {American Journal of Computer Science and Technology},
      volume = {6},
      number = {1},
      pages = {25-32},
      doi = {10.11648/j.ajcst.20230601.14},
      url = {https://doi.org/10.11648/j.ajcst.20230601.14},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20230601.14},
     year = {2023}
    }
    


  • TY  - JOUR
    T1  - Research and Implementation of Word Detection System Based on Improved DFA in China
    AU  - Feng Kai
    AU  - Tuyatsetseg Badarch
    Y1  - 2023/03/04
    PY  - 2023
    N1  - https://doi.org/10.11648/j.ajcst.20230601.14
    DO  - 10.11648/j.ajcst.20230601.14
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 25
    EP  - 32
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20230601.14
    VL  - 6
    IS  - 1
    ER  - 


Author Information
  • Feng Kai, Department of Information Technology, School of Information Technology and Design, Mongolian National University, Ulaanbaatar, Mongolia

  • Tuyatsetseg Badarch, Department of Information Technology, School of Information Technology and Design, Mongolian National University, Ulaanbaatar, Mongolia
