Systematic Literature Review on Optical Character Recognition Methods for Text Extraction

Krisna Bayu Aditya Nurcahyo; Ricky Eka Putra; Yuni Yamasari

Authors

Krisna Bayu Aditya Nurcahyo, Universitas Negeri Surabaya
Ricky Eka Putra, Universitas Negeri Surabaya
Yuni Yamasari, Universitas Negeri Surabaya

Keywords:

Optical Character Recognition (OCR), text extraction, PRISMA, CNN, deep learning

Abstract

The development of technology has driven a significant increase in the need for document digitization and for the automation of text-based data processing. A systematic review is needed to identify progress in the development of Optical Character Recognition (OCR) for text extraction. This study therefore presents a systematic literature review of the development and use of OCR in text extraction, following the PRISMA method. An initial search identified 38 studies, which were then screened against established criteria; this focused search, using the keywords "Optical Character Recognition/OCR," yielded seven relevant articles. The analysis shows that the Convolutional Neural Network (CNN) is the most widely used approach in the development of OCR for text extraction. It also reveals that OCR has been applied in various fields, including healthcare, public administration, government, transportation, and commercial services. The study further highlights the benefits of OCR as well as the challenges that remain for its future development, including improving character recognition accuracy and handling variations in font and image quality. The insights generated by this research thus contribute to the development of OCR as a more reliable and effective tool for supporting document digitization.
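The selection process described above (an initial keyword search, screening against established criteria, and a final set of included articles) can be pictured as a filter pipeline that reports how many records survive each PRISMA stage. The Python sketch below is a minimal illustration under stated assumptions: the `Record` fields, the deduplication rule, the keyword-matching rule, and the eligibility criteria (a publication-year window and full-text availability) are all hypothetical, since the abstract does not state the review's actual criteria, and the toy records are invented rather than drawn from the 38 retrieved studies.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A search hit. Field names are hypothetical, not the review's actual schema."""
    title: str
    year: int
    abstract: str
    full_text_available: bool


def matches_any(text, keywords):
    """True if any keyword occurs in text as a whole word or phrase (case-insensitive)."""
    return any(re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE)
               for k in keywords)


def prisma_screen(records, keywords, min_year, max_year):
    """Run identification -> deduplication -> screening -> eligibility.

    Returns (included_records, counts), where counts holds the number of
    records remaining after each stage.
    """
    counts = {"identified": len(records)}

    # Deduplication: drop records whose normalised title was already seen
    # (an assumed rule; real reviews often also match on DOI).
    seen, unique = set(), []
    for r in records:
        key = " ".join(r.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    counts["after_deduplication"] = len(unique)

    # Title/abstract screening: keep records that mention a search keyword.
    screened = [r for r in unique
                if matches_any(r.title + " " + r.abstract, keywords)]
    counts["screened_in"] = len(screened)

    # Eligibility: hypothetical criteria (publication window, full text available).
    included = [r for r in screened
                if min_year <= r.year <= max_year and r.full_text_available]
    counts["included"] = len(included)
    return included, counts


# Toy data, invented for illustration only.
records = [
    Record("CNN-based OCR for receipts", 2023, "A deep learning OCR engine.", True),
    Record("CNN-based OCR for Receipts", 2023, "A deep learning OCR engine.", True),
    Record("Chatbot design for WhatsApp", 2025, "Messaging automation.", True),
    Record("Text extraction from ID cards", 2015,
           "Optical Character Recognition pipeline.", True),
    Record("OCR for health records", 2025, "Mobile data entry.", False),
]

included, counts = prisma_screen(
    records, ["Optical Character Recognition", "OCR"], 2020, 2025)
print(counts)
# {'identified': 5, 'after_deduplication': 4, 'screened_in': 3, 'included': 1}
print([r.title for r in included])
# ['CNN-based OCR for receipts']
```

Keeping a count per stage mirrors what a PRISMA 2020 flow diagram reports (records identified, screened, and included), so the numbers can be checked against the diagram; a real review would also log the reason each excluded record was dropped.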
