Development of Typographical Error Identification Application in Indonesian Language Using Jaro-Winkler Distance Algorithm


New publication in collaboration with my undergraduate student, Grelly. It is based on the final project she completed for her bachelor’s degree.

Language: Indonesian

Abstract:
Text is one of the media people use every day to communicate and interact, especially in education, for example when writing a final project report. Typographical errors are the most common mistake in written text. An application is therefore needed to help writers identify typographical errors in Indonesian-language documents. The application was developed using Laravel version 5.8 for the web application and Python version 3 for processing the dataset, building the model, and providing the web services. The model uses the NLTK library and the Jaro-Winkler distance algorithm, implemented with the pylibjaro library. The dataset is an open-source list of words from KBBI (Kamus Besar Bahasa Indonesia). The application supports only PDF files. The model's results are exposed through the web services as JSON data containing a list of words, each with a true or false value, the number of words in the document, the number of correct words, the number of incorrect words, and the program's execution time.
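
To make the idea in the abstract more concrete, here is a minimal sketch of Jaro-Winkler similarity together with a dictionary check that returns a JSON-style report. It is not the code from the paper. It assumes a pure-Python implementation instead of the libraries named above, a five-word set standing in for the KBBI word list, and hypothetical field names (`words`, `closest`, `total_words`, and so on). The abstract does not say how the lookup and the distance are combined, so reporting the closest dictionary word for each unknown word is also an assumption.

```python
import json
import re
import time


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def check_text(text: str, dictionary: set) -> dict:
    """Flag words missing from the dictionary (hypothetical report shape)."""
    start = time.perf_counter()
    candidates = sorted(dictionary)  # sorted for deterministic tie-breaking
    words = []
    for token in re.findall(r"[a-z]+", text.lower()):
        entry = {"word": token, "correct": token in dictionary}
        if not entry["correct"] and candidates:
            # Assumption: report the most similar dictionary word.
            entry["closest"] = max(
                candidates, key=lambda w: jaro_winkler(token, w))
        words.append(entry)
    correct = sum(1 for e in words if e["correct"])
    return {
        "words": words,
        "total_words": len(words),
        "correct_words": correct,
        "incorrect_words": len(words) - correct,
        "execution_time": time.perf_counter() - start,
    }


# Tiny stand-in for the KBBI word list (assumption for illustration only).
sample_dictionary = {"saya", "menulis", "laporan", "tugas", "akhir"}
report = check_text("Saya menulis laporn tugas akhir", sample_dictionary)
print(json.dumps(report, indent=2))
```

In this sample, "laporn" is the only word not found in the stand-in dictionary, and "laporan" is the closest entry to it. A real service would also need to extract text from the uploaded PDF and load the full word list, both of which are left out here.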

Keywords: Text, Document, Bahasa Indonesia, Typographical Error, Jaro-Winkler Distance

Fulltext: PDF (Indonesian Language)

DOI: 10.30595/juita.v8i1.6344
