SEGMENTASI DOKUMEN BAHASA INDONESIA MENGGUNAKAN TEXT TILING (Segmentation of Indonesian-Language Documents Using Text Tiling)

Yunianita - Rahmawati

Abstract


Text tiling aims to split long documents into multi-paragraph segments of related content. In this study, the input documents are stripped of their formatting before segmentation. The text tiling method has three stages: tokenisation, similarity determination, and boundary identification. In this study, segmentation with the text tiling algorithm has not yet reached that objective. This is because the segmentation of a document is strongly influenced by the common-word (stop-word) list, the number of tokens in a token-sequence, the number of token-sequences within a block, and how words are written. The text tiling algorithm is also very sensitive to document formatting, such as titles and subtitles, so the formatting must be removed to leave only the body text. Segmentation results improved over the course of the trials. In an experiment on 15 texts, segmentation achieved a precision of 59.3% and a recall of 80%. These trials used 4140 common words, a total similarity coefficient score of 5, 20 tokens per token-sequence, and 3 token-sequences per block.

Keywords: text tiling, segmentation, multiparagraph segmentation
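
The three stages named in the abstract (tokenisation, similarity determination, boundary identification) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The defaults `w=20` and `k=3` mirror the token-sequence and block sizes reported in the abstract. Everything else is an assumption made for illustration: the tiny `STOP_WORDS` set stands in for the study's list of 4140 common words, similarity is plain cosine similarity over term counts, only valleys of the similarity curve are scored, and the cutoff is the mean depth minus half a standard deviation. The "total coefficient score for similarity" of 5 is not modelled because the abstract does not define it.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the study's list of 4140 common words.
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}


def tokenize(text, stop_words=STOP_WORDS):
    """Stage 1: lowercase, keep alphabetic tokens, drop common words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]


def token_sequences(tokens, w=20):
    """Group tokens into consecutive token-sequences of w tokens each."""
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(seqs, k=3):
    """Stage 2: compare the block of k sequences before each gap with the block after it."""
    scores = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each valley: distance up to the nearest peak on both sides."""
    depths = []
    last = len(scores) - 1
    for i, s in enumerate(scores):
        is_valley = (i == 0 or scores[i - 1] >= s) and (i == last or scores[i + 1] >= s)
        if not is_valley:
            depths.append(0.0)  # simplification: slopes are not scored
            continue
        l = i
        while l > 0 and scores[l - 1] >= scores[l]:
            l -= 1
        r = i
        while r < last and scores[r + 1] >= scores[r]:
            r += 1
        depths.append((scores[l] - s) + (scores[r] - s))
    return depths


def boundaries(depths):
    """Stage 3: keep valleys deeper than mean - stdev/2 (assumed cutoff)."""
    if not depths:
        return []
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean - std / 2
    # Entry i of depths describes the gap just before token-sequence i + 1.
    return [i + 1 for i, d in enumerate(depths) if d > 0 and d > cutoff]


def text_tiling(text, w=20, k=3):
    """Return the indexes of the token-sequences that start a new segment."""
    seqs = token_sequences(tokenize(text), w)
    return boundaries(depth_scores(gap_scores(seqs, k)))
```

Calling `text_tiling(body_text)` on a document whose titles and subtitles have already been removed returns token-sequence indexes; multiplying an index by `w` gives an approximate token offset. A fuller implementation would snap each boundary to the nearest paragraph break, which this sketch leaves out.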







DOI: http://dx.doi.org/10.31000/jika.v5i3.5037



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 
