Sigit Purnomo Personal Site

Extraction and attribution of public figures statements for journalism in Indonesia using deep learning

Published: February 27, 2024

News articles are usually written by journalists based on statements taken from interviews with public figures. The attributions of such statements provide important information, and they can be extracted from news articles to build a knowledge base by developing a sequence-tagging scheme similar to named entity recognition. This research applies two families of deep learning architectures, recurrent neural network (RNN)-based and transformer-based, to build public figure statement attribution and extraction models for the Indonesian language. The experiments use five deep learning model architectures and two corpus sizes to investigate how corpus size affects each model's performance. The best RNN-based model is PFSA-ID-BLWCA, which achieves an F1 score of 81.34%, and the best transformer-based model is PFSA-ID-TWCA, which achieves an F1 score of 81.01%. This research also finds that corpus size influences model performance. Furthermore, the study lays a foundation for tackling attribution extraction in other languages, especially low-resource languages, with some necessary adjustments.
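Framing attribution extraction as sequence tagging, as in entity recognition, typically means every token receives a tag and labeled spans are then decoded from the tag sequence. The sketch below assumes a BIO scheme with hypothetical labels `SOURCE`, `CUE` and `CONTENT` (not necessarily the PFSA-ID label set), decodes spans, and computes an exact-match span F1, which is one common way such scores are reported. It is an illustration only, not the code or the exact metric used in the paper.

```python
from typing import List, Optional, Tuple

# (label, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into labeled spans.

    Labels such as SOURCE/CUE/CONTENT are hypothetical examples.
    A stray "I-X" that does not continue an open "X" span is treated
    leniently as the start of a new span.
    """
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        # "I-X" continuing the currently open "X" span: keep extending it.
        if tag.startswith("I-") and tag[2:] == label:
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label = None
        # "B-X" (or a stray "I-X") opens a new span; "O" opens nothing.
        if tag != "O":
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def span_f1(gold: List[Span], pred: List[Span]) -> float:
    """Micro-averaged F1 where a span counts only if label and boundaries match exactly."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "Menteri itu mengatakan ekonomi tumbuh pesat" (the minister said the economy grew fast)
    gold_tags = ["B-SOURCE", "I-SOURCE", "B-CUE", "B-CONTENT", "I-CONTENT", "I-CONTENT"]
    pred_tags = ["B-SOURCE", "I-SOURCE", "B-CUE", "B-CONTENT", "I-CONTENT", "O"]
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    print(gold)  # [('SOURCE', 0, 2), ('CUE', 2, 3), ('CONTENT', 3, 6)]
    print(span_f1(gold, pred))
```

In the usage example the predicted `CONTENT` span stops one token early, so under exact matching it counts as both a false positive and a false negative, and the F1 drops to 2/3 even though most tokens are tagged correctly.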

PFSA-ID: An Annotated Indonesian Corpus and Baseline Model of Public Figures Statements Attributions

Published: November 07, 2022

So far, corpora for the quotation extraction and quotation attribution tasks in Indonesian remain limited in quantity and depth. This study aims to develop an Indonesian corpus of public figure statement attributions and a baseline model for attribution extraction, in order to help foster research in information extraction for the Indonesian language.

Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia

Published: December 09, 2020

Extracting information from unstructured data is a challenging task in computational linguistics. Statements by public figures, as attributed by journalists in news stories, are one type of information that can be turned into structured data. A knowledge base built from this data would therefore be very beneficial for further uses such as opinion mining, claim detection, and fact-checking. This study aims to understand the statement extraction task and the models that have already been applied to it, in order to formulate a framework for further study.

Research about Named Entity Recognition Published in ArXiv

Published: September 14, 2020

Named Entity Recognition (NER) is an Information Extraction task that consists of identifying and classifying specific types of information elements, called Named Entities (NE). I have collected and curated some publications from arXiv related to NER, and the results are listed here. Please enjoy!

Research about Sentiment Analysis in Social Media Published in ArXiv

Published: September 04, 2020

Sentiment analysis is the field that deals with the judgments, responses, and feelings expressed in text. It is used extensively in areas such as data mining, web mining, and social media analytics, because sentiment is one of the most essential characteristics for judging human behavior. I have collected and curated some publications from arXiv related to sentiment analysis in social media, and the results are listed here. Please enjoy!
