Detailed Information

Cited 2 times in Web of Science; cited 2 times in Scopus

Machine-Learning-Based Gender Distribution Prediction from Anonymous News Comments: The Case of Korean News Portal (open access)

Authors
Suh, Jong Hwan
Issue Date
Aug-2022
Publisher
MDPI
Keywords
anonymity; social media; big data; news comments; gender prediction; word embedding; machine learning
Citation
SUSTAINABILITY, v.14, no.16
Indexed
SCIE
SSCI
SCOPUS
Journal Title
SUSTAINABILITY
Volume
14
Number
16
URI
https://scholarworks.gnu.ac.kr/handle/sw.gnu/1023
DOI
10.3390/su14169939
ISSN
2071-1050
Abstract
Anonymous news comment data from naver.com, a news portal in South Korea, can help conduct gender research and resolve related issues for sustainable societies. Nevertheless, only a small portion of the gender information (i.e., gender distribution) is open to the public, and the data have therefore rarely been considered for gender research. Hence, this paper aims to resolve the matter of incomplete gender information and make the anonymous news comment data usable for gender research as new social media big data. This paper proposes a machine-learning-based approach for predicting the gender distribution (i.e., male and female rates) of anonymous news commenters for a news article. Initially, big data of news articles and their anonymous news comments were collected and divided into labeled and unlabeled datasets (i.e., with and without gender information). The word2vec approach was employed to represent a news article by the characteristics of its news comments. Then, using the labeled dataset, various prediction techniques were evaluated for predicting the gender distribution of anonymous news commenters for a labeled news article. As a result, the neural network was selected as the best prediction technique, and it could accurately predict the gender distribution of anonymous news commenters of the labeled news articles. Thus, this study showed that a machine-learning-based approach can overcome the incomplete gender information problem of anonymous social media users. Moreover, when the gender distributions of the unlabeled news articles were predicted using the best neural network model, trained with the labeled dataset, their distribution turned out to differ from that of the labeled news articles. This result indicates that using only the labeled dataset for gender research can lead to misleading findings and distorted conclusions. The predicted gender distributions for the unlabeled news articles can help to better understand anonymous news commenters as humans for sustainable societies. Ultimately, this study provides a new way for data-driven computational social science with incomplete and anonymous social media big data.
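The feature-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how comment-level word vectors are combined into an article-level representation, so the averaging used here is an assumption, as are the toy embeddings and the function names `article_vector` and `to_distribution`. The prediction model itself (a neural network in the paper) is not reproduced; the sketch only shows what could go into such a model and how two raw scores might be turned into male and female rates that sum to 1.

```python
from statistics import fmean

# Hypothetical toy embeddings standing in for trained word2vec vectors.
# Real vectors would come from a word2vec model trained on the comments.
TOY_VECTORS = {
    "economy": [0.9, 0.1],
    "tax": [0.8, 0.2],
    "fashion": [0.1, 0.9],
    "recipe": [0.2, 0.8],
}


def article_vector(comments, embeddings):
    """Represent an article by the mean embedding of the known words
    in its comments (averaging is an assumption, not the paper's stated method)."""
    vectors = [
        embeddings[word]
        for comment in comments
        for word in comment.lower().split()
        if word in embeddings
    ]
    if not vectors:
        raise ValueError("no known words found in the comments")
    # Element-wise mean across all collected word vectors.
    return [fmean(column) for column in zip(*vectors)]


def to_distribution(male_score, female_score):
    """Clip two raw scores at zero and rescale them so the rates sum to 1."""
    male = max(male_score, 0.0)
    female = max(female_score, 0.0)
    total = male + female
    if total == 0.0:
        return 0.5, 0.5  # no signal either way: fall back to an even split
    return male / total, female / total


# Usage: build features for one article from its comments.
features = article_vector(["Tax and economy", "economy again"], TOY_VECTORS)
male_rate, female_rate = to_distribution(3.0, 1.0)
```

In a full pipeline, `features` would be passed to a regression model trained on the labeled articles, and that model's outputs would be normalized in the manner of `to_distribution`.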
Appears in Collections
College of Business Administration > Department of Management Information Systems > Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Suh, Jong Hwan
College of Business Administration (Department of Management Information Systems)
