Does the Quality and Readability of Information Related to Varicocele Obtained from ChatGPT 4.0 Remain Consistent Across Different Models of Inquiry?

Luo, Zhao; Kam, Sung Chul; Kim, Ji Yong; Hu, Wenhao; Lin, Chuan; Park, Hyun Jun; Shin, Yu Seob

doi:10.5534/wjmh.240331

Abstract

Purpose: Individuals increasingly turn to Chat-Generative Pretrained Transformer (ChatGPT) as a source of medical information on specific ailments. Varicocele is a prevalent condition affecting the male reproductive system. The quality, readability, and consistency of the varicocele-related information that individuals obtain through interactive access to ChatGPT remain uncertain.

Materials and Methods: This study used Google Trends data to extract 25 trending questions since 2004. Two distinct inquiry methods were used with ChatGPT 4.0: repetition mode (each question repeated three times) and cyclic mode (each question input once in each of three consecutive cycles). The generated texts were evaluated with the Automated Readability Index (ARI), the Flesch Reading Ease Score (FRES), the Gunning Fog Index (GFI), the DISCERN score, and the Ensuring Quality Information for Patients (EQIP) tool. Kruskal-Wallis and Mann-Whitney U tests were used to compare text quality, readability, and consistency between the two modes.

Results: The texts generated in repetition and cyclic modes showed no statistically significant differences in ARI (12.06 ± 1.29 vs. 12.27 ± 1.74), FRES (36.08 ± 8.70 vs. 36.87 ± 7.73), GFI (13.14 ± 1.81 vs. 13.25 ± 1.50), DISCERN score (38.08 ± 6.55 vs. 38.35 ± 6.50), or EQIP (47.92 ± 6.84 vs. 48.35 ± 5.56) (p>0.05). These findings indicate that ChatGPT 4.0 consistently produces information of comparable complexity and quality across different inquiry modes.

Conclusions: ChatGPT-generated medical information on "varicocele" shows consistent quality and readability across different inquiry modes, highlighting its potential for stable healthcare information provision. However, the complexity of the content poses challenges for general readers, and notable limitations in quality and reliability highlight the need for improved accuracy, credibility, and readability in AI-generated medical content.
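The three readability indices named in the abstract are simple functions of character, word, sentence, and syllable counts. The sketch below applies the standard published formulas for ARI, FRES, and GFI; it is not the authors' scoring pipeline, which the abstract does not describe. The function names and the vowel-group syllable heuristic are assumptions made for illustration, and dedicated readability tools count syllables more carefully, so scores from this sketch will not match theirs exactly.

```python
import re


def ari(chars: int, words: int, sentences: int) -> float:
    """Automated Readability Index (approximate US grade level)."""
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43


def fres(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (higher means easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gfi(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; complex words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def count_syllables(word: str) -> int:
    """Hypothetical helper: rough estimate from vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Score a text with all three indices using the heuristic counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    chars = sum(len(w) for w in words)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)
    n_words, n_sents = len(words), len(sentences)
    return {
        "ARI": ari(chars, n_words, n_sents),
        "FRES": fres(n_words, n_sents, sum(syllables)),
        "GFI": gfi(n_words, n_sents, complex_words),
    }


if __name__ == "__main__":
    sample = (
        "Varicocele is an enlargement of the veins within the scrotum. "
        "It can affect fertility in some men."
    )
    for name, value in readability(sample).items():
        print(f"{name}: {value:.2f}")
```

On the conventional interpretation of these scales, the reported FRES of about 36 falls in the band usually labelled difficult, and ARI and GFI values of 12 to 13 correspond roughly to a late secondary-school or early college reading level, which is consistent with the conclusion that the content is hard for general readers.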

Keywords

Comprehension; Infertility; Large language models; Varicocele; HEALTH

Title: Does the Quality and Readability of Information Related to Varicocele Obtained from ChatGPT 4.0 Remain Consistent Across Different Models of Inquiry?

Authors: Luo, Zhao; Kam, Sung Chul; Kim, Ji Yong; Hu, Wenhao; Lin, Chuan; Park, Hyun Jun; Shin, Yu Seob

DOI: 10.5534/wjmh.240331

Publication date: 2026-01

Type: Article

Journal: The World Journal of Men's Health

Volume: 44

Issue: 1

Pages: 161-170