Does the Quality and Readability of Information Related to Varicocele Obtained from ChatGPT 4.0 Remain Consistent Across Different Models of Inquiry? (open access)
- Authors
- Luo, Zhao; Kam, Sung Chul; Kim, Ji Yong; Hu, Wenhao; Lin, Chuan; Park, Hyun Jun; Shin, Yu Seob
- Issue Date
- May-2025
- Publisher
- Korean Society for Sexual Medicine and Andrology (대한남성과학회)
- Keywords
- Comprehension; Infertility; Large language models; Varicocele
- Citation
- The World Journal of Men's Health, v.44, no.1, pp. 161-170
- Pages
- 10
- Indexed
- SCIE; SCOPUS; KCI
- Journal Title
- The World Journal of Men's Health
- Volume
- 44
- Number
- 1
- Start Page
- 161
- End Page
- 170
- URI
- https://scholarworks.gnu.ac.kr/handle/sw.gnu/79115
- DOI
- 10.5534/wjmh.240331
- ISSN
- 2287-4208; 2287-4690
- Abstract
- Purpose: Individuals increasingly turn to Chat Generative Pre-trained Transformer (ChatGPT) as a source of medical information on specific ailments. Varicocele is a prevalent condition affecting the male reproductive system, yet the quality, readability, and consistency of the varicocele-related information that individuals obtain through interactive access to ChatGPT remain uncertain.
Materials and Methods: This study used Google Trends data to extract the 25 trending questions since 2004. Two distinct inquiry methodologies were employed with ChatGPT 4.0: repetition mode (each question repeated three times) and cyclic mode (each question input once in each of three consecutive cycles). The generated texts were evaluated against several criteria: the Automated Readability Index (ARI), the Flesch Reading Ease Score (FRES), the Gunning Fog Index (GFI), the DISCERN score, and the Ensuring Quality Information for Patients (EQIP) score. Kruskal-Wallis and Mann-Whitney U tests were used to compare text quality, readability, and consistency between the two modes.
Results: The texts generated in repetition and cyclic modes showed no statistically significant differences in ARI (12.06±1.29 vs. 12.27±1.74), FRES (36.08±8.70 vs. 36.87±7.73), GFI (13.14±1.81 vs. 13.25±1.50), DISCERN score (38.08±6.55 vs. 38.35±6.50), or EQIP score (47.92±6.84 vs. 48.35±5.56) (p>0.05), indicating that ChatGPT 4.0 consistently produces information of comparable complexity and quality across inquiry modes.
Conclusions: ChatGPT-generated medical information on varicocele demonstrates consistent quality and readability across different inquiry modes, highlighting its potential as a stable source of healthcare information. However, the content's complexity poses challenges for general readers, and notable limitations in quality and reliability underscore the need for improved accuracy, credibility, and readability in AI-generated medical content.
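The readability indices named in the abstract are standard published formulas, so a short sketch can show how they are computed and how two groups of texts might be compared. The Python snippet below is illustrative only: it assumes regex tokenization and a naive vowel-group syllable counter (the authors' exact text-processing pipeline is not described here), and the sample texts and names such as rep_texts and cyc_texts are hypothetical stand-ins for the study's ChatGPT answers.

```python
import re
from scipy.stats import mannwhitneyu

def count_syllables(word: str) -> int:
    """Rough syllable count via vowel groups (an approximation only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    """Compute ARI, FRES, and GFI from raw text using the standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    chars = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    n_w, n_s = len(words), len(sentences)
    return {
        # Automated Readability Index: based on character and sentence lengths
        "ARI": 4.71 * chars / n_w + 0.5 * n_w / n_s - 21.43,
        # Flesch Reading Ease Score: higher scores indicate easier text
        "FRES": 206.835 - 1.015 * n_w / n_s - 84.6 * syllables / n_w,
        # Gunning Fog Index: "complex" word = three or more syllables
        "GFI": 0.4 * (n_w / n_s + 100 * complex_words / n_w),
    }

# Hypothetical stand-ins for the two inquiry modes (the study scored 25
# questions per mode over three rounds; these short samples are illustrative).
rep_texts = [
    "Varicocele is an abnormal dilation of the scrotal veins. It may impair fertility.",
    "Treatment options include observation, embolization, and surgical repair.",
    "Most varicoceles occur on the left side because of venous anatomy.",
]
cyc_texts = [
    "A varicocele is an enlargement of the veins within the scrotum.",
    "Varicoceles are a common and correctable cause of male infertility.",
    "Diagnosis typically involves physical examination and scrotal ultrasound.",
]

# Compare one metric between modes with a Mann-Whitney U test, as in the paper.
rep_ari = [readability(t)["ARI"] for t in rep_texts]
cyc_ari = [readability(t)["ARI"] for t in cyc_texts]
stat, p = mannwhitneyu(rep_ari, cyc_ari, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")  # p > 0.05 -> no significant difference
```

For context on the reported scores: an FRES in the mid-30s corresponds to "difficult" college-level text, and an ARI or GFI around 12-13 implies roughly a 12th-grade reading level, which is why the abstract flags the content's complexity for general readers.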
- Files in This Item
- There are no files associated with this item.
- Appears in Collections
- College of Medicine > Department of Medicine > Journal Articles