The impact of language variability on artificial intelligence performance in regenerative endodontics

Date

May 2025

Journal Title

Journal ISSN

Volume Title

Publisher

Multidisciplinary Digital Publishing Institute (MDPI)

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Background: Regenerative endodontic procedures (REPs) are promising treatments for immature teeth with necrotic pulp. Artificial intelligence (AI) is increasingly used in dentistry; this study therefore evaluates the reliability of AI-generated information on REPs, comparing four AI models against clinical guidelines. Methods: ChatGPT-4o, Claude 3.5 Sonnet, Grok 2, and Gemini 2.0 Advanced were tested with 20 REP-related questions derived from the ESE/AAE guidelines and expert consensus. Questions were posed in Turkish and English, with and without prompts, and two specialists assessed the resulting 640 AI-generated answers using a four-point rubric. Inter-rater reliability and response accuracy were statistically analyzed. Results: Inter-rater reliability was high (0.85–0.97). ChatGPT-4o showed higher accuracy with English prompts (p < 0.05), and Claude was more accurate than Grok in the Turkish (nonprompted) and English (prompted) conditions (p < 0.05). No model reached ≥80% accuracy; Claude (English, prompted) scored highest and Grok (Turkish, nonprompted) scored lowest. Conclusions: The performance of AI models varies significantly across languages, with English queries yielding higher accuracy. Although AI shows potential as a source of information on REPs, current models lack sufficient accuracy for clinical reliance, and cautious interpretation and validation against the guidelines are essential. Further research is needed to improve AI performance in specialized dental fields.
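The abstract reports inter-rater reliability of 0.85–0.97 for the two specialists' four-point rubric scores but does not name the statistic here. As a hedged illustration only, the sketch below computes a quadratically weighted Cohen's kappa on hypothetical rater data; the scores and the choice of kappa are assumptions for demonstration, not the study's actual analysis.

    # Minimal sketch (not the study's analysis): agreement between two raters
    # scoring answers on a four-point rubric (1 = inaccurate ... 4 = accurate).
    # The scores below are hypothetical; quadratically weighted Cohen's kappa is
    # only one plausible choice of inter-rater reliability statistic.
    from sklearn.metrics import cohen_kappa_score

    rater_a = [4, 3, 4, 2, 4, 3, 1, 4, 3, 4]  # hypothetical rubric scores, rater A
    rater_b = [4, 3, 4, 2, 3, 3, 1, 4, 4, 4]  # hypothetical rubric scores, rater B

    # Quadratic weights penalize large disagreements more than adjacent ones,
    # which suits an ordinal scale.
    kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
    print(f"Weighted Cohen's kappa: {kappa:.2f}")

Values close to 1 indicate near-perfect agreement, which would be consistent with the 0.85–0.97 range reported above.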

Description

Keywords

Artificial Intelligence, ChatGPT, Claude, Dental Education, Endodontics, Gemini, Grok, Regenerative Endodontic Procedures

Source

Healthcare (Switzerland)

WoS Q Value

Q2

Scopus Q Value

Volume

13

Issue

10

Citation