The impact of language variability on artificial intelligence performance in regenerative endodontics
Abstract
Regenerative endodontic procedures (REPs) are promising treatments for immature teeth with necrotic pulp. Artificial intelligence (AI) is increasingly used in dentistry; thus, this study evaluates the reliability of AI-generated information on REPs, comparing four AI models against clinical guidelines. Methods: ChatGPT-4o, Claude 3.5 Sonnet, Grok 2, and Gemini 2.0 Advanced were tested with 20 REP-related questions derived from the ESE/AAE guidelines and expert consensus. Questions were posed in Turkish and English, with or without prompts. Two specialists assessed 640 AI-generated answers using a four-point rubric. Inter-rater reliability and response accuracy were statistically analyzed. Results: Inter-rater reliability was high (0.85–0.97). ChatGPT-4o showed higher accuracy with English prompts (p < 0.05). Claude was more accurate than Grok in the Turkish (nonprompted) and English (prompted) conditions (p < 0.05). No model reached ≥80% accuracy. Claude (English, prompted) scored highest; Grok (Turkish, nonprompted) scored lowest. Conclusions: The performance of AI models varies significantly across languages, with English queries yielding higher accuracy. While AI shows potential as a source of information on REPs, current models lack sufficient accuracy for clinical reliance. Cautious interpretation and validation against guidelines are essential. Further research is needed to enhance AI performance in specialized dental fields.












