J Neurol. 2025 Aug 22;272(9):586. doi: 10.1007/s00415-025-13261-3.
ABSTRACT
BACKGROUND AND OBJECTIVES: Accurate interpretation of electrodiagnostic (EDX) studies is essential for the diagnosis and management of neuromuscular disorders. Artificial intelligence (AI)-based tools may improve the consistency and quality of EDX reporting and reduce workload. The aim of this study was to evaluate the performance of an AI-assisted, multi-agent framework (INSPIRE) in comparison with standard physician interpretation in a randomized controlled trial (RCT).
METHODS: We prospectively enrolled 200 patients (of 363 assessed for eligibility) referred for EDX. Patients were randomly assigned to either a control group (physician-only interpretation) or an intervention group (physician-AI interpretation). Three board-certified physicians rotated across both arms. In the intervention group, an AI-generated preliminary report was combined with the physician’s independent findings (human-AI integration). The primary outcome was EDX report quality, assessed with a score we developed, the AI-Generated EMG Report Score (AIGERS; range 0-1, with higher scores indicating more accurate and complete reports). Secondary outcomes included a physician-reported AI integration rating score (PAIR) and a compliance survey evaluating ease of AI adoption.
RESULTS: Of the 200 enrolled patients, 100 were allocated to AI-assisted interpretation and 100 to physician-only reporting. Although AI-generated preliminary reports showed moderate consistency on the AIGERS metric, the integrated (physician-AI) approach did not significantly outperform physician-only interpretation. Despite some anecdotal advantages, such as efficiency in suggesting standardized terminology, quantitatively the AIGERS scores for physician-AI integration were nearly identical to those in the physician-only arm, and no comparison reached statistical significance (p > 0.05 for all comparisons). Physicians reported variable acceptance of AI suggestions, expressing concerns about the interpretability of AI outputs. Physician-AI collaboration ratings showed moderate trust in the AI’s suggestions (mean 3.7/5) but poor scores for efficiency (2.0/5), ease of use (1.7/5), and workload reduction (1.7/5), indicating usability challenges and workflow interruptions.
DISCUSSION: In this single-center, randomized trial, AI-assisted EDX interpretation did not demonstrate a significant advantage over conventional physician-only interpretation. Nevertheless, the AI framework may help reduce workload and documentation burden by handling simpler, routine EDX tests, freeing physicians to focus on more complex cases that require greater expertise.
TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT06902675.
PMID:40844612 | DOI:10.1007/s00415-025-13261-3