Predicting Epilepsy Surgery Success with AI-Powered Trajectory Analysis

Author: Denis Avetisyan

A new deep learning framework combines longitudinal MRI scans with large language models to forecast patient outcomes and provide clearer, more interpretable insights.

The framework takes registered pre- and post-operative MRI pairs, encodes them with a 3D Siamese ResNet-50, and projects the morphological differences into trajectory vectors, which are used to retrieve historically similar cases. A quantized large language model then synthesizes a transparent, natural-language surgical prognosis from this retrieved evidence.

Neuro-Oracle leverages trajectory analysis and retrieval-augmented generation to improve sensitivity in epilepsy surgical prognosis compared to static MRI evaluation.

Predicting successful outcomes from epilepsy surgery remains a clinical challenge, often limited by reliance on static pre-operative scans. This is addressed in ‘Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis’, which introduces a novel framework leveraging longitudinal MRI data to model disease progression and synthesize interpretable prognoses. By distilling pre-to-post-operative changes into trajectory vectors and retrieving similar surgical cases, Neuro-Oracle achieves AUC values up to 0.905, surpassing baseline performance while providing structured justifications. Could this trajectory-aware, agentic approach unlock more personalized and reliable predictions for patients undergoing epilepsy surgery?

Beyond Snapshots: Tracking the Brain’s True Trajectory

Current approaches to predicting success in epilepsy surgery often depend on a single MRI scan, assessing static features like hippocampal sclerosis or lesion size. However, this snapshot fails to capture the dynamic biological processes unfolding in the brain, particularly the subtle, yet critical, changes occurring before and after surgical intervention. The brain is not a static entity; it remodels itself, and these longitudinal alterations – the rate of tissue atrophy, the evolution of network connectivity, and the response to therapeutic stimulation – hold valuable prognostic information. By neglecting the trajectory of morphological change, current methods miss crucial signals that could differentiate between patients who will achieve seizure freedom and those who will not, hindering the development of truly personalized treatment strategies.

Contemporary evaluations of epilepsy surgery outcomes often treat pre- and post-operative magnetic resonance imaging (MRI) scans as discrete data points, thereby overlooking the crucial dynamics of morphological change. This approach discards potentially vital information regarding a patient’s response to treatment, as subtle but significant alterations in brain structure – such as reductions in lesion volume or the emergence of new tissue – may indicate successful intervention or the need for adjustments. Existing analytical pipelines typically prioritize static features extracted from individual scans, failing to capture the trajectory of these changes over time. Consequently, the ability to accurately forecast seizure freedom is hampered, as the body’s inherent adaptability and the brain’s remodeling processes – central to surgical success – remain largely unquantified and underutilized in predictive models.

Predicting successful outcomes following epilepsy surgery demands a shift from analyzing static brain scans to understanding how a patient’s brain changes over time. Seizure freedom isn’t determined by a single snapshot of morphology, but by the trajectory of that morphology – the pattern of structural evolution before and after surgical intervention. A robust predictive framework must therefore encode these longitudinal changes, interpreting not just what a brain looks like, but how it arrived at that state and, crucially, where it is headed. By mapping this dynamic process, clinicians can potentially discern subtle indicators of treatment response – or impending failure – that remain hidden in conventional, cross-sectional analyses, ultimately personalizing treatment strategies and improving patient prognoses.

Combining longitudinal trajectory data $\Delta v$ with a Llama-3 reasoning agent within the Neuro-Oracle pipeline maintains high diagnostic performance (AUC ≈ 0.87) comparable to single scans with k-NN (AUC = 0.793), while also enabling interpretative clinical reasoning.

Neuro-Oracle: Mapping the Brain’s Evolving Landscape

Neuro-Oracle employs a 3D Siamese Network architecture to quantify morphological changes observed in paired pre- and post-operative MRI scans. This network consists of two identical convolutional neural networks, each processing one of the MRI volumes. The final layers of each network produce a feature embedding, and the difference between these embeddings – the morphological delta – is then encoded into a fixed-length vector, termed the Trajectory Vector. This vector effectively captures the magnitude and direction of morphological alterations induced by the treatment, providing a condensed representation of the patient’s anatomical response for subsequent analysis and prediction.
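The core idea of the Siamese step can be sketched in a few lines: both scans pass through the same encoder, and the trajectory vector is derived from the difference of the two embeddings. The sketch below is an illustration under stated assumptions, not the paper's implementation. `toy_encoder` is a hypothetical stand-in for the 3D ResNet-50 branch, scans are flattened to plain lists, and any further projection applied to the difference is omitted.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def trajectory_vector(
    encoder: Callable[[Sequence[float]], Vector],
    pre_scan: Sequence[float],
    post_scan: Sequence[float],
) -> Vector:
    """Embed both scans with the SAME encoder and return post minus pre."""
    pre_embedding = encoder(pre_scan)
    post_embedding = encoder(post_scan)
    return [post - pre for pre, post in zip(pre_embedding, post_embedding)]


def toy_encoder(scan: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the shared 3D CNN: [mean, max] intensity."""
    return [sum(scan) / len(scan), max(scan)]


# Toy voxel intensities (illustrative values only, not real MRI data).
pre = [1.0, 2.0, 3.0, 4.0]
post = [1.0, 2.0, 1.0, 0.0]

delta_v = trajectory_vector(toy_encoder, pre, post)
print(delta_v)
```

Because the two branches share one encoder, identical inputs always give a zero vector, so any non-zero component reflects a difference between the two scans rather than a difference between two networks.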

Rigid registration and Z-score normalization constitute critical preprocessing steps in the Neuro-Oracle pipeline to facilitate accurate morphological comparison of pre- and post-operative MRI scans. Rigid registration precisely aligns the scans by applying translation and rotation, correcting for patient movement and scanner variations; this ensures that anatomical structures are spatially congruent. Following registration, Z-score normalization standardizes the voxel intensities within each scan by subtracting the mean and dividing by the standard deviation. This process mitigates the impact of differing image acquisition parameters and individual variations in tissue contrast, resulting in a consistent intensity scale across all subjects and enabling reliable quantification of morphological changes represented in the Trajectory Vector.
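The normalization step is simple enough to show directly. The following is a minimal sketch of per-scan Z-score normalization, assuming the statistics are computed over all voxels of a scan (the article does not say whether a brain mask is used) and representing the volume as a flat list for brevity. Registration is a separate step and is not shown.

```python
import statistics
from typing import List, Sequence


def zscore_normalize(voxels: Sequence[float]) -> List[float]:
    """Subtract the scan's mean and divide by its standard deviation."""
    mean = statistics.fmean(voxels)
    std = statistics.pstdev(voxels)
    if std == 0:
        # A constant volume has no contrast; return zeros instead of dividing by 0.
        return [0.0 for _ in voxels]
    return [(v - mean) / std for v in voxels]


# Two toy "scans" of the same anatomy with different gain and offset
# (illustrative values only).
scan_a = [100.0, 120.0, 140.0, 160.0]
scan_b = [v * 3.0 + 50.0 for v in scan_a]

norm_a = zscore_normalize(scan_a)
norm_b = zscore_normalize(scan_b)
print(norm_a)
print(norm_b)
```

After normalization the two scans are numerically identical, which is the point of the step: a linear change in scanner intensity scale no longer shows up as a spurious morphological difference.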

The Trajectory Vector, generated by the 3D Siamese Network, encapsulates the quantifiable morphological change experienced by a patient during treatment. This vector serves as a high-dimensional representation of individual treatment response, capturing the magnitude and direction of anatomical alterations observed between pre- and post-operative scans. By analyzing these patient-specific vectors, machine learning models can be trained to predict treatment outcomes, enabling a personalized approach to prognosis and potentially informing future treatment strategies. The efficacy of this predictive capability is directly linked to the vector’s ability to accurately represent the complex interplay of morphological changes unique to each patient’s response.

Neuro-Oracle (M5) demonstrates high specificity (0.921) in identifying successful temporal resections but limited sensitivity (0.566) when distinguishing them from failed or complex resections, as shown by 5-fold cross-validation.

From Similarity to Insight: Learning from Past Cases

Neuro-Oracle employs Cosine Nearest-Neighbor Search, facilitated by the FAISS library, to identify historical patient Trajectory Vectors with high geometric similarity to a new patient’s vector representation. Each Trajectory Vector is the fixed-length encoding of one patient’s pre-to-post-operative morphological change, as produced by the Siamese network. Cosine similarity is the cosine of the angle between two vectors; smaller angles, and therefore higher cosine values, indicate greater similarity. FAISS, optimized for large-scale similarity search, enables rapid retrieval of the k most similar historical cases from a database of previously observed patient trajectories. This retrieved set of comparable cases forms the basis for subsequent reasoning and prediction within the Neuro-Oracle system.
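To make the retrieval step concrete, here is a brute-force sketch of cosine top-k search in plain Python. It is a stand-in for the FAISS index, not the authors' code; the function names and the toy vectors are assumptions. With FAISS, one common way to obtain the same ranking is to L2-normalize the vectors and search an inner-product index, although the article does not state which index type was used.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float], database: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k database vectors closest to query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "trajectory vectors" (illustrative values only).
historical = [
    [1.0, 0.0],   # case 0: same direction as the query
    [0.0, 1.0],   # case 1: orthogonal
    [2.0, 0.1],   # case 2: nearly the same direction, larger magnitude
    [-1.0, 0.0],  # case 3: opposite direction
]
neighbours = top_k_similar([3.0, 0.0], historical, k=2)
print(neighbours)
```

Case 2 ranks second even though it is longer than the query, because cosine similarity compares direction only and ignores magnitude.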

The system learns from comparable cases by analyzing the retrieved Trajectory Vectors, each of which encodes a past patient’s pre-to-post-operative morphological change, to identify correlations between specific patterns and clinical outcomes. Where similar trajectory vectors in the historical data are associated with either positive or negative results, the system can statistically associate particular patterns of morphological change with a higher probability of success or failure. By quantifying these associations, the system builds a knowledge base of predictive indicators, effectively learning from the collective experience in the historical data and enabling outcome prediction for new patients with similar trajectories.
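The article does not spell out how these associations are quantified. One simple possibility, in the spirit of the k-NN baseline it mentions, is a similarity-weighted vote over the outcomes of the retrieved cases. The sketch below shows that idea only; the function name, the clipping of negative similarities, and the 0.5 fallback are all assumptions.

```python
from typing import List, Tuple


def weighted_success_score(neighbours: List[Tuple[float, int]]) -> float:
    """Similarity-weighted share of successful outcomes among retrieved cases.

    neighbours: (cosine similarity, outcome) pairs, outcome 1 = success, 0 = failure.
    """
    # Negative similarities are clipped to zero so dissimilar cases cannot vote.
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        return 0.5  # no usable evidence: fall back to an uninformative score
    weighted_hits = sum(w * outcome for w, (_, outcome) in zip(weights, neighbours))
    return weighted_hits / total


# Toy retrieved set (illustrative values only).
retrieved = [(0.9, 1), (0.8, 1), (0.3, 0)]
score = weighted_success_score(retrieved)
print(round(score, 3))
```

Weighting by similarity means a close match with a failed outcome counts for more than a distant one, which a plain majority vote would not capture.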

Following retrieval of similar patient Trajectory Vectors, the Neuro-Oracle system employs a quantized version of the Llama-3-8B large language model to perform reasoning. This LLM analyzes the retrieved cases, identifying relevant patterns and evidence to inform a prediction regarding the new patient’s likely outcome. The system further incorporates an Age-Gap Filter during prediction, a mechanism designed to adjust for potential biases or confounding factors introduced by differences in patient age between the retrieved historical cases and the current patient, thereby refining the prediction’s accuracy and relevance.
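The article does not give the exact rule behind the Age-Gap Filter. A plausible reading is that retrieved cases are dropped when their age differs too much from the new patient's before the evidence reaches the language model. The sketch below implements that reading only; the `RetrievedCase` fields, the `max_gap_years` threshold, and the prompt wording are hypothetical, and no LLM call is made.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedCase:
    case_id: str       # hypothetical identifier
    age: int           # age at surgery, in years
    similarity: float  # cosine similarity to the new patient's trajectory vector
    outcome: str       # recorded surgical outcome label


def age_gap_filter(
    cases: List[RetrievedCase], patient_age: int, max_gap_years: int
) -> List[RetrievedCase]:
    """Keep only cases whose age is within max_gap_years of the patient's age."""
    return [c for c in cases if abs(c.age - patient_age) <= max_gap_years]


def build_evidence_block(cases: List[RetrievedCase]) -> str:
    """Format the surviving cases as plain text that could go into an LLM prompt."""
    lines = [
        f"- case {c.case_id}: age {c.age}, similarity {c.similarity:.2f}, outcome {c.outcome}"
        for c in cases
    ]
    return "Retrieved comparable cases:\n" + "\n".join(lines)


# Toy retrieved cases (made-up values for illustration).
retrieved = [
    RetrievedCase("A", 34, 0.93, "seizure-free"),
    RetrievedCase("B", 9, 0.91, "not seizure-free"),
    RetrievedCase("C", 41, 0.88, "seizure-free"),
]
kept = age_gap_filter(retrieved, patient_age=37, max_gap_years=10)
evidence = build_evidence_block(kept)
print(evidence)
```

Restricting the prompt to the filtered cases also keeps the generated justification traceable, since every case the model can cite is listed explicitly.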

The 3D Siamese Network within Neuro-Oracle is optimized using a combined loss function of Supervised Contrastive Loss and Focal Loss. Supervised Contrastive Loss encourages the network to learn embeddings in which cases with the same outcome cluster closely together and are distinctly separated from cases with the other outcome. Focal Loss addresses class imbalance (a common issue in medical datasets, where one outcome class may significantly outnumber the other) by down-weighting the contribution of easily classified examples and focusing learning on harder, more informative cases. Together, the two terms are intended to increase the inter-class separation between positive and negative outcome representations, improving the network’s ability to distinguish trajectories likely to result in success from those likely to result in failure.
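The down-weighting effect of Focal Loss is easiest to see numerically. In its standard binary form the loss is $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the probability assigned to the true class. The sketch below compares it with plain cross-entropy for one easy and one hard example. The values $\gamma = 2$ and $\alpha = 0.25$ are commonly cited defaults used here as assumptions, not the settings reported for Neuro-Oracle, and the contrastive term is not shown.

```python
import math


def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one example; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, 1e-12), 1.0))


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# One confidently correct ("easy") and one poorly predicted ("hard") positive example.
easy_ratio = binary_focal_loss(0.95, 1) / binary_cross_entropy(0.95, 1)
hard_ratio = binary_focal_loss(0.30, 1) / binary_cross_entropy(0.30, 1)
print(f"easy example keeps {easy_ratio:.6f} of its cross-entropy loss")
print(f"hard example keeps {hard_ratio:.6f} of its cross-entropy loss")
```

With these settings the easy example retains a far smaller share of its cross-entropy loss than the hard one, so gradient updates are dominated by the cases the network still gets wrong.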

Receiver operating characteristic (ROC) curves demonstrate that the proposed methods, alongside several baselines, achieve varying levels of performance on the EPISURG dataset ($N=268$), with Area Under the Curve (AUC) values detailed in Table 1; the Siamese Diversity Ensemble (M6) was excluded from the visualization for clarity.

Beyond Prediction: Towards Truly Personalized Epilepsy Care

Neuro-Oracle, a novel system for predicting seizure freedom following epilepsy surgery, outperforms the baseline methods it was compared against on the EPISURG dataset. Rather than reading a single scan, the system encodes pre-to-post-operative morphological change and retrieves similar historical cases, identifying subtle patterns indicative of successful outcomes. Testing against established baseline models shows improved predictive performance, offering the potential to more reliably identify patients likely to benefit from surgical intervention. The ability to accurately forecast seizure freedom is crucial for patient selection and pre-surgical counseling, and Neuro-Oracle represents a step forward in optimizing epilepsy care through data-driven prediction.

Evaluating predictive models in epilepsy surgery is complicated by class imbalance: the outcome groups are unevenly represented, so a model can score well on plain accuracy simply by favoring the larger group. To address this, the researchers employed Balanced Accuracy and the Area Under the Receiver Operating Characteristic curve (AUC-ROC) as primary performance metrics. Balanced Accuracy averages the recall of the two classes (sensitivity and specificity), which prevents the majority class from dominating the score. AUC-ROC, meanwhile, measures the model’s ability to distinguish between patients who will achieve seizure freedom and those who will not across all possible classification thresholds; a higher AUC indicates better discriminatory power. Together, these metrics give a fairer picture of Neuro-Oracle’s predictive capabilities on the imbalanced EPISURG dataset than accuracy alone.
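Both metrics can be computed from first principles, which helps show why they are less sensitive to imbalance than plain accuracy. The sketch below uses made-up labels and scores, not EPISURG results, and assumes both classes are present. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Imbalanced toy data: 2 positives, 8 negatives (illustrative values only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

plain_accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(plain_accuracy, bal_acc, auc)
```

On this toy data plain accuracy looks comfortable while balanced accuracy is noticeably lower, because half of the rare positive cases are missed at the 0.5 threshold. The AUC, which does not depend on a threshold, reports ranking quality separately.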

Neuro-Oracle exhibits a nuanced predictive capability by discerning between patients scheduled for Temporal Lobectomy and those undergoing Non-Temporal Resection, adapting its assessments to the specifics of each surgical approach. This tailored analysis acknowledges the distinct neurological profiles and seizure patterns associated with different resection types, moving beyond a generalized prediction model. By incorporating this surgical distinction, the system achieves a higher degree of accuracy in forecasting seizure freedom, as the underlying causes and potential outcomes vary significantly between these two patient groups. This ability to refine predictions based on surgical procedure represents a crucial step towards personalized epilepsy care, potentially optimizing patient selection and improving post-operative outcomes.

The predictive capability of Neuro-Oracle was rigorously assessed using the EPISURG dataset, resulting in an area under the receiver operating characteristic curve (AUC) of 0.905. This score signifies a substantial advancement in accurately forecasting seizure freedom following epilepsy surgery when contrasted with existing baseline methods. The high AUC value indicates the model’s robust ability to differentiate between patients who will achieve long-term seizure control and those who will not, offering a potentially valuable tool for pre-surgical planning and patient counseling. Such a marked improvement suggests the approach effectively captures complex relationships within patient data, surpassing the predictive power of traditional analytical techniques and paving the way for more personalized epilepsy treatment strategies.

A critical measure of Neuro-Oracle’s efficacy lies in its demonstrated sensitivity, reaching 0.849 – a substantial improvement over the 0.396 achieved by the baseline ResNet-50 model. This heightened sensitivity indicates a markedly increased ability to correctly identify patients who will achieve seizure freedom following epilepsy surgery, minimizing the risk of false negatives. The model, specifically configuration M3, successfully captures subtle patterns within the EPISURG dataset that were previously missed, offering a more accurate and reliable prediction of positive outcomes and potentially impacting treatment planning for a greater number of patients.

A critical component of Neuro-Oracle’s trustworthiness lies in the reliability of its reasoning agent: the evaluations report that 100.0% of its outputs were free of hallucinations. In other words, the generated prognoses stayed grounded in the provided clinical data and retrieved cases rather than fabricating information, a common concern with large language models. Such fidelity is paramount in medical applications, where accuracy directly impacts patient care. By avoiding unfounded inferences, the system supports more trustworthy assessments of seizure freedom and offers clinicians a more dependable basis for informed decision-making.

The system’s ability to reliably predict seizure freedom is underpinned by a robust feature representation, evidenced by a mean cosine similarity of 0.919. This high score indicates that data points belonging to the same class – patients with similar prognoses – cluster tightly together in the feature space, a direct result of the contrastive training methodology employed. Essentially, the model learns to distinguish nuanced patterns indicative of seizure freedom by maximizing the similarity of embeddings for positive examples and minimizing it for negative ones. This tight intra-class clustering not only enhances the model’s discriminatory power but also suggests a learned understanding of the underlying neurological factors contributing to successful epilepsy surgery outcomes, leading to more trustworthy and accurate predictions.
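A statistic of this kind can be computed by averaging cosine similarity over pairs of embeddings that share a label. The sketch below shows one such calculation. The article does not specify exactly which pairs the reported 0.919 was averaged over, so the pairing rule here is an assumption, as are the function names and toy vectors.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_intra_class_cosine(
    embeddings: List[Sequence[float]], labels: Sequence[int]
) -> float:
    """Average cosine similarity over all pairs that share a label.

    Assumes at least one same-label pair exists.
    """
    sims = [
        cosine(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    return sum(sims) / len(sims)


labels = [1, 1, 0, 0]
# Tight clusters: same-class vectors point the same way (toy values).
tight = mean_intra_class_cosine([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]], labels)
# Mixed-up embedding: same-class vectors are orthogonal.
loose = mean_intra_class_cosine([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], labels)
print(tight, loose)
```

A value near 1 says that same-class embeddings point in nearly the same direction. It says nothing by itself about how far apart the two classes are, so it is best read alongside a discrimination metric such as AUC.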

The pursuit of ‘interpretable reasoning’ in epilepsy surgical prognosis, as detailed in this Neuro-Oracle framework, feels… familiar. It’s always the same story. They build something elegant, incorporating longitudinal MRI data and LLMs, striving for sensitivity beyond static analysis. It’s impressive, certainly. But one can almost predict the future: someone will inevitably call it AI and raise funding, ignoring the inherent fragility. It reminds one of a quote by John McCarthy: ‘It is often easier to explain why something did not work than why it did.’ Because, let’s be honest, the moment production gets its hands on this, the carefully constructed ‘interpretable reasoning’ will become a black box of error messages and desperate workarounds. They’ll claim it’s a feature, naturally.

What’s Next?

The pursuit of interpretable prognoses, even framed within a trajectory-aware architecture like Neuro-Oracle, will inevitably encounter the limits of correlation. Longitudinal MRI, while offering a richer signal than static views, still captures a biological process understood incompletely. The framework achieves improved sensitivity, a metric often pursued with zealous optimism – yet sensitivity without commensurate specificity is simply an earlier alarm. Production environments, when faced with the sheer heterogeneity of epilepsy, will reliably unearth edge cases the contrastive learning objective missed.

Future iterations will likely focus on refining the LLM’s ‘reasoning’ – a term generously applied to pattern matching within a vast text corpus. The true challenge isn’t generating plausible explanations, but verifying them. Tests, after all, are a form of faith, not certainty. A more robust approach might integrate mechanistic modeling – attempting to simulate the underlying epileptogenic network – though that introduces a new class of errors, elegantly complex and infuriatingly opaque.

The eventual utility of such a system won’t be judged by benchmark datasets, but by its performance on Mondays. A model that elegantly predicts outcomes in research settings is irrelevant if it fails catastrophically when faced with real-world data drift and unexpected patient presentations. The goal, therefore, isn’t to build a perfect oracle, but a system that fails predictably, and can be patched quickly when it inevitably does.

Original article: https://arxiv.org/pdf/2604.14216.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
