Predicting Diabetic Foot Complications with Interpretable AI

Author: Denis Avetisyan


Researchers have developed a new deep learning model to forecast post-discharge foot complications for diabetic patients, prioritizing both accuracy and the ability to understand why predictions are made.

(a) Risk of Future Foot Complication (Risk 1): The study demonstrates that individuals exhibiting heightened sensitivity to foot ulceration, indicated by a plantar pressure exceeding $10\,N/cm^2$, experience a statistically significant increase in the probability of developing future foot complications, highlighting the critical importance of early pressure management in preventative podiatric care.

This work introduces CRISPNAM-FG, a neural additive model combining the Fine-Gray formulation for competing risks survival analysis with intrinsic interpretability features.

While deep learning excels at predictive power, its ‘black-box’ nature hinders clinical adoption in complex medical scenarios. This is addressed in ‘Interpretable Fine-Gray Deep Survival Model for Competing Risks: Predicting Post-Discharge Foot Complications for Diabetic Patients in Ontario’, which introduces CRISPNAM-FG, a novel model combining neural additive models with the Fine-Gray formulation for transparent, interpretable survival analysis. By predicting cumulative incidence functions with feature-level clarity, CRISPNAM-FG demonstrates competitive performance on benchmark datasets and, crucially, offers auditable predictions for post-discharge foot complications in diabetic patients across Ontario. Can this approach unlock wider trust and integration of AI-driven insights within routine clinical practice?


The Inherent Limitations of Conventional Survival Analysis

Conventional survival analysis, while foundational, frequently encounters limitations when applied to the intricacies of healthcare data. These methods often assume a single endpoint of interest, failing to adequately account for the frequent occurrence of competing events – instances where a patient experiences an outcome that prevents the occurrence of the event being studied. For example, a cancer patient might succumb to heart disease before experiencing cancer recurrence, or a patient undergoing a specific treatment might withdraw from the study due to unrelated complications. This simplification can lead to biased estimates of the true time-to-event, misrepresenting the factors influencing patient survival and hindering the development of effective interventions. Consequently, researchers are increasingly turning to more sophisticated techniques capable of simultaneously modeling multiple potential outcomes, providing a more realistic and nuanced understanding of patient trajectories and ultimately improving clinical predictions.

The accurate interpretation of time-to-event data in healthcare necessitates analytical methods capable of addressing competing risks – instances where a patient might experience an event that precludes the occurrence of the primary event of interest. Traditional survival analysis often assumes a single endpoint, which can lead to biased estimates when multiple outcomes are possible, such as death, remission, or disease progression. Sophisticated modeling techniques, including multi-state models and competing risks regression, allow researchers to simultaneously evaluate the probabilities of different events, providing a more comprehensive and realistic picture of patient trajectories. By accounting for these competing pathways, clinicians and researchers gain a nuanced understanding of factors influencing survival and can refine predictions of individual patient outcomes, ultimately leading to improved treatment strategies and resource allocation.
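The competing-risks bookkeeping described above can be made concrete with a small sketch of the nonparametric (Aalen-Johansen style) cumulative incidence estimator. The data here are purely illustrative; event code 0 denotes censoring, 1 the event of interest, and 2 a competing event.

```python
# Nonparametric cumulative incidence under competing risks: at each event
# time, the chance of failing from the cause of interest is weighted by the
# probability of still being event-free just before that time.

def cumulative_incidence(times, events, cause, horizon):
    """CIF for `cause` at `horizon`: P(T <= horizon and event == cause)."""
    data = sorted(zip(times, events))
    n = len(data)
    surv = 1.0          # overall event-free survival S(t-)
    cif = 0.0
    at_risk = n
    i = 0
    while i < n:
        t = data[i][0]
        if t > horizon:
            break
        d_cause = sum(1 for tt, e in data[i:] if tt == t and e == cause)
        d_any = sum(1 for tt, e in data[i:] if tt == t and e != 0)
        ties = sum(1 for tt, e in data[i:] if tt == t)
        cif += surv * d_cause / at_risk     # mass assigned to this cause
        surv *= 1.0 - d_any / at_risk       # update event-free survival
        at_risk -= ties
        i += ties
    return cif

times  = [1, 2, 2, 3, 4, 5]
events = [1, 2, 0, 1, 2, 1]   # 0 = censored
print(round(cumulative_incidence(times, events, cause=1, horizon=5), 3))
```

Unlike a naive Kaplan-Meier estimate that censors competing events, the two cause-specific incidence curves here partition the total failure probability rather than overstating each cause's risk.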

The ability to accurately forecast healthcare events – such as disease progression, readmission, or mortality – directly informs clinical practice and optimizes the deployment of finite resources. Precise predictions enable physicians to personalize treatment plans, proactively intervene before adverse events, and improve patient stratification for clinical trials. Furthermore, hospitals and healthcare systems leverage these forecasts to anticipate demand for services, allocate staff and equipment efficiently, and minimize costs associated with preventable complications. Beyond individual patient care, accurate event prediction underpins public health initiatives, allowing for targeted interventions and resource distribution during outbreaks or health crises, ultimately maximizing the impact of healthcare investments and improving population health outcomes.

Advancing Beyond Proportional Hazards: A Modern Toolkit

The Fine-Gray model addresses competing risks by shifting the focus from the overall survival function to the sub-distribution hazard. Traditional survival analysis assumes a single event type and non-informative censoring, assumptions that are often invalid when multiple potential terminating events exist; for example, death from a disease versus death from other causes. The sub-distribution hazard for event type $k$, $h_k(t)$, is the instantaneous rate of failing from cause $k$ among subjects who have not yet failed from cause $k$, a risk set that retains individuals who have already failed from a competing cause. It is defined as $h_k(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P\big(t \le T < t + \Delta t,\ \epsilon = k \mid T \ge t \ \cup\ (T < t \ \cap\ \epsilon \ne k)\big)$, where $T$ is the event time and $\epsilon$ denotes the event type. By modeling this hazard directly on the scale of the cumulative incidence function, the Fine-Gray model provides more targeted estimates of the risk of the event of interest, offering a distinct advantage over standard Kaplan-Meier or Cox proportional hazards models in scenarios with competing risks.
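The defining peculiarity of the sub-distribution hazard is its risk set: subjects who have already failed from a competing cause remain "at risk" for the cause of interest. A minimal sketch, on illustrative uncensored data, contrasts the ordinary cause-specific risk-set size with the Fine-Gray one:

```python
# Cause-specific vs sub-distribution risk sets at each event of interest.
# Cause 1 is the event of interest; cause 2 is a competing event.

def risk_set_sizes(times, causes, cause_of_interest):
    """Return {event time: (cause-specific size, sub-distribution size)}."""
    out = {}
    for t, c in zip(times, causes):
        if c != cause_of_interest:
            continue
        cs = sum(1 for tt in times if tt >= t)                # still event-free
        sd = cs + sum(1 for tt, cc in zip(times, causes)
                      if tt < t and cc != cause_of_interest)  # prior competing failures re-enter
        out[t] = (cs, sd)
    return out

times  = [1, 2, 3, 4]
causes = [2, 1, 2, 1]
print(risk_set_sizes(times, causes, cause_of_interest=1))
# At t=4 the subjects who failed from cause 2 at t=1 and t=3 remain in the
# sub-distribution risk set, so it is larger than the cause-specific one.
```

This enlarged risk set is what lets the Fine-Gray partial likelihood target the cumulative incidence function directly.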

Deep learning methods, including Neural Fine-Gray and DeepHit, build upon the foundational Fine-Gray model by incorporating neural networks to address limitations in modeling complex, non-linear relationships between covariates and sub-distribution hazards. Traditional Fine-Gray models rely on proportional hazards assumptions which may not hold true in many clinical scenarios. Neural Fine-Gray and DeepHit utilize neural networks to directly model the hazard function, allowing for more flexible and accurate risk predictions, particularly when dealing with high-dimensional data and intricate interactions. These techniques estimate the cumulative incidence function without requiring parametric assumptions about the hazard, leading to improved discrimination and calibration compared to traditional methods, as demonstrated in several comparative studies.

Neural network-based methods, such as Neural Fine-Gray and DeepHit, enhance risk prediction by modeling non-linear relationships and interactions within high-dimensional healthcare datasets. Traditional statistical models often assume proportional hazards or require explicit specification of interactions, limiting their ability to capture complex dependencies. Neural networks, conversely, automatically learn these interactions through multiple layers of interconnected nodes, allowing for a more flexible and data-driven approach. This capacity is particularly valuable in healthcare, where patient outcomes are influenced by numerous factors – genetics, lifestyle, comorbidities, and treatment effects – that may interact in non-obvious ways. The resulting models can therefore provide more refined and accurate risk predictions compared to conventional techniques, potentially improving clinical decision-making and patient care.

CRISPNAM-FG: A Pathway to Interpretable Deep Survival Analysis

CRISPNAM-FG addresses the need for interpretable machine learning in survival analysis by integrating Neural Additive Models (NAM) with the Fine-Gray framework. NAM decomposes model predictions into the additive contributions of individual features, allowing for direct assessment of each feature’s effect on predicted risk. This decomposition is achieved through a neural network architecture designed to estimate these individual feature effects, rather than learning complex interactions. The Fine-Gray framework is then utilized to model the cumulative hazard function, enabling accurate survival prediction while preserving the interpretability provided by the NAM component. This combination results in a model where the contribution of each feature to the predicted hazard is explicitly quantifiable, facilitating understanding of the factors driving risk assessment.

CRISPNAM-FG utilizes Neural Additive Models (NAM) to facilitate interpretability by expressing the predicted risk score as a sum of individual feature effects. This decomposition allows for a direct assessment of each feature’s contribution to the model’s output; the effect of each feature is estimated independently of other features, yielding a unique additive contribution. Consequently, the impact of each predictor on a patient’s risk can be quantified and inspected, providing transparency into the model’s decision-making process and enabling identification of key drivers of risk. This approach contrasts with “black box” deep learning models where feature importance is often inferred through post-hoc methods.
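A minimal sketch of that additive decomposition, using hand-fixed (untrained, purely illustrative) weights to stand in for learned shape functions; feature names and parameters here are hypothetical, not taken from CRISPNAM-FG:

```python
# Neural-additive structure: each feature passes through its own tiny
# network f_j, and the risk score is the sum of per-feature contributions.
import math

def shape_fn(x, w1, b1, w2):
    """One-hidden-layer per-feature net: f(x) = sum_k w2[k]*tanh(w1[k]*x + b1[k])."""
    return sum(w2k * math.tanh(w1k * x + b1k) for w1k, b1k, w2k in zip(w1, b1, w2))

# One subnetwork per feature (hypothetical, fixed parameters).
feature_nets = [
    ([1.0, -0.5], [0.0, 0.2],  [0.8, 0.3]),   # e.g. plantar pressure
    ([0.7, 0.7],  [0.1, -0.1], [0.5, -0.4]),  # e.g. a lab value
]

def nam_risk(x, nets, bias=0.0):
    contributions = [shape_fn(xj, *net) for xj, net in zip(x, nets)]
    return sum(contributions) + bias, contributions

score, contribs = nam_risk([0.9, 1.4], feature_nets)
# Each entry of `contribs` is the isolated, inspectable effect of one feature.
```

Because the score is a plain sum, plotting each $f_j$ over its feature's range recovers exactly the shape functions the article describes, with no post-hoc approximation needed.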

Performance evaluations using the GEMINI dataset indicate that CRISPNAM-FG achieves competitive results when benchmarked against established survival analysis methods. Key performance indicators, including the time-dependent area under the curve (TD-AUC) and the time-dependent concordance index (TD-CI), demonstrate strong predictive capability. Notably, CRISPNAM-FG surpasses alternative models on the Framingham Heart Study (FHS) dataset as measured by TD-AUC. These results suggest that CRISPNAM-FG is a viable, and in some settings superior, option for risk prediction and survival analysis.
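For intuition, the time-dependent AUC reported above reduces, in its simplest unweighted form, to a ranking check at a fixed horizon: cases (event by the horizon) should receive higher risk scores than controls (event-free past it). A minimal sketch, ignoring the censoring weights used in practice:

```python
# Unweighted time-dependent AUC at a fixed horizon: the probability that a
# case's risk score exceeds a control's (ties count one half).

def td_auc(scores, times, events, horizon):
    cases    = [s for s, T, e in zip(scores, times, events) if e == 1 and T <= horizon]
    controls = [s for s, T, e in zip(scores, times, events) if T > horizon]
    pairs = concordant = 0.0
    for sc in cases:
        for sn in controls:
            pairs += 1
            if sc > sn:
                concordant += 1
            elif sc == sn:
                concordant += 0.5
    return concordant / pairs

scores = [0.9, 0.8, 0.3, 0.2, 0.6]
times  = [1,   2,   5,   6,   3]
events = [1,   1,   0,   0,   1]
print(td_auc(scores, times, events, horizon=4))  # 1.0: every case outranks every control
```

Production implementations add inverse-probability-of-censoring weights so that censored subjects do not bias the comparison.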

Evaluation of CRISPNAM-FG reveals a generally higher Brier score than DeepHit, suggesting a slight reduction in calibration accuracy. The Brier score measures the accuracy of probabilistic predictions, so CRISPNAM-FG's predicted probabilities are, on average, somewhat further from the observed outcomes than DeepHit's. Despite this, CRISPNAM-FG maintains overall competitive performance when benchmarked against established survival analysis methods, with the time-dependent AUC (TD-AUC) and time-dependent concordance index (TD-CI) confirming its viability as a predictive model.
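At a fixed horizon, the Brier score is just the mean squared difference between predicted event probabilities and observed binary outcomes (lower is better); survival-analysis variants add censoring weights, omitted in this sketch with made-up numbers:

```python
# Brier score: mean squared error of probabilistic predictions against
# 0/1 outcomes. A perfectly calibrated, perfectly sharp model scores 0.

def brier_score(predicted, observed):
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)

preds    = [0.9, 0.2, 0.7, 0.1]
outcomes = [1,   0,   1,   0]
print(brier_score(preds, outcomes))  # ≈ 0.0375
```

This is why a model can have a strong TD-AUC (good ranking) yet a worse Brier score (miscalibrated probabilities), the pattern reported for CRISPNAM-FG relative to DeepHit.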

CRISPNAM-FG effectively identifies key risk factors within the GEMINI Foot Complication dataset, as demonstrated by the shape functions generated from ten randomly selected features among the most impactful.

Beyond Prediction: Illuminating the ‘Why’ with Model Explainability

Although CRISPNAM-FG provides a degree of inherent understandability, the complexity of deep learning models frequently necessitates supplementary explanation techniques when applied to survival analysis. Methods like SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) offer post-hoc insights into model behavior, revealing which features most strongly influence predictions for individual patients. SHAP, grounded in game theory, distributes feature importance based on their contribution to the prediction, while LIME approximates the model locally with a simpler, interpretable model. By combining intrinsically interpretable models like CRISPNAM-FG with these post-hoc methods, researchers can gain a more comprehensive understanding of why a model makes a specific prediction, fostering greater confidence and facilitating the translation of predictive results into meaningful clinical applications.

Post-hoc explanation methods, such as SHAP and LIME, dissect the decision-making processes within complex deep learning models, revealing which features most strongly influence predictions for individual cases. These techniques don’t simply identify overall feature importance; they pinpoint the specific drivers behind each prediction, offering a granular understanding of why a model arrived at a particular outcome. This detailed level of insight is paramount for building trust in predictive systems, particularly within high-stakes domains like healthcare, where clinicians require justification for algorithmic recommendations. By illuminating the “black box” of deep learning, these methods empower stakeholders to validate model behavior, identify potential biases, and ultimately, confidently integrate these tools into clinical workflows, fostering a collaborative relationship between artificial intelligence and expert judgment.
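For a strictly additive model such as a NAM, exact Shapley attributions coincide with each feature's contribution relative to its baseline; a brute-force enumeration over coalitions on a toy additive model (hypothetical values, not the article's) makes this concrete:

```python
# Exact Shapley values by coalition enumeration. For the additive toy model
# f(x) = 2*x0 + 3*x1, each feature's attribution equals its own term.
from itertools import combinations
from math import factorial

def model(x):
    return 2 * x[0] + 3 * x[1]

def shapley(x, baseline, f):
    n = len(x)
    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for r in range(n):
            for coalition in combinations(others, r):
                with_j  = [x[k] if k in coalition or k == j else baseline[k] for k in range(n)]
                without = [x[k] if k in coalition else baseline[k] for k in range(n)]
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[j] += weight * (f(with_j) - f(without))
    return phi

print(shapley([1.0, 2.0], [0.0, 0.0], model))  # [2.0, 6.0]
```

The attributions sum to the gap between the prediction and the baseline prediction (Shapley's efficiency property), which is what makes per-patient explanations auditable.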

The ultimate value of predictive modeling in healthcare hinges not merely on accuracy, but on the capacity to transform forecasts into meaningful clinical action. Simply predicting a patient’s risk of an event is insufficient; clinicians require an understanding of why a prediction was made to confidently integrate it into treatment plans. Therefore, combining inherently interpretable models, such as CRISPNAM-FG, with post-hoc explanation techniques like SHAP and LIME is essential. This synergy illuminates the factors driving each prediction, offering clinicians a nuanced view beyond overall risk scores. By revealing feature importance and individual prediction drivers, these tools empower medical professionals to validate model outputs, identify potential biases, and ultimately, make more informed, patient-centered decisions, moving beyond prediction towards true, actionable intelligence.

The pursuit of predictive accuracy, as demonstrated by CRISPNAM-FG’s application to diabetic foot complications, echoes a fundamental tenet of robust engineering. The model’s emphasis on interpretability, achieved through neural additive models and the Fine-Gray formulation, isn’t merely a desirable feature but a necessary validation step. As Vinton Cerf once stated, “Optimizations without analysis are self-deception.” CRISPNAM-FG’s structure facilitates precisely this analytical rigor, allowing for verification of the model’s logic and a deeper understanding of the contributing factors to post-discharge complications – a level of transparency often sacrificed in the pursuit of ‘black box’ predictive performance. This analytical approach aligns with a mathematically sound and provable solution, rather than empirical success alone.

What Lies Ahead?

The presented CRISPNAM-FG model, while a step towards tractable inference in competing risks survival analysis, merely addresses the symptom, not the disease. The fundamental issue remains the reliance on empirical risk minimization. A truly elegant solution would derive from a first-principles formulation of the cumulative incidence function, perhaps leveraging techniques from optimal transport theory to enforce probabilistic constraints directly within the neural network architecture. The current approach, built upon approximating the Fine-Gray formulation, is inherently susceptible to biases introduced by the approximation itself.

Further investigation must address the limitations inherent in additive models. While feature-level interpretability is valuable, the assumption of independence – or even simple additivity – between risk factors is demonstrably naive. Exploring alternative architectures, such as those incorporating graph neural networks to model complex interactions, promises a more accurate – and, with careful design, still interpretable – representation of the underlying risk landscape. The question is not merely prediction, but understanding – and understanding demands a model that reflects the true generative process.

Ultimately, the field requires a shift in perspective. Survival analysis should not be treated as a specialized application of machine learning, but rather as a problem demanding mathematical rigor. The pursuit of ‘good enough’ performance, measured by metrics like C-statistics, is a distraction. The true benchmark is provable correctness – a model whose behavior can be formally verified, ensuring that its predictions are not merely statistically plausible, but logically sound.


Original article: https://arxiv.org/pdf/2511.12409.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-11-19 01:41