Fragile Guardians: When Cybersecurity AI Fails and Why It’s Hard to Tell
![Gradient-based feature sensitivity analysis [latex] (Eq.5) [/latex] reveals vulnerabilities in phishing website detection; this susceptibility is further underscored by significant drift in mean SHAP attributions under adversarial perturbation [latex] (Eq.6) [/latex].](https://arxiv.org/html/2602.06395v1/figures/feature_vulnerability_2panel_hbar.png)
New research reveals that machine learning models protecting critical systems are surprisingly vulnerable to subtle attacks, and that hardening them against such attacks can make their decisions harder to interpret.
![The study demonstrates that state-of-the-art large language models exhibit varying performance in vulnerability detection and reasoning, as quantified by Precision [latex]R_P[/latex], Recall [latex]R_R[/latex], and [latex]F_1[/latex] score [latex]R_{F_1}[/latex], and that performance is notably influenced by the prompting strategy employed: specifically, a comparison between basic prompting and CWE-generalized prompts.](https://arxiv.org/html/2602.06687v1/x3.png)
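The gradient-based sensitivity idea in the first figure can be illustrated with a minimal sketch: for a classifier, the gradient of the phishing score with respect to each input feature tells an attacker which feature a small perturbation will move the score most. The linear model, weights, and feature names below are hypothetical illustrations, not the paper's actual setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_sensitivity(w, b, x):
    """Gradient of the sigmoid output with respect to each input feature.

    For p = sigmoid(w @ x + b), dp/dx_i = p * (1 - p) * w_i.
    A larger |gradient| means a small perturbation of that feature
    shifts the phishing score more, i.e. the feature is more fragile."""
    p = sigmoid(w @ x + b)
    return p * (1.0 - p) * w

# Hypothetical features: [url_length, num_subdomains, has_https, domain_age]
w = np.array([0.8, 1.5, -2.0, -0.6])   # assumed trained weights
b = 0.1
x = np.array([1.2, 2.0, 0.0, 0.3])     # one example website

grads = feature_sensitivity(w, b, x)
most_sensitive = int(np.argmax(np.abs(grads)))
print(grads, most_sensitive)
```

For a sigmoid model the scaling factor `p * (1 - p)` is the same for every feature, so the ranking reduces to `|w_i|`; with a deep model the gradient would be input-dependent, which is what makes per-example sensitivity analysis informative.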