Genomic AI Under Attack: Securing Variant Prediction Against Subtle Manipulation

Author: Denis Avetisyan


New research reveals how carefully crafted prompts can mislead genomic foundation models, highlighting the need for robust security auditing.

The Secure Agentic Genomic Evaluator (SAGE) assesses the robustness of genomic foundation models against subtle manipulations of input prompts, offering an interpretable and automated audit of potential misalignment without disrupting the model’s inherent learning processes – a necessary precaution given that every architectural choice foreshadows eventual failure within these complex ecosystems.

This study introduces SAGE, an agentic framework for auditing the robustness of ESM-based variant predictors against adversarial soft prompt attacks and generating interpretable vulnerability reports.

While genomic foundation models excel at predicting variant effects, their susceptibility to subtle, adversarial manipulation remains a critical, largely unaddressed security challenge. This work, ‘Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors’, introduces the Secure Agentic Genomic Evaluator (SAGE), an automated framework demonstrating that even state-of-the-art models like ESM2 are vulnerable to carefully crafted “soft prompt” attacks. SAGE provides interpretable risk assessments without altering the underlying model, revealing previously hidden weaknesses in these increasingly vital biomedical tools. Will proactive, agentic auditing become essential for ensuring the reliable and safe deployment of genomic foundation models in clinical settings?


The Fragile Foundation: Genomic Prediction Under Scrutiny

The promise of personalized medicine hinges on the ability to accurately predict how genetic variations – the subtle differences in an individual’s DNA sequence – impact health and disease. However, current methods for variant effect prediction face significant hurdles in achieving both generalization and robustness. These approaches often excel at identifying effects within the datasets they were trained on, but struggle to reliably predict effects for novel variants or in diverse populations. This limitation stems from the complexity of genomic interactions – a single variant rarely acts in isolation – and the limited size and bias present in training datasets. Consequently, predictions can be inconsistent, leading to inaccurate risk assessments and potentially hindering the development of truly personalized therapies. Overcoming these challenges requires innovative approaches capable of capturing the intricate relationships within the genome and demonstrating consistent performance across varied genetic backgrounds.

Traditional genomic prediction methods frequently operate under limitations stemming from an inability to fully capture the intricate interplay between genetic variants and their effects on complex traits. These approaches often treat each variant in isolation, or consider only pairwise interactions, failing to account for the higher-order relationships and epistatic effects – where the effect of one gene is modified by another – that are pervasive within the genome. This simplification overlooks the fact that genomic data isn’t simply a collection of individual signals, but a highly interconnected network where the function of any given element is profoundly influenced by its context. Consequently, predictive accuracy suffers, particularly when attempting to extrapolate beyond the datasets used for training, hindering the development of truly personalized and robust genomic medicine.

Genomic Foundation Models, analogous to large language models but trained on the vast landscape of genomic data, represent a potential leap forward in predicting the effects of genetic variations. However, their complexity introduces a critical vulnerability: susceptibility to adversarial attacks. Subtle, intentionally crafted alterations to genomic input – imperceptible to human analysis – can mislead these models, resulting in inaccurate predictions of variant effects and potentially flawed clinical interpretations. This poses a significant risk, as reliance on compromised predictions could lead to misdiagnosis or inappropriate treatment strategies. Therefore, rigorous evaluation of these models’ robustness against adversarial perturbations is paramount, necessitating the development of robust auditing frameworks and adversarial training techniques before widespread clinical application can be responsibly considered.

The translation of genomic prediction into clinical practice demands more than just algorithmic advancement; it requires comprehensive and robust auditing frameworks. These frameworks must move beyond simple benchmark evaluations and actively probe for adversarial vulnerabilities – subtle genomic variations deliberately engineered to mislead predictions. Such assessments should encompass diverse datasets, representing a broad spectrum of ancestries and disease presentations, to identify biases and ensure generalizability. Rigorous auditing isn’t merely about confirming accuracy; it’s about establishing confidence intervals, quantifying uncertainty, and defining clear failure modes. Without such systematic evaluation, the potential for misdiagnosis or ineffective treatment stemming from flawed genomic predictions remains a significant concern, hindering the responsible implementation of personalized medicine and eroding public trust in these powerful new technologies.

SAGE: An Agentic System for Unmasking Model Weaknesses

SAGE’s agentic framework utilizes a distributed system of specialized agents to perform comprehensive model auditing. Each agent is assigned a specific role in the probing process, such as generating adversarial examples, evaluating model responses, or analyzing vulnerability patterns. These agents operate iteratively and collaboratively, with outputs from one agent serving as inputs for others, enabling a systematic exploration of potential model weaknesses. This decomposition of the auditing task into modular agents facilitates parallelization and scalability, allowing for more efficient and thorough evaluation compared to monolithic approaches. The framework also supports the dynamic addition or modification of agents to address evolving threat landscapes or specific model characteristics.
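The sketch below illustrates one way such a role decomposition could be expressed in Python. The agent names (`PerturbationAgent`, `EvaluationAgent`, `ReportAgent`) and the orchestration loop are illustrative assumptions, not SAGE’s actual interfaces.

```python
from dataclasses import dataclass

# Hypothetical agent roles; SAGE's real interfaces may differ.
@dataclass
class Finding:
    variant_id: str
    clean_score: float
    attacked_score: float

class PerturbationAgent:
    """Proposes perturbed inputs (e.g., soft prompts) for a target model."""
    def propose(self, sequence: str) -> str:
        # Placeholder: a real agent would optimize an embedding-space perturbation.
        return sequence

class EvaluationAgent:
    """Scores clean and perturbed inputs and records the discrepancy."""
    def __init__(self, score_fn):
        self.score_fn = score_fn
    def evaluate(self, variant_id, clean_seq, attacked_seq) -> Finding:
        return Finding(variant_id, self.score_fn(clean_seq), self.score_fn(attacked_seq))

class ReportAgent:
    """Aggregates findings into a human-readable summary."""
    def summarize(self, findings: list[Finding]) -> str:
        flips = [f for f in findings if abs(f.clean_score - f.attacked_score) > 0.5]
        return f"{len(flips)}/{len(findings)} variants showed large score shifts under attack."

def audit(variants, score_fn):
    """Run the perturb -> evaluate -> report loop over (id, sequence) pairs."""
    perturber, evaluator, reporter = PerturbationAgent(), EvaluationAgent(score_fn), ReportAgent()
    findings = [evaluator.evaluate(v_id, seq, perturber.propose(seq)) for v_id, seq in variants]
    return reporter.summarize(findings)
```

Because each stage is its own object, new attack or analysis agents can be swapped in without touching the rest of the pipeline, which is the scalability property described above.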

SAGE’s core functionality centers on ‘Soft Prompt Attacks’, a method of adversarial testing applied to Genomic Foundation Models. These attacks involve introducing subtle, human-imperceptible perturbations to input sequences – typically through the addition or modification of token embeddings – to assess model robustness. Unlike traditional adversarial attacks that create easily detectable noise, soft prompts aim to exploit vulnerabilities without drastically altering the input, making them more representative of real-world scenarios where minor data corruption or noise may occur. By systematically applying these soft prompt attacks and observing resulting model outputs, SAGE identifies potential weaknesses in the model’s predictive capabilities and highlights areas where even small input changes can lead to inaccurate or unintended results. This approach allows for the discovery of vulnerabilities that might be missed by more overt adversarial testing methods.
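To make the mechanism concrete, here is a minimal PyTorch sketch of a generic soft prompt attack: a small block of continuous “virtual token” embeddings is prepended to the frozen model’s input embeddings and optimized toward an attacker-chosen objective. The forward signature (`inputs_embeds`), prompt length, optimizer, and loss are assumptions for illustration, not the paper’s exact attack.

```python
import torch

def soft_prompt_attack(model, embed_fn, input_ids, adversarial_loss,
                       n_virtual_tokens=8, steps=50, lr=1e-2):
    """Sketch of a soft prompt attack: optimize prepended virtual-token embeddings
    while the model weights stay frozen. Assumes a HuggingFace-style forward
    that accepts `inputs_embeds` and returns an object with `.logits`."""
    token_embeds = embed_fn(input_ids).detach()          # (batch, seq, dim); no gradient to model weights
    soft_prompt = torch.zeros(token_embeds.size(0), n_virtual_tokens,
                              token_embeds.size(-1), requires_grad=True)
    optimizer = torch.optim.Adam([soft_prompt], lr=lr)
    for _ in range(steps):
        inputs_embeds = torch.cat([soft_prompt, token_embeds], dim=1)
        logits = model(inputs_embeds=inputs_embeds).logits
        loss = adversarial_loss(logits)                  # attacker-chosen objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return soft_prompt.detach()
```

Because the perturbation lives in embedding space rather than in the discrete sequence, the original input remains unchanged at the character level, which is what makes these attacks hard to spot by inspection.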

SAGE incorporates Large Language Models (LLMs) to automate the creation of audit reports, converting technical findings into accessible, human-readable summaries. These LLMs process the data generated during the vulnerability probing process – including input perturbations, model responses, and identified weaknesses – and synthesize it into a narrative format. Reports detail the specific tests conducted, the observed model behavior, the severity of identified vulnerabilities, and potential mitigation strategies. This automated report generation streamlines the auditing process, reducing the manual effort required to interpret results and communicate findings to stakeholders, and facilitating consistent documentation across multiple audits.
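As an illustration only, the report-generation step could look roughly like the sketch below, which assumes an OpenAI-compatible chat client and a placeholder model name; SAGE’s actual prompt templates and LLM choice are not specified here.

```python
from openai import OpenAI  # assumes an OpenAI-compatible endpoint is configured

def generate_audit_report(findings: list[dict], model_name: str = "gpt-4o-mini") -> str:
    """Turn structured audit findings into a readable narrative report (sketch)."""
    client = OpenAI()
    prompt = (
        "You are writing a security audit report for a genomic variant predictor.\n"
        "Summarize the tests run, observed model behavior, severity of each finding, "
        "and suggested mitigations.\n"
        f"Findings (JSON-like): {findings}"
    )
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```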

Traditional evaluations of Genomic Foundation Models often rely on static datasets and pre-defined metrics, limiting their ability to uncover subtle vulnerabilities or comprehensively assess model robustness. In contrast, an agentic framework, such as SAGE, facilitates a more nuanced assessment by employing multiple agents that systematically probe the model with varied and dynamically generated inputs. This approach allows for the exploration of a broader input space and the identification of weaknesses that static evaluations might miss. The iterative and adaptive nature of the agentic system enables a deeper understanding of model behavior under different conditions, providing a more comprehensive evaluation of its reliability and potential failure modes than is achievable with fixed, pre-defined test sets.

Targeted soft prompt attacks significantly alter the predicted label distribution across both the CM and ARM datasets, as evidenced by the change in predicted logit likelihood ratio (ΔPLLR) for each label.
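One common formulation of this kind of score for ESM-style models is a masked-marginal log-likelihood ratio: the model’s log-probability of the mutant residue minus that of the wild-type residue at the masked position, with ΔPLLR being the shift in that score between clean and attacked inputs. The sketch below illustrates that formulation under stated assumptions (HuggingFace ESM masked-LM, single-residue substitutions); the paper’s exact definition of PLLR may differ.

```python
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

def pllr(sequence: str, pos: int, wt: str, mut: str,
         model: EsmForMaskedLM, tok: AutoTokenizer) -> float:
    """Log-likelihood ratio of mutant vs. wild-type residue at a masked position
    (a common masked-marginal variant score for ESM-family models)."""
    masked = sequence[:pos] + tok.mask_token + sequence[pos + 1:]
    inputs = tok(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_idx = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    log_probs = logits[0, mask_idx].log_softmax(dim=-1)
    return (log_probs[tok.convert_tokens_to_ids(mut)]
            - log_probs[tok.convert_tokens_to_ids(wt)]).item()

# Delta-PLLR for a variant: score under attack minus score on the clean input.
# delta_pllr = pllr_attacked - pllr_clean
```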

Benchmarking Resilience: A Comparative Analysis of Genomic Foundation Models

Evaluation of genomic foundation model robustness was conducted utilizing the Secure Agentic Genomic Evaluator (SAGE) framework. The models included in this assessment were ESM-1b, ESM2-150M, ESM2-650M, and ESM1v, representing a range of architectures and parameter sizes within the ESM family. SAGE facilitated a standardized approach to assessing performance consistency and susceptibility to perturbations across these models, forming the basis for the comparative analysis detailed in subsequent results. This evaluation setup allowed quantifiable metrics to be generated for each model under identical testing conditions.
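For reference, all four checkpoints are publicly available. The sketch below loads them via HuggingFace `transformers`; the hub identifiers are assumptions for illustration, and the study may instead have used the original `fair-esm` package.

```python
from transformers import AutoTokenizer, EsmForMaskedLM

# Hub identifiers assumed here; confirm against the paper's exact checkpoints.
CHECKPOINTS = {
    "ESM-1b":    "facebook/esm1b_t33_650M_UR50S",
    "ESM2-150M": "facebook/esm2_t30_150M_UR50D",
    "ESM2-650M": "facebook/esm2_t33_650M_UR50D",
    "ESM1v":     "facebook/esm1v_t33_650M_UR90S_1",
}

def load(name: str):
    """Load a tokenizer/model pair for one of the audited checkpoints."""
    repo = CHECKPOINTS[name]
    return AutoTokenizer.from_pretrained(repo), EsmForMaskedLM.from_pretrained(repo).eval()
```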

Model robustness evaluations utilized datasets focused on two distinct cardiac disease contexts: Arrhythmia and Cardiomyopathy. The Arrhythmia dataset represents genomic sequences associated with irregular heart rhythms, while the Cardiomyopathy dataset comprises sequences linked to diseases affecting the heart muscle. This dual-context approach allowed for assessment of model performance across varying genomic features and disease manifestations, providing a more comprehensive understanding of potential vulnerabilities beyond a single disease type.

Evaluation of model robustness utilized the Area Under the Receiver Operating Characteristic curve (AUROC) and Area Under the Precision-Recall curve (AUPR) as key performance indicators. Targeted soft prompt attacks against the ESM2-150M model resulted in a measurable decrease in performance on both the Cardiomyopathy (CM) and Arrhythmia (ARM) datasets; specifically, AUROC values dropped by 0.07 on the CM dataset and 0.10 on the ARM dataset following adversarial prompting. These results indicate a susceptibility to input perturbations even with current state-of-the-art Genomic Foundation Models.
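Both metrics are straightforward to compute with scikit-learn. The sketch below measures the clean-versus-attacked drop reported above, with the score arrays standing in for model outputs on the labeled variant sets.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

def robustness_drop(y_true, scores_clean, scores_attacked):
    """AUROC/AUPR on clean vs. soft-prompt-attacked predictions, and the resulting drop."""
    auroc_clean = roc_auc_score(y_true, scores_clean)
    auroc_attacked = roc_auc_score(y_true, scores_attacked)
    aupr_clean = average_precision_score(y_true, scores_clean)
    aupr_attacked = average_precision_score(y_true, scores_attacked)
    return {
        "auroc_drop": auroc_clean - auroc_attacked,  # e.g., 0.07 on CM, 0.10 on ARM for ESM2-150M
        "aupr_drop": aupr_clean - aupr_attacked,
    }
```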

Evaluation of model robustness demonstrated that targeted attacks on ESM-1b resulted in a 0.07 Area Under the Receiver Operating Characteristic (AUROC) decrease when tested on the Cardiomyopathy (CM) dataset. Similarly, attacks on ESM1v led to a 0.13 reduction in Area Under the Precision-Recall curve (AUPR) when assessed using the Arrhythmia (ARM) dataset. These results indicate that even currently advanced Genomic Foundation Models are vulnerable to adversarial perturbations, potentially impacting the reliability of predictions in clinical applications.

Unveiling the Limits of Scale: The Fragility of Large Models

Recent analyses demonstrate a surprising disconnect between model size, accuracy, and resilience against adversarial attacks. While larger language models consistently achieve higher scores on standard benchmarks, indicating improved performance on typical inputs, this enhanced accuracy doesn’t automatically translate to greater robustness. These models, despite their increased capacity and complex architectures, can be disproportionately vulnerable to subtly perturbed inputs – adversarial examples – designed to intentionally mislead them. This suggests that simply scaling up model parameters, a common strategy for improving performance, isn’t a sufficient solution for building truly reliable artificial intelligence systems, and that alternative approaches focusing on defensive mechanisms are crucial for addressing these vulnerabilities.

The foundation of a large language model’s vulnerability, or conversely, its resilience, is significantly influenced by the pretraining objective employed during its initial development. Specifically, Masked Language Modeling (MLM), a common technique where the model learns to predict intentionally hidden words within a sentence, appears to cultivate certain biases that adversarial attacks can readily exploit. While MLM effectively teaches models to understand contextual relationships, it doesn’t necessarily instill a robust understanding of semantic meaning, leaving them susceptible to subtle, yet impactful, perturbations in input text. This suggests that the very method used to build a model’s linguistic understanding can inadvertently create weaknesses, highlighting the need for careful consideration of pretraining objectives and the development of techniques to fortify models against targeted manipulations. The reliance on predicting masked tokens, rather than deeply understanding the underlying concepts, appears to be a key factor in determining a model’s overall robustness.
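For readers unfamiliar with the objective, the following is a minimal sketch of a masked-token training step in generic PyTorch; it is not the actual ESM pretraining code, and details such as special-token handling and the 80/10/10 corruption scheme are omitted.

```python
import torch
import torch.nn.functional as F

def mlm_step(model, input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """One masked-language-modeling step: hide random tokens, predict them."""
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < mask_prob
    labels[~mask] = -100                      # only masked positions contribute to the loss
    corrupted = input_ids.clone()
    corrupted[mask] = mask_token_id
    logits = model(corrupted).logits          # assumes a model returning per-token logits
    return F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
```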

Recent investigations demonstrate that the modular fine-tuning pipeline, termed DYNA, offers a compelling pathway toward bolstering the resilience of large language models. This approach strategically decomposes the refinement process into distinct, adaptable modules, allowing for targeted interventions to address specific vulnerabilities. Through carefully designed training stages, DYNA aims to enhance a model’s robustness not by simply increasing scale, but by actively shaping its internal representations to be less susceptible to adversarial manipulation. Initial findings suggest that this modularity facilitates a more efficient and effective transfer of robustness, potentially offering a practical solution for mitigating risks associated with increasingly powerful, yet fragile, language technologies. The pipeline’s flexibility allows researchers to customize its components, tailoring the defense strategy to the unique characteristics of both the model and the anticipated threats.

Rigorous statistical analysis confirmed the demonstrable impact of the adversarial attacks employed in this study. A paired t-test, conducted on benign samples sourced from the CM dataset, yielded a highly significant p-value of $9.23 \times 10^{-4}$. This result provides strong evidence against the null hypothesis, indicating that the observed performance degradation was not due to random chance. The low p-value substantiates the claim that the attacks effectively disrupted the models’ predictive capabilities, highlighting a critical vulnerability even in systems exhibiting high initial accuracy. This quantitative validation underscores the necessity for developing robust defense mechanisms against such targeted perturbations.
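The corresponding test is a one-line call in SciPy; the sketch pairs each benign variant’s clean score with its score under attack (the arrays here are placeholders for the study’s data).

```python
from scipy.stats import ttest_rel

def paired_attack_test(scores_clean, scores_attacked):
    """Paired t-test on per-variant scores before and after the soft prompt attack."""
    result = ttest_rel(scores_clean, scores_attacked)
    return result.statistic, result.pvalue   # e.g., p = 9.23e-4 on CM benign samples
```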

Towards Trustworthy Genomic Prediction: A Path Forward

The robustness of genomic prediction systems, as evaluated by the SAGE framework, is poised for significant expansion. Future iterations will deliberately subject these systems to a more diverse and challenging array of adversarial attacks, moving beyond current methods to simulate increasingly sophisticated manipulation attempts. This broadened scope will not be limited to attack vectors; the evaluation will also incorporate a substantially wider range of genomic datasets, encompassing varied ancestries, data qualities, and disease complexities. By stress-testing genomic foundation models against a more realistic threat landscape and across diverse genomic backgrounds, researchers aim to identify previously unknown vulnerabilities and ultimately develop more resilient and trustworthy prediction tools. This continuous evaluation process is crucial for ensuring the reliable application of genomic information in clinical and research settings.

A critical step towards securing genomic prediction lies in understanding how vulnerabilities propagate between different Genomic Foundation Models. Research indicates that adversarial attacks – carefully crafted inputs designed to mislead the model – are not necessarily model-specific; a weakness exploited in one system may readily transfer to another, even with differing architectures or training data. This transferability poses a significant risk, as successfully breaching a single model could compromise an entire ecosystem of genomic tools. Consequently, a key research priority involves systematically evaluating the extent of this vulnerability sharing, identifying common failure modes, and developing defense strategies that generalize across various model types. Understanding how and why these vulnerabilities transfer is crucial for building robust and reliable genomic prediction systems, ensuring consistent performance and preventing widespread manipulation.

Addressing the identified vulnerabilities in genomic prediction models requires dedicated investigation into mitigation strategies. Current research prioritizes techniques like adversarial training, where models are intentionally exposed to subtly altered, malicious inputs during the learning process, effectively ‘inoculating’ them against future attacks. Complementary to this is robust regularization, a method that penalizes overly complex model parameters, encouraging solutions that generalize better and are less susceptible to manipulation. These approaches aim not simply to detect adversarial examples, but to fundamentally improve the model’s resilience, ensuring accurate predictions even when faced with deliberately deceptive data. The development of effective mitigation strategies is paramount to building trustworthy genomic prediction systems capable of reliable performance in real-world applications, and ongoing research focuses on optimizing these techniques for both accuracy and computational efficiency.
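A schematic adversarial-training loop is sketched below, under the assumption that a soft-prompt perturbation like the earlier one is reused as the inner attack; the weighting, attack routine, and loss are illustrative, and the mitigations ultimately adopted for genomic models may differ.

```python
import torch

def adversarial_training_step(model, embed_fn, input_ids, labels, loss_fn, optimizer,
                              craft_soft_prompt):
    """One training step mixing the clean loss with a loss on attacked inputs (sketch)."""
    model.train()
    clean_embeds = embed_fn(input_ids)
    clean_loss = loss_fn(model(inputs_embeds=clean_embeds).logits, labels)

    # Inner maximization: `craft_soft_prompt` is a placeholder for the attack routine
    # (see the earlier soft prompt sketch), run against the current parameters.
    soft_prompt = craft_soft_prompt(model, clean_embeds.detach(), labels)
    attacked_embeds = torch.cat([soft_prompt, clean_embeds], dim=1)
    adv_loss = loss_fn(model(inputs_embeds=attacked_embeds).logits, labels)

    loss = 0.5 * clean_loss + 0.5 * adv_loss    # equal weighting; a tunable design choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```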

Establishing lasting confidence in genomic prediction necessitates more than just initial validation; a continuous auditing pipeline is crucial for maintaining trustworthiness over time. This pipeline would move beyond static assessments, actively monitoring model performance as new data becomes available and genomic understanding evolves. A key component of this ongoing evaluation focuses on refining benign variant classification – accurately identifying harmless genetic variations is paramount, as misclassification can lead to unnecessary medical interventions or overlooked disease risks. Such a system would proactively detect and address potential vulnerabilities, ensuring genomic prediction models remain reliable and aligned with current scientific knowledge, ultimately fostering greater trust in their application for personalized medicine and disease prevention.

The pursuit of robustness in genomic foundation models, as demonstrated by SAGE, echoes a fundamental truth about complex systems. One cannot simply build security, but must cultivate it through continuous auditing and adaptation. As Blaise Pascal observed, “All of humanity’s problems stem from man’s inability to sit quietly in a room alone.” This speaks to the need for constant self-assessment; SAGE, with its agentic risk auditing, provides a means of ‘sitting quietly’ with the model, identifying vulnerabilities before they manifest as critical failures. The framework doesn’t promise a perfect shield, but a process of forgiveness – allowing components to fail safely and learn from those failures – ultimately fostering a more resilient ecosystem.

The Horizon Recedes

The pursuit of ‘robustness’ in these genomic foundation models feels less like fortification and more like charting the inevitable course of compromise. SAGE, as an agentic auditing framework, does not prevent soft prompt attacks; it merely illuminates the cracks before they widen. Each successful audit is, implicitly, a forecast of future exploits, a temporary reprieve in an ongoing game of adaptation. The architecture isn’t structure – it’s a compromise frozen in time, and time, predictably, moves on.

The focus will inevitably shift from identifying vulnerabilities in specific models to understanding the systemic properties that create them. Technologies change, dependencies remain. The true challenge lies not in securing individual predictors, but in designing ecosystems that tolerate, and even anticipate, adversarial pressures. The field will likely need to embrace the notion of ‘graceful degradation’ rather than striving for absolute immunity – a difficult acceptance for those accustomed to building walls.

One can foresee a proliferation of ‘red teams’ employing increasingly sophisticated agentic frameworks, mirroring the very attacks they attempt to thwart. This will drive a cycle of escalating complexity, a dance of adaptation with diminishing returns. Perhaps, ultimately, the most valuable outcome will not be more secure models, but a deeper understanding of the inherent fragility of prediction itself.


Original article: https://arxiv.org/pdf/2512.17146.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
