Guarding AI Perception: A New Defense Against Activity Recognition Hacks

Author: Denis Avetisyan

Researchers have developed an autonomous agent that actively shields AI systems interpreting sensor data from malicious prompt injection attacks.

AegisAgent demonstrates superior performance against adversarial attacks, exceeding text-only, classical heuristic-based, and multimodal detection-only defenses as measured by detection accuracy (DA), robustness rate (RR), and attack success rate (ASR).

AegisAgent mitigates prompt injection vulnerabilities in LLM-based Human Activity Recognition systems through input sanitization, consistency checks, and robust reasoning.

While large language models are increasingly integrated into wearable sensing for nuanced human activity understanding, their vulnerability to prompt injection attacks presents a critical reliability challenge. This paper introduces AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs, a novel system that moves beyond passive filtering to actively protect LLM-driven Human Activity Recognition (LLM-HAR) systems. AegisAgent functions as a cognitive guardian, autonomously reasoning about user intent and verifying inputs via a dynamic memory and multi-step repair plan, reducing attack success rates by 30% with minimal latency overhead. Can this approach pave the way for truly secure and trustworthy LLM-powered wearable applications?

The Inevitable Cracks in the Foundation

Recent advancements in human activity recognition are increasingly focused on LLM-HAR systems, which uniquely combine the power of Large Language Models with data from Inertial Measurement Units (IMUs). These systems move beyond traditional methods by not merely classifying actions, but by interpreting the context of movement sequences. IMU data, capturing acceleration and angular velocity, provides the raw sensory input, while the LLM acts as a sophisticated reasoning engine, capable of understanding nuanced patterns and temporal dependencies. This allows LLM-HAR to potentially identify complex activities, predict future actions, and even differentiate between similar movements with greater accuracy. The integration enables a shift towards more adaptable and intelligent systems, opening doors for applications in healthcare monitoring, athletic performance analysis, and human-robot interaction, where understanding the ‘why’ behind a movement is as crucial as recognizing the movement itself.

The integration of Large Language Models into human activity recognition systems, while innovative, inadvertently creates new avenues for malicious interference. These LLM-HAR systems are susceptible to a range of attack vectors, including prompt injection – where crafted inputs manipulate the LLM’s reasoning – and data poisoning, compromising the model’s training data to induce misclassifications. Adversarial examples, subtly altered sensor data, can also fool the LLM into incorrectly identifying activities, potentially leading to false alarms or, more critically, failures to detect genuine threats. Consequently, ensuring the integrity and reliability of LLM-HAR necessitates robust security measures focused on input validation, model hardening, and continuous monitoring to mitigate these vulnerabilities and maintain system trustworthiness.

Conventional human activity recognition (HAR) systems, often built on meticulously engineered features and established machine learning algorithms, demonstrate considerable robustness and reliability in controlled settings. However, these methods frequently struggle with the nuances of real-world data and exhibit limited capacity to generalize to previously unseen activities or adapt to changing environmental conditions. In contrast, systems integrating Large Language Models (LLMs) offer a paradigm shift, leveraging the models’ inherent reasoning abilities to interpret sensor data – such as that from inertial measurement units – in a more flexible and context-aware manner. This allows LLM-HAR systems to not only recognize established activities but also infer new ones, handle ambiguous data, and potentially even understand the intent behind an action – capabilities largely absent in traditional HAR frameworks.

Adversarial manipulation of sensor data in LLM-based human activity recognition (HAR) systems can compromise safety-critical applications by injecting false information into prompts and altering the LLM's interpretation of activities, such as misclassifying a fall as normal walking. — Adversarial manipulation of sensor data in LLM-based human activity recognition (HAR) systems can compromise safety-critical applications by injecting false information into prompts and altering the LLM’s interpretation of activities, such as misclassifying a fall as normal walking.

AegisAgent: Proactive Defense in a Fragile System

AegisAgent is an automated defense system engineered to mitigate adversarial attacks targeting Large Language Model Human Activity Recognition (LLM-HAR) systems. This agent operates autonomously, continuously monitoring input data and system responses to identify and neutralize potentially malicious inputs designed to compromise the accuracy or reliability of the LLM-HAR system. Its functionality centers on proactive threat detection and response, aiming to maintain consistent and accurate human activity recognition even when subjected to adversarial manipulation. The system is designed for deployment in real-time applications where the integrity of the LLM-HAR output is critical, such as robotics, assistive technologies, and security systems.

Input Sanitization within AegisAgent involves a preprocessing stage applied to both Inertial Measurement Unit (IMU) signals and text prompts to enhance the reliability of subsequent analysis. For IMU data, this includes outlier detection and removal, noise filtering using techniques such as Kalman filtering, and data normalization to a consistent scale. Text prompts undergo similar processing, encompassing removal of irrelevant characters, standardization of text case, and filtering of potentially adversarial keywords or phrases. This normalization aims to reduce the impact of noisy or manipulated inputs, improving the accuracy of cross-modal consistency checks and the overall robustness of the system against adversarial attacks.

Cross-Modal Consistency verification within AegisAgent operates by establishing a relationship between incoming Inertial Measurement Unit (IMU) sensor data and the corresponding natural language prompts. This process involves analyzing whether the textual interpretation of a command logically aligns with the physical movements or states indicated by the IMU data. Discrepancies, such as a text prompt requesting forward motion while the IMU reports stationary status, are flagged as potential adversarial attacks. The system employs dedicated modules to map IMU readings to semantic representations and compare these to the semantic understanding derived from the text prompt, quantifying the consistency level and triggering defensive actions when significant misalignment is detected.

AegisAgent’s Robust Reasoner enhances prediction reliability under adversarial conditions by implementing both Chain-of-Thought Reasoning and Self-Consistency. Chain-of-Thought Reasoning enables the agent to generate intermediate reasoning steps, providing transparency and allowing for error detection during the prediction process. Self-Consistency further strengthens this by generating multiple reasoning paths and aggregating the results; discrepancies between these paths are flagged, and a final prediction is derived from the most consistent outputs. This dual approach mitigates the impact of subtle adversarial perturbations by requiring multiple lines of evidence to support a given conclusion, increasing confidence in predictions even when input data is manipulated or noisy.

AegisAgent employs a pipeline incorporating Input Sanitization, a Consistency Verifier, and a Robust Reasoner to generate secure outputs, as illustrated by the system overview in the figure.

Validation and Performance on Benchmark Datasets

AegisAgent’s performance was evaluated using three publicly available datasets commonly utilized in Human Activity Recognition (HAR) research. The USC-HAD dataset contains data collected from sensors on a smartphone during various activities. The UCI HAR dataset consists of data acquired from accelerometer and gyroscope sensors while subjects performed activities like walking, walking upstairs, and lying down. The PAMAP2 dataset provides a more comprehensive set of sensor data, including triaxial accelerometers, gyroscopes, and magnetometers, collected during physical activities performed by subjects in a real-world environment. These datasets facilitated a standardized assessment of AegisAgent’s capabilities in diverse and representative scenarios.

Evaluation of AegisAgent against common adversarial attacks – specifically Text Path Attacks, Prompt Path Attacks, and their hybrid combinations – yielded an overall detection accuracy of 85%. This metric represents the percentage of malicious input samples correctly identified across a testing suite designed to simulate real-world attack scenarios. The accuracy was calculated based on correctly flagging manipulated data intended to compromise the system’s activity recognition capabilities. This performance indicates a substantial capacity to discern and neutralize attacks targeting LLM-HAR models.

AegisAgent demonstrates sustained activity recognition accuracy and reliability under adversarial conditions. Testing indicates that even when input data is intentionally manipulated to deceive the system, AegisAgent maintains a consistent performance level, preventing significant degradation in its ability to correctly identify human activities. This robustness is achieved through the agent’s internal mechanisms for detecting and neutralizing malicious inputs designed to alter the model’s predictions, ensuring consistent and dependable activity classification despite adversarial attacks.

Performance benchmarks indicate that AegisAgent effectively identifies and corrects malicious inputs targeting Large Language Model-based Human Activity Recognition (LLM-HAR) systems. Across evaluations using five distinct LLM-HAR models, AegisAgent reduced the average attack success rate by 30%. Furthermore, the system demonstrated a 56.3% improvement in recovery rate – the ability to correctly identify activity after a malicious input attempt – indicating a robust defense against various attack vectors and a significant enhancement in system resilience.

Prompt injection attacks substantially reduce human activity recognition (HAR) classification accuracy from 88-92% to around 45-53%, and while standard text defenses offer limited recovery to 49-58%, they prove insufficient against multimodal prompt injection vulnerabilities.

Towards Trustworthy LLM-HAR Systems: An Exercise in Delayed Failure

The development of AegisAgent represents a significant step towards realizing the full potential of Large Language Model Human-Activity Recognition (LLM-HAR) systems across diverse applications. By bolstering the security and reliability of these systems, AegisAgent directly addresses a critical barrier to their widespread implementation, particularly in sensitive domains like healthcare and personalized fitness. A more robust and trustworthy foundation enables the seamless integration of LLM-HAR into applications ranging from remote patient monitoring and preventative care to adaptive exercise programs and intuitive human-computer interfaces, ultimately fostering greater user acceptance and unlocking a new era of intelligent, responsive technology.

The successful integration of Large Language Model Human-Activity Recognition (LLM-HAR) systems hinges critically on establishing user trust, and this trust is directly proportional to the system’s demonstrated ability to reliably distinguish between legitimate inputs and adversarial attacks. When users are confident that a system can accurately interpret their intentions and effectively filter out malicious prompts – such as those designed to elicit harmful actions or compromise data – acceptance and sustained engagement naturally follow. This discernment isn’t merely a technical achievement; it addresses a fundamental psychological need for safety and control, paving the way for broader adoption in sensitive areas like healthcare, personalized fitness, and increasingly complex human-computer interfaces. A system perceived as vulnerable erodes confidence, whereas a demonstrably robust defense against manipulation fosters a positive user experience and unlocks the full potential of LLM-HAR technologies.

AegisAgent establishes a robust defense against adversarial attacks by integrating multiple data modalities – text, audio, and visual cues – into a unified analytical framework. This multi-modal strategy moves beyond the limitations of systems reliant on single input types, allowing the system to cross-validate information and identify inconsistencies indicative of malicious intent. By analyzing inputs through diverse channels, AegisAgent achieves a more holistic understanding of user queries, enhancing its ability to detect subtle manipulations and novel attack vectors. This architecture not only bolsters current security but also provides a flexible foundation for incorporating future data streams and adapting to increasingly sophisticated threats, paving the way for truly resilient AI systems capable of maintaining reliable performance in dynamic and adversarial environments.

Implementation of AegisAgent introduces an average latency of 78.6 milliseconds per query when operating on a NVIDIA RTX 3090 GPU workstation, a performance characteristic carefully considered during development to balance security enhancements with practical usability. Ongoing research prioritizes minimizing this overhead while simultaneously broadening the system’s defensive capabilities to address increasingly sophisticated attack vectors. Future iterations aim to integrate AegisAgent with live, real-time threat intelligence, allowing the system to proactively adapt to emerging vulnerabilities and maintain robust protection against evolving malicious inputs, ultimately fostering more secure and reliable LLM-HAR systems.

Adversarial interventions at various stages of the LLM-HAR pipeline-including the signal, text, and prompt paths-can compromise activity prediction accuracy by diverting the system from its intended objectives and reducing semantic fidelity.

The pursuit of absolute security, as demonstrated by AegisAgent, feels less like construction and more like tending a garden. The system, with its layered defenses against prompt injection attacks in LLM-HAR, attempts to anticipate vulnerabilities, but inevitably, complexity burrows in. As Andrey Kolmogorov observed, “The most important thing in science is not knowing many scientific facts, but knowing how to apply the scientific method.” AegisAgent embodies this – a constant application of scrutiny and adaptation, recognizing that the ‘perfect architecture’ is indeed a myth. The system’s strength lies not in eliminating risk, but in building a responsive ecosystem capable of weathering unforeseen threats – a testament to the transient nature of optimized solutions. It’s a reminder that scalability isn’t about size, but about graceful adaptation in the face of inevitable change.

What Shadows Remain?

AegisAgent, in its attempt to contain the chaos of prompt injection, reveals a deeper truth: security isn’t a wall, but a negotiation with entropy. The system successfully addresses current articulations of attack, yet each defense becomes a new surface for adversarial pressure. The very act of defining ‘safe’ input creates the contours of what will inevitably be subverted. This is not failure, merely the inevitable unfolding of a complex system.

The focus on Human Activity Recognition, while a useful proving ground, obscures the broader implications. Prompt injection isn’t about stealing data; it’s about seizing control of narrative. As LLM-HAR systems proliferate – mediating interactions with the physical world – the stakes shift from informational loss to tangible consequence. Future work must explore not just detection, but anticipation – modeling the attacker’s intent before it manifests as a crafted prompt.

Perhaps the most pressing question lies not within the algorithms, but within the assumptions. AegisAgent operates on the premise of ‘contextual integrity’ – a fragile construct in a world designed for seamless deception. The system is a lighthouse, briefly illuminating the treacherous reefs; it does not calm the storm. The true challenge isn’t building a perfect defense, but learning to navigate the inevitable darkness.

Original article: https://arxiv.org/pdf/2512.20986.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

The Inevitable Cracks in the Foundation

AegisAgent: Proactive Defense in a Fragile System

Validation and Performance on Benchmark Datasets

Towards Trustworthy LLM-HAR Systems: An Exercise in Delayed Failure

What Shadows Remain?

See also: