Speak Security’s Language: Automating Threat Queries Across Platforms

Author: Denis Avetisyan


A new framework automatically translates high-level threat descriptions into platform-specific queries for diverse security information and event management systems.

SynRAG establishes a retrieval-augmented generation framework where a model iteratively refines responses by dynamically querying and incorporating information from a knowledge source, effectively creating a feedback loop between generation and retrieval to enhance accuracy and coherence.

SynRAG leverages Retrieval-Augmented Generation to generate executable SIEM queries from YAML-defined threat specifications, simplifying cross-platform threat detection.

Despite the critical role of Security Information and Event Management (SIEM) systems in modern cybersecurity, analysts face increasing challenges monitoring heterogeneous platforms due to variations in architecture and query languages. This paper introduces SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM System, a novel approach that automatically translates platform-agnostic threat specifications into executable queries for multiple SIEMs. Our framework demonstrably improves query generation performance compared to state-of-the-art language models, streamlining threat detection and incident investigation. Could SynRAG represent a significant step toward unifying security operations across diverse IT infrastructures?


The SIEM’s Silent Struggle: A System Hampered by Complexity

Security Information and Event Management (SIEM) systems function as a central nervous system for cybersecurity, collecting and analyzing vast quantities of log and event data to identify malicious activity. However, the very power of these systems is often hampered by the intricacy of their query languages. Unlike standardized languages such as SQL, each SIEM platform – Splunk, QRadar, Sentinel, and others – typically employs a unique and often proprietary query syntax. This platform-specificity demands that security analysts become proficient in multiple languages, a considerable undertaking given the constant evolution of both threats and SIEM technology. The result is a significant barrier to efficient threat hunting and incident response; even skilled analysts can struggle to rapidly translate generalized threat intelligence into effective, platform-specific queries, leaving organizations vulnerable to increasingly sophisticated attacks.

The creation of effective Security Information and Event Management (SIEM) queries is a highly specialized skill, demanding deep knowledge of both threat landscapes and the intricacies of individual SIEM platforms. This manual process isn’t simply about knowing what to look for, but precisely how to ask the SIEM system for it, often utilizing a unique query language for each platform. Consequently, even experienced security analysts can introduce errors in query construction, leading to missed threats or, conversely, overwhelming false positives. This susceptibility to human error directly impedes rapid response times; the delay introduced by refining inaccurate queries can be critically disadvantageous when addressing fast-moving attacks, potentially allowing malicious activity to escalate before detection and mitigation can occur.

The effective utilization of threat intelligence often falters due to the significant challenges in adapting generalized findings to the specific demands of diverse Security Information and Event Management (SIEM) systems. Current methodologies frequently deliver threat indicators – such as malicious IP addresses or attack patterns – in formats that require substantial manual effort to convert into functional queries for each individual SIEM platform. This translation process is not simply syntactic; it demands a deep understanding of each system’s unique query language, data schemas, and limitations. Consequently, organizations struggle to rapidly disseminate and act upon critical threat intelligence, creating a window of vulnerability where attacks can proliferate undetected. The lack of standardized translation layers or automated conversion tools hinders the seamless integration of external intelligence feeds, forcing security teams to dedicate valuable resources to repetitive, error-prone tasks rather than proactive threat hunting and incident response.

The comparison demonstrates differences in output queries generated by the system.

SynRAG: A Framework for Automating the Art of the Query

SynRAG is designed as a platform-agnostic framework capable of automatically producing Security Information and Event Management (SIEM)-specific threat detection queries. This automation is achieved through the use of unified threat specifications, which serve as standardized inputs detailing potential threat scenarios. By decoupling threat definitions from specific SIEM syntax, SynRAG enables the generation of queries for diverse SIEM platforms without requiring manual adaptation. The framework accepts these threat specifications and translates them into actionable detection logic, streamlining the process of threat hunting and incident response across heterogeneous security environments.
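
To make the decoupling concrete, here is a minimal Python sketch under stated assumptions. One platform-neutral description of a threat is rendered into two dialects by hand-written templates. The `DetectionIntent` type, the renderer functions and the `RENDERERS` table are hypothetical. The query strings are illustrative QRadar AQL-style and Google SecOps UDM-search-style text and have not been checked against either product. SynRAG itself replaces the templates with retrieval-augmented LLM generation. The sketch only shows that the threat definition never has to mention platform syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DetectionIntent:
    """Hypothetical platform-neutral threat description (no SIEM syntax inside)."""
    name: str
    source_ip: str


def to_qradar_aql(intent: DetectionIntent) -> str:
    # Illustrative AQL-style query; field names are assumptions, not a verified schema.
    return f"SELECT * FROM events WHERE sourceip = '{intent.source_ip}' LAST 24 HOURS"


def to_secops_udm(intent: DetectionIntent) -> str:
    # Illustrative UDM-search-style predicate on the principal.ip field.
    return f'principal.ip = "{intent.source_ip}"'


# One renderer per target platform; supporting a new SIEM means adding one entry.
RENDERERS: Dict[str, Callable[[DetectionIntent], str]] = {
    "qradar": to_qradar_aql,
    "google_secops": to_secops_udm,
}


def render(intent: DetectionIntent, platform: str) -> str:
    """Translate the same intent into the query dialect of `platform`."""
    if platform not in RENDERERS:
        raise ValueError(f"unsupported platform: {platform}")
    return RENDERERS[platform](intent)


if __name__ == "__main__":
    intent = DetectionIntent(name="known_bad_source", source_ip="203.0.113.7")
    for platform in RENDERERS:
        print(platform, "->", render(intent, platform))
```

Running it prints one query per platform from the same `intent`. A limitation of fixed templates is that every new threat shape needs new template code for every platform.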

SynRAG’s query generation process is fundamentally based on Retrieval Augmented Generation (RAG), a technique that combines pre-trained language models with information retrieved from an external knowledge source. Specifically, when a threat specification is provided, SynRAG first retrieves relevant contextual information – such as SIEM syntax, log field definitions, and detection logic examples – from its vector database. This retrieved information is then concatenated with the threat specification and provided as input to the language model. By augmenting the model’s knowledge with this retrieved context, SynRAG significantly improves the accuracy, relevance, and specificity of the generated SIEM queries, mitigating the limitations of relying solely on the language model’s pre-existing knowledge.
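
The retrieve, augment and generate sequence described above can be sketched in Python as follows. This is not SynRAG's code. `retrieve` and `llm` are hypothetical callables that stand in for the vector-database lookup and the language model, and the prompt wording is an assumption.

```python
from typing import Callable, List, Sequence

Retriever = Callable[[str, int], List[str]]  # (search text, k) -> top-k context chunks
LanguageModel = Callable[[str], str]         # prompt -> completion


def build_prompt(spec_yaml: str, platform: str, chunks: Sequence[str]) -> str:
    """Concatenate retrieved context with the threat specification."""
    context = "\n\n".join(f"[doc {i}]\n{chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        f"You write {platform} detection queries.\n\n"
        f"Reference material:\n{context}\n\n"
        f"Threat specification (YAML):\n{spec_yaml}\n\n"
        "Return only the query."
    )


def generate_query(spec_yaml: str, platform: str, retrieve: Retriever,
                   llm: LanguageModel, k: int = 3) -> str:
    # 1. Retrieve: search the knowledge base with the platform name plus the spec.
    chunks = retrieve(f"{platform}\n{spec_yaml}", k)
    # 2. Augment: put the retrieved syntax and schema notes ahead of the spec.
    prompt = build_prompt(spec_yaml, platform, chunks)
    # 3. Generate: the model sees both and emits a platform-specific query.
    return llm(prompt).strip()


if __name__ == "__main__":
    docs = ["AQL: SELECT ... FROM events WHERE ... LAST <n> HOURS",
            "Field sourceip: source IPv4 address"]
    fake_retrieve: Retriever = lambda text, k: docs[:k]
    fake_llm: LanguageModel = lambda prompt: "  SELECT * FROM events LAST 1 HOURS \n"
    print(generate_query("name: demo\n", "QRadar", fake_retrieve, fake_llm))
```

Passing the retriever and model in as functions keeps the pipeline testable with stubs, as in the demo. It also keeps the prompt-assembly step independent of any particular vector store or LLM provider.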

The SynRAG framework uses Chroma, an open-source vector database, to retrieve contextual information relevant to threat detection quickly. The database stores vector embeddings, numerical representations of the semantic meaning of SIEM documentation, including query languages, data schemas, and event field definitions. Because the documentation is stored as embeddings, SynRAG can run similarity searches that identify the documentation fragments most pertinent to a given threat specification instead of relying on keyword matching. This improves accuracy and reduces the time needed to locate applicable knowledge for query generation. The vector database is central to scaling query generation across diverse SIEM platforms and complex threat landscapes.
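
The embed, store and rank-by-similarity mechanics can be illustrated without Chroma. In the sketch below, `embed` is a deliberately crude stand-in for a learned embedding model: it builds a term-frequency vector, so it measures word overlap rather than meaning. `TinyVectorStore` and the sample documentation snippets are hypothetical. With Chroma itself, the corresponding steps are adding documents to a collection with `collection.add(ids=[...], documents=[...])` and searching with `collection.query(query_texts=[...], n_results=k)`.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: term-frequency vector over lowercase tokens (lexical, not semantic)."""
    return Counter(re.findall(r"[a-z0-9_.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database collection."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, Counter]] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Embed once at insertion time, as a vector database does.
        self._items[doc_id] = (text, embed(text))

    def query(self, text: str, k: int = 3) -> List[str]:
        """Return the ids of the k stored documents most similar to `text`."""
        q = embed(text)
        ranked = sorted(self._items.items(), key=lambda item: cosine(q, item[1][1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    store = TinyVectorStore()
    store.add("aql_where", "AQL WHERE clause filters events by a field such as sourceip")
    store.add("udm_ip", "UDM field principal.ip holds the source IP address")
    store.add("aql_time", "AQL LAST 24 HOURS limits the time window")
    print(store.query("source ip field in UDM", k=1))
```

A real embedding model would also rank "sourceip" close to "source IP". This toy version cannot do that, which is exactly the gap that learned embeddings close compared with keyword search.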

Threat Specifications within the SynRAG framework are formalized using YAML (YAML Ain’t Markup Language) to ensure a consistent and machine-readable definition of potential threat scenarios. This standardized format allows for the declarative description of threat characteristics, including involved entities, observed behaviors, and relevant indicators. By employing YAML, SynRAG facilitates automated processing and translation of high-level threat intelligence into specific, executable queries for Security Information and Event Management (SIEM) systems. The use of a structured, human-readable format simplifies the creation, maintenance, and version control of threat definitions, promoting collaboration and reducing the potential for errors in query generation.
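
A minimal Python sketch of how such a specification might be checked before it reaches the generator is shown below. The key names (`name`, `entities`, `behaviors`, `indicators`) are assumptions derived from the characteristics listed above, not SynRAG's actual schema. The YAML text appears as a comment. In practice it would be loaded with a YAML parser such as PyYAML's `yaml.safe_load`, which returns the plain dictionary that the hypothetical `parse_spec` expects.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# A hypothetical specification, as it might be written in YAML:
#
#   name: brute_force_then_success
#   entities: [user, host]
#   behaviors:
#     - many failed logins from one source
#     - followed by a successful login
#   indicators:
#     source_ip: ["203.0.113.7"]
#
# Loading it with yaml.safe_load would produce the dictionary EXAMPLE below.
EXAMPLE: Dict[str, Any] = {
    "name": "brute_force_then_success",
    "entities": ["user", "host"],
    "behaviors": ["many failed logins from one source", "followed by a successful login"],
    "indicators": {"source_ip": ["203.0.113.7"]},
}

REQUIRED_KEYS = ("name", "entities", "behaviors", "indicators")


@dataclass
class ThreatSpec:
    name: str
    entities: List[str]
    behaviors: List[str]
    indicators: Dict[str, List[str]]


def parse_spec(raw: Dict[str, Any]) -> ThreatSpec:
    """Reject incomplete specs early instead of letting the generator guess."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"threat spec is missing keys: {missing}")
    if not isinstance(raw["indicators"], dict):
        raise ValueError("indicators must map an indicator type to a list of values")
    return ThreatSpec(
        name=str(raw["name"]),
        entities=[str(e) for e in raw["entities"]],
        behaviors=[str(b) for b in raw["behaviors"]],
        indicators={str(k): [str(v) for v in vals] for k, vals in raw["indicators"].items()},
    )
```

Validating structure up front keeps malformed threat definitions from ever reaching the language model. It also gives authors an immediate, specific error message.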

Validating the System: A Rigorous Assessment of Query Quality

Evaluation of SynRAG’s query generation capabilities used two established metrics for measuring the similarity between machine-generated text and human-authored reference text: the Bilingual Evaluation Understudy (BLEU) score and the ROUGE-L score. BLEU measures n-gram precision, quantifying how many of the generated query’s n-grams also appear in the expert-authored queries. ROUGE-L is based on the longest common subsequence of the two token sequences and is usually read as a recall-oriented measure. Together they quantify how closely SynRAG’s generated queries follow those written by security analysts at the token level. Neither metric checks whether a query actually executes or returns the same events, so executability has to be assessed with a separate metric.
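
Both metrics are simple enough to write out. The Python sketch below computes unsmoothed sentence-level BLEU-4 (clipped n-gram precisions, geometric mean, brevity penalty) and the LCS-based ROUGE-L F1 score over whitespace tokens. The tokenization, n-gram order, smoothing and averaging used in the paper are not stated here, so treat this as an illustration of the definitions rather than a reproduction of its numbers.

```python
import math
from collections import Counter
from typing import Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights and a brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = _ngrams(candidate, n)
        total = sum(cand_counts.values())
        ref_counts = _ngrams(reference, n)
        # Clip each n-gram's count by how often it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_precision_sum += math.log(matched / total)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "SELECT * FROM events WHERE sourceip = '10.0.0.5'".split()
    candidate = "SELECT * FROM events WHERE destinationip = '10.0.0.5'".split()
    print(f"BLEU={bleu(candidate, reference):.4f} ROUGE-L={rouge_l(candidate, reference):.4f}")
```

The two demo queries differ only in one field name, yet BLEU drops to about 0.59 while ROUGE-L stays at 0.875. BLEU loses every higher-order n-gram the substitution breaks, whereas ROUGE-L loses only the one unmatched token.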

SynRAG was tested on two distinct Security Information and Event Management (SIEM) platforms, QRadar and Google SecOps, to validate its operational flexibility. The evaluation showed that the system could generate syntactically correct, executable queries for both environments despite their different query languages and data schemas. Successful operation on both platforms suggests that SynRAG does not depend on any single SIEM’s proprietary structure and could be adapted to other platforms with modest modification, which would broaden its potential for deployment.

On these metrics, SynRAG’s generated queries scored 0.1287 (BLEU) and 0.6039 (ROUGE-L), averaged across a held-out test dataset, and outperformed the alternative query generation methods tested under identical conditions. The gap between the two scores is expected. Standard BLEU takes a geometric mean of 1- through 4-gram precisions, so small token-level differences such as a renamed field or a reordered clause pull it down sharply. ROUGE-L, by contrast, rewards long in-order matches even when individual tokens differ. Taken together, the figures suggest that SynRAG’s queries share much of the structure of expert-written queries while rarely matching them token for token.

SynRAG achieved an 85% Query Execution Success Rate during testing, indicating a high degree of syntactic correctness in the generated queries. The metric is the proportion of generated queries that executed successfully in the target SIEM systems, QRadar and Google SecOps, without parsing errors or runtime failures. An 85% rate means most generated queries can be run as-is, reducing the need for manual correction by security analysts, although the remaining 15% still fail and must be repaired by hand.
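
The metric itself is a ratio: queries that run without error divided by queries attempted. A minimal Python sketch follows. It assumes a hypothetical `execute` callable that submits one query to a SIEM and raises an exception on a parse or runtime failure. No real QRadar or Google SecOps client is used here.

```python
from typing import Callable, Iterable, List, Tuple


def execution_success_rate(queries: Iterable[str],
                           execute: Callable[[str], object]) -> Tuple[float, List[str]]:
    """Return the fraction of queries that executed cleanly, plus the ones that failed."""
    attempted = list(queries)
    if not attempted:
        raise ValueError("no queries to evaluate")
    failed: List[str] = []
    for query in attempted:
        try:
            execute(query)  # hypothetical SIEM call; any exception means failure
        except Exception:
            failed.append(query)
    return (len(attempted) - len(failed)) / len(attempted), failed
```

For example, 17 clean runs out of 20 queries gives 0.85. These numbers are illustrative and say nothing about the size of the paper's test set. Returning the failed queries alongside the rate shows which ones an analyst would still need to fix.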

SynRAG’s performance was compared against several prominent Large Language Models (LLMs), specifically GPT-4o, DeepSeek-V3, and Llama-3, to establish a baseline. The benchmark assesses SynRAG’s ability to generate effective security queries relative to current state-of-the-art general-purpose LLMs. According to the paper, SynRAG improves on these models at query generation, supporting its viability as a tool for security operations alongside existing LLM technologies.

The SynRAG system includes a Syntax Service that enforces query validity across Security Information and Event Management (SIEM) platforms. The service maintains curated lists of the acceptable syntax and semantic rules for each supported query language, including those used by QRadar and Google SecOps. By checking against these predefined constraints during query generation, it reduces the number of syntactically incorrect or semantically invalid queries and is credited with contributing to the system’s 85% Query Execution Success Rate. Its reference lists are updated as the supported SIEM environments and their query languages change.
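
One narrow slice of such a service might look like the Python sketch below: it checks that every field a generated query filters on appears in a per-platform allowlist. The allowlists are tiny and hypothetical. The field names resemble QRadar AQL and Google SecOps UDM names but are neither complete nor verified. The regular expression recognizes only `field <operator> value` comparisons and is not a parser for either query language.

```python
import re
from typing import Dict, FrozenSet, List

# Hypothetical, deliberately incomplete allowlists of filterable fields per platform.
ALLOWED_FIELDS: Dict[str, FrozenSet[str]] = {
    "qradar": frozenset({"sourceip", "destinationip", "username"}),
    "google_secops": frozenset({"principal.ip", "target.ip", "principal.user.userid",
                                "metadata.event_type"}),
}

# Left-hand side of a comparison such as `sourceip = ...` or `principal.ip != ...`.
_COMPARISON = re.compile(r"([A-Za-z_][\w.]*)\s*(?:!=|<=|>=|=|<|>)")


def unknown_fields(query: str, platform: str) -> List[str]:
    """Return fields used in comparisons that the platform's allowlist does not contain."""
    allowed = ALLOWED_FIELDS[platform]
    return [f for f in _COMPARISON.findall(query) if f.lower() not in allowed]


def check_query(query: str, platform: str) -> None:
    """Raise if the query filters on a field the allowlist does not know."""
    bad = unknown_fields(query, platform)
    if bad:
        raise ValueError(f"{platform}: unknown field(s) {bad}; regenerate or repair the query")
```

A check like this is cheap enough to run on every generated query before execution. A rejected query can then be sent back for regeneration instead of failing inside the SIEM.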

Beyond Automation: The Implications and Future Trajectory of Intelligent Querying

SynRAG alleviates a critical pain point for security teams by automating the complex process of crafting queries for Security Information and Event Management (SIEM) systems. Historically, this task demanded significant analyst time and expertise, requiring deep understanding of both threat landscapes and the specific query languages of each SIEM platform. By intelligently generating these queries, SynRAG dramatically reduces the manual workload, allowing analysts to focus on interpreting results and responding to genuine threats. This acceleration in query generation directly translates to faster incident response times and a broadened scope of threat coverage, as analysts can investigate a larger volume of potential security incidents with the same resources. The system’s automation not only improves efficiency but also minimizes the risk of human error in query construction, leading to more accurate and reliable threat detection.

SynRAG’s architecture deliberately avoids reliance on proprietary systems, fostering broad compatibility across diverse security infrastructures. This platform-agnostic design enables seamless integration with a wide range of Security Information and Event Management (SIEM) tools and threat intelligence platforms, dismantling traditional data silos. By facilitating the standardized exchange of threat queries and responses, SynRAG empowers organizations to share valuable insights and collaborate more effectively, regardless of their chosen security technologies. This interoperability not only streamlines threat analysis workflows but also enhances collective defense capabilities within the wider cybersecurity community, creating a more resilient and responsive security posture.

Continued development efforts are directed towards broadening the applicability of the system by integrating support for a wider array of Security Information and Event Management (SIEM) platforms and their respective query languages. This expansion will not be limited to mere compatibility; researchers also intend to implement sophisticated query optimization techniques, aiming to drastically reduce analysis times and improve the efficiency of threat hunting. Furthermore, the incorporation of advanced anomaly detection methods promises to move beyond simple pattern matching, enabling the identification of previously unknown or subtle threats that might otherwise evade detection, ultimately bolstering proactive security measures and response capabilities.

Google SecOps leverages a Unified Data Model to establish a consistent and standardized approach to threat analysis. This model normalizes data ingested from diverse security tools and sources – including logs, alerts, and network traffic – into a common format. By eliminating data silos and inconsistencies, the Unified Data Model enables more accurate correlation of events, simplified investigations, and the development of robust threat detection rules. The resulting clarity empowers security analysts to quickly identify and respond to emerging threats, reducing mean time to detection and improving overall security posture within the Google SecOps platform. This foundational structure facilitates automation and scalability, allowing for efficient processing of large volumes of security data and proactive threat hunting.
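
This kind of normalization can be pictured as a per-source field mapping into one shared schema. The Python sketch below is a simplified illustration, not Google's implementation. The two source names and their raw field names are hypothetical. The targets are UDM-style dotted paths (`principal.ip`, `target.ip`, `principal.user.userid`, `metadata.product_name`). The real UDM defines some of these, such as IP addresses, as repeated fields, which the sketch ignores.

```python
from typing import Any, Dict

# Hypothetical per-source mappings from raw field names to UDM-style dotted paths.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "firewall_x": {"src": "principal.ip", "dst": "target.ip"},
    "vpn_y": {"client_addr": "principal.ip", "login": "principal.user.userid"},
}


def set_path(record: Dict[str, Any], dotted: str, value: Any) -> None:
    """Write `value` at a dotted path, creating nested dictionaries as needed."""
    *parents, leaf = dotted.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def normalize(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw log record into the shared schema; unmapped fields are dropped."""
    mapping = FIELD_MAPS[source]
    event: Dict[str, Any] = {"metadata": {"product_name": source}}
    for raw_key, value in raw.items():
        if raw_key in mapping:
            set_path(event, mapping[raw_key], value)
    return event
```

After normalization, a single predicate on `principal.ip` matches events from both sources. That property is what gives query generation one common target.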

SynRAG, as detailed in the study, challenges the status quo of threat detection. Rather than treating each SIEM query language as a fixed boundary, it translates high-level threat specifications into platform-specific instructions. This fits a maxim attributed to John McCarthy: “If you can’t break it, you don’t understand it.” Deconstructing threat definitions and reconstructing them for diverse SIEM platforms requires a working understanding of each underlying system, a willingness to ‘break’ the limitations of single-platform queries to achieve a more comprehensive and adaptable security posture. By automating cross-SIEM query generation, SynRAG tests and expands the boundaries of what’s possible within threat detection workflows.

Beyond the Query: Where Does This Lead?

SynRAG, in its essence, doesn’t solve security monitoring; it relocates the problem. The framework deftly translates intent into platform-specific language, but the underlying chaos of threat landscapes remains. The true challenge isn’t articulation, but validation. A perfectly formed query against flawed data yields only elegant falsehoods. Future work must therefore rigorously address the trustworthiness of the retrieved knowledge: how does one quantify ‘relevance’ when facing an adversary actively crafting deceptive signals?

The YAML specification, while providing a useful abstraction layer, invites a predictable question: how readily can this be subverted? Threat actors, presented with a defined interface, will inevitably probe its boundaries. The framework’s robustness isn’t merely a function of query accuracy, but of its resilience against malicious specifications. One envisions a future arms race: sophisticated threat definitions countered by adversarial YAML designed to overload or mislead the system.

Ultimately, SynRAG represents a step toward automating the tedious aspects of security analysis, freeing human intellect for tasks requiring genuine insight. But the black box, once opened, reveals further layers of complexity. The next iteration isn’t about generating more queries, but about understanding what those queries fail to detect, a pursuit that demands a healthy dose of skepticism and a willingness to dismantle assumptions.


Original article: https://arxiv.org/pdf/2512.24571.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-01-03 11:40