Author: Denis Avetisyan
A new approach combines code history, knowledge graphs, and AI agents to pinpoint the root causes of bugs with greater accuracy.

This work introduces AgenticSZZ, leveraging temporal knowledge graphs and LLM agents to improve bug-inducing commit identification through enhanced causal analysis of software evolution.
Identifying the precise commit that introduces a bug remains surprisingly difficult, despite decades of research into software blame assignment. The paper ‘Beyond Blame: Rethinking SZZ with Knowledge Graph Search’ addresses limitations in current bug-inducing commit (BIC) identification approaches by moving beyond reliance on traditional git blame. It introduces AgenticSZZ, which reframes BIC identification as a graph search problem, using Temporal Knowledge Graphs and LLM agents to expand the search space and enable more effective causal reasoning, with up to 27% improvement over state-of-the-art methods. Does this shift toward graph-based techniques offer a pathway to a more comprehensive understanding of software evolution and defect patterns?
Pinpointing the Source: The Challenge of Bug Localization
Pinpointing the exact commit responsible for introducing a software bug, the ‘bug-inducing commit’, is a foundational challenge in modern software development, yet it remains largely a manual undertaking for many engineering teams. The process often involves painstakingly reviewing code changes, walking through version control history, and attempting to reproduce the error across different revisions. While straightforward in theory, the task becomes considerably harder with larger codebases, frequent commits, and collaborative development environments. The time required for accurate bug localization directly reduces development velocity and raises the overall cost of software maintenance, which is why automated and more efficient techniques are needed for this critical, yet often overlooked, part of the software lifecycle.
The widely used SZZ algorithm and similar techniques fundamentally depend on git blame to pinpoint the origin of faulty code. Starting from a bug-fixing commit, this approach traces each line the fix changed back to the commit that last modified it, and assumes that commit introduced the bug. However, this reliance becomes problematic in modern software development, where large-scale refactorings and complex code transformations are common. When code is moved or significantly altered without functional changes, git blame points to these refactoring commits even though they did not induce the bug. The result is wasted effort investigating unrelated changes, and it highlights the need for techniques that can distinguish semantic changes from purely syntactic ones.
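The blame step can be illustrated with a minimal sketch. It does not call git; the blame output is supplied as a plain dictionary, and the function name `blame_candidates`, the commit ids, and the file name are all hypothetical. It shows how a refactoring commit ends up at the top of the candidate list simply because it was the last to touch the lines.

```python
from collections import Counter

def blame_candidates(deleted_lines, blame):
    """Return commits that last touched the lines a fix changed, most frequent first.

    deleted_lines: iterable of (path, line_number) removed or changed by the fix.
    blame: dict mapping (path, line_number) -> commit id, as blame on the
           fix commit's parent would report (supplied here as plain data).
    """
    counts = Counter(blame[key] for key in deleted_lines if key in blame)
    return [commit for commit, _ in counts.most_common()]

# Hypothetical scenario: commit "c1" wrote the faulty logic, but commit "c2"
# later moved those lines without changing behavior, so blame reports "c2".
blame = {
    ("parser.py", 10): "c2",
    ("parser.py", 11): "c2",
    ("parser.py", 12): "c3",
}
fix_deleted = [("parser.py", 10), ("parser.py", 11), ("parser.py", 12)]

candidates = blame_candidates(fix_deleted, blame)
print(candidates)
```

In this toy history the true inducing commit "c1" never appears among the candidates, which is the failure mode described above.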

A Graph-Based Reasoning Approach: AgenticSZZ
AgenticSZZ utilizes a Temporal Knowledge Graph (TKG) to model software evolution as represented by commit history and inter-component dependencies. The TKG represents commits as nodes, with edges denoting relationships such as ‘authored by’, ‘modified’, ‘depends on’, and ‘introduced’. Temporal aspects are captured by associating each commit – and thus each edge – with a timestamp, allowing the system to reason about the order of changes and their impact over time. This graph-based representation facilitates the encoding of complex relationships beyond simple linear history, and enables querying for dependencies, identifying the origin of changes, and tracking the propagation of effects across the codebase. The TKG is constructed by parsing commit metadata and analyzing code diffs to establish connections between files, functions, and developers.
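As a rough sketch of what such a structure might look like, the snippet below stores timestamped edges and answers a simple "what touched this file before time T" query. The class names, the relation strings, and the node id scheme (`commit:`, `file:`, `dev:` prefixes) are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    source: str      # e.g. "commit:a1"
    relation: str    # e.g. "modified", "authored_by"
    target: str      # e.g. "file:net/core.c"
    timestamp: int   # commit time, seconds since epoch

@dataclass
class TemporalKnowledgeGraph:
    edges: list = field(default_factory=list)

    def add(self, source, relation, target, timestamp):
        self.edges.append(Edge(source, relation, target, timestamp))

    def history(self, target, before=None):
        """Edges pointing at `target`, oldest first, optionally before a timestamp."""
        hits = [e for e in self.edges
                if e.target == target and (before is None or e.timestamp < before)]
        return sorted(hits, key=lambda e: e.timestamp)

tkg = TemporalKnowledgeGraph()
tkg.add("commit:c3", "modified", "file:net/core.c", 300)
tkg.add("commit:a1", "modified", "file:net/core.c", 100)
tkg.add("commit:b2", "modified", "file:net/core.c", 200)
tkg.add("commit:a1", "authored_by", "dev:alice", 100)

# Which commits touched the file before the commit at time 300?
earlier = [e.source for e in tkg.history("file:net/core.c", before=300)]
print(earlier)
```

Attaching the timestamp to the edge rather than only to the commit node keeps temporal filtering a local check on each edge, which is one plausible reason to model time this way.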
The Temporal Knowledge Graph facilitates LLM Agent navigation of complex codebases by representing code elements and their relationships – including modifications over time via commit history – as nodes and edges, respectively. This structured representation enables the Agent to identify potential bug sources by tracing dependencies and pinpointing commits that introduced changes near reported issues. Prioritization of investigation is achieved through contextual understanding derived from graph properties; for example, commits impacting frequently modified or critical code sections receive higher priority, and the Agent can assess the scope of changes introduced by each commit to estimate potential impact and guide focused debugging efforts.
To navigate commit history within the Temporal Knowledge Graph, the LLM Agent utilizes a suite of specialized tools. Structural Traversal enables exploration of relationships between commits – such as parent-child links or dependencies – to map the evolution of code. Property Query allows the Agent to filter commits based on metadata like author, date, or commit message, facilitating targeted searches. Finally, Candidate Enumeration systematically generates a list of potential bug-inducing commits based on the results of structural traversal and property queries, effectively narrowing the scope of investigation and prioritizing commits for deeper analysis.
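The three tools can be sketched over a toy in-memory history. Everything here is hypothetical: the `Commit` fields, the function names, and the sample commits are illustrative stand-ins, and the real tools operate on the Temporal Knowledge Graph rather than a dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    timestamp: int
    message: str
    files: frozenset
    parents: tuple = ()

# Hypothetical linear history ending in the bug-fixing commit "d4".
COMMITS = {c.sha: c for c in [
    Commit("a1", "alice", 100, "add cache", frozenset({"cache.py"})),
    Commit("b2", "bob", 200, "refactor: move helpers",
           frozenset({"cache.py", "util.py"}), ("a1",)),
    Commit("c3", "alice", 300, "tune eviction", frozenset({"cache.py"}), ("b2",)),
    Commit("d4", "carol", 400, "fix stale cache entry", frozenset({"cache.py"}), ("c3",)),
]}

def structural_traversal(sha):
    """Follow parent links and return all ancestors in discovery order."""
    seen, stack = [], list(COMMITS[sha].parents)
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.append(cur)
            stack.extend(COMMITS[cur].parents)
    return seen

def property_query(shas, **filters):
    """Keep commits whose attributes equal every given filter value."""
    return [s for s in shas
            if all(getattr(COMMITS[s], k) == v for k, v in filters.items())]

def candidate_enumeration(fix_sha):
    """Ancestors of the fix that touched at least one file the fix touched."""
    fix = COMMITS[fix_sha]
    return [s for s in structural_traversal(fix_sha) if COMMITS[s].files & fix.files]

candidates = candidate_enumeration("d4")
by_alice = property_query(candidates, author="alice")
print(candidates, by_alice)
```

Unlike a blame-only approach, the enumeration keeps "a1" and "c3" in play even though the refactoring commit "b2" sits between them and the fix.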

Deep Reasoning: Leveraging Large Language Models
The LLM Agent uses the DeepSeek-V3.2 model to perform causal analysis, a reasoning process designed to identify where a bug originated. This goes beyond symptom detection: the Agent traces the sequence of changes leading to the bug in order to identify its root cause. The model examines code changes, commit messages, and associated data to establish causal links between modifications and the introduction of defects. This allows the Agent not only to flag the presence of a bug, but also to determine the specific code alteration responsible, improving debugging efficiency and reducing time to resolution.
The LLM Agent’s architecture allows other language models to be swapped in for its reasoning. Specifically, ‘GPT-4o-mini’ has been shown to perform tasks equivalent to those handled by the core ‘DeepSeek-V3.2’ model. This interoperability comes from a modular design, which lets the Agent delegate specific reasoning sub-tasks to ‘GPT-4o-mini’ and incorporate the results into its overall analysis. Performance evaluations indicate that using ‘GPT-4o-mini’ does not significantly degrade the accuracy of bug localization, so the approach is not tied to a single model and can draw on diverse LLM resources.
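One common way to get this kind of interchangeability is to have the agent depend on a small interface rather than on a specific model client. The sketch below is an assumption about how such a design could look, not the paper's implementation; `ReasoningBackend`, `EchoBackend`, and `Agent.judge` are hypothetical names, and the stand-in backend makes no network calls.

```python
from typing import Protocol

class ReasoningBackend(Protocol):
    """Anything that can turn a prompt into a text answer."""
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Stand-in backend for local testing; a real one would call a model API."""
    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

class Agent:
    def __init__(self, backend: ReasoningBackend):
        self.backend = backend  # the only place the model choice enters

    def judge(self, fix_sha: str, candidate_sha: str) -> str:
        prompt = f"Did commit {candidate_sha} induce the bug fixed by {fix_sha}?"
        return self.backend.complete(prompt)

# Swapping the model means constructing the Agent with a different backend.
answer_a = Agent(EchoBackend("model-a")).judge("d4", "b2")
answer_b = Agent(EchoBackend("model-b")).judge("d4", "b2")
print(answer_a)
print(answer_b)
```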
File History Traversal is a critical component in constructing the Temporal Knowledge Graph (TKG) used by the LLM Agent for bug localization. This process systematically examines the version control history of relevant files to identify code modifications made prior to the bug fix. By analyzing commit logs, diffs, and author information, the system builds a TKG representing the evolution of the codebase. This graph provides the LLM Agent with temporal context, enabling it to correlate code changes with bug reports and pinpoint the commit or commits that likely introduced the defect, improving the accuracy and efficiency of bug localization.

Demonstrated Efficacy: Broad Applicability and Impact
AgenticSZZ’s efficacy was rigorously tested across a diverse range of real-world software projects, utilizing datasets constructed from the Linux kernel, Apache Software Foundation codebases, and a broad collection of open-source GitHub repositories. This evaluation strategy ensured the model’s adaptability and performance weren’t limited to a specific project type or coding style; the ‘DS_LINUX’ dataset provided a benchmark against a mature, complex system, while ‘DS_APACHE’ and ‘DS_GITHUB’ offered exposure to a wider variety of project scales and development practices. By assessing AgenticSZZ’s capabilities across these varied environments, researchers aimed to demonstrate its potential for broad applicability in identifying bug-inducing commits within any substantial software project.
AgenticSZZ demonstrates a substantial advancement in pinpointing the source of software bugs, achieving an F1-score ranging from 0.48 to 0.74 when tested on real-world codebases. This performance consistently surpasses that of existing bug-identification methods, with improvements reaching up to 27%. Critically, this heightened accuracy isn’t limited to simple code; AgenticSZZ maintains its advantage even when analyzing complex scenarios characterized by intricate code changes and extensive developer contributions. The system’s robust performance suggests it can be a valuable asset in streamlining the debugging process and improving software reliability across a variety of projects.
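For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it over sets of (fix, inducing commit) pairs; scoring at the level of pairs is an assumption about the evaluation setup, and the sample data is invented purely to exercise the function.

```python
def f1_score(predicted, actual):
    """F1 over sets of (fix_commit, inducing_commit) pairs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are right
    recall = true_positives / len(actual)        # share of true pairs that were found
    return 2 * precision * recall / (precision + recall)

# Invented example: two of three predictions are correct, two of three truths are found.
predicted = {("fix1", "a1"), ("fix1", "b2"), ("fix2", "c3")}
actual = {("fix1", "a1"), ("fix2", "c3"), ("fix2", "d4")}

score = f1_score(predicted, actual)
print(round(score, 3))
```

Because the harmonic mean penalizes imbalance, a method cannot reach a high F1 simply by returning many candidates per fix: recall would rise while precision fell.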
Analysis of ‘Blame Complexity’ – a metric gauging the difficulty of attributing changes to specific commits – reveals a significant advantage for AgenticSZZ, especially when applied to the intricate codebase of Apache projects. This metric underscores the tool’s capacity to navigate complex code modifications effectively, demonstrating an improvement of up to 27% over current state-of-the-art methods like LLM4SZZ. Notably, AgenticSZZ achieved a strong F1-score of 0.645 on the DS_LINUX dataset, maintaining consistent performance regardless of the number of candidate commits considered – a crucial factor in real-world software development environments where efficiency and reliability are paramount.

The pursuit of identifying bug-inducing commits often descends into a labyrinth of intricate code dependencies, a situation where developers, in attempting comprehensive solutions, inadvertently create further complexity. AgenticSZZ, with its use of Temporal Knowledge Graphs, attempts to navigate this complexity by expanding the search beyond simple ‘blame’ assignment – a technique frequently reliant on the most recent modification. As Linus Torvalds once stated, “Most programmers think that if their code works, they’re finished. If it doesn’t work, they think somebody else is to blame.” This sentiment underscores the need for systems like AgenticSZZ, which don’t simply point fingers at the last change, but instead engage in a deeper causal analysis of software evolution, acknowledging that responsibility for bugs often lies not in what changed, but how changes interacted over time.
What’s Next?
The pursuit of identifying bug-inducing commits, even with approaches like AgenticSZZ, remains fundamentally a search for sufficient, not necessary, causes. The expansion of the search space, while logically sound, introduces the problem of scaling causal reasoning – a combinatorial explosion masked by the elegance of knowledge graphs. Future work must address the precision of LLM-driven analysis; simply broadening the net does not inherently refine the catch. A focus on negative evidence – demonstrably non-inducing commits – may prove more fruitful than ever-wider positive searches.
The current framing assumes a linear progression of causality within code history. This is, at best, a simplification. Software evolution is a complex adaptive system, exhibiting emergent behavior and feedback loops. True progress demands moving beyond identifying a single ‘inducing’ commit toward modeling the conditions that allowed the bug to manifest. The graph itself, however powerful, is a static representation of a dynamic process; temporal resolution remains a critical bottleneck.
Ultimately, the question is not merely where the bug originated, but why it persisted. Blame assignment, even when refined, is a solution to a symptom, not the disease. Future research should consider integrating AgenticSZZ with automated program repair techniques, shifting the focus from post-hoc analysis to preventative measures. Unnecessary precision is violence against attention; the goal is not exhaustive detail, but actionable insight.
Original article: https://arxiv.org/pdf/2602.02934.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-05 00:25