Smarter Financial Advice: How Knowledge Graphs are Leveling Up AI Recommendations

Author: Denis Avetisyan

A new framework efficiently integrates personal and market data to deliver more relevant and personalized financial asset recommendations using large language models.

The study reports that FLARKO and its RAG-enhanced variants, particularly Parallel RAG-FLARKO, perform best at jointly optimizing behavioral alignment (Pref@3) and profitability (Prof@3). This shows up as consistently higher Comb@3 scores, a metric that counts how often a recommended asset has both qualities. Standard error is accounted for in the analysis.

This review details RAG-FLARKO, a multi-stage retrieval-augmented generation system optimized for knowledge graph integration and behavioral alignment in financial recommendation engines.

Large language models hold promise for personalized financial advice, but limited context windows and a lack of behavioral grounding hinder their effectiveness. The paper ‘Parallel and Multi-Stage Knowledge Graph Retrieval for Behaviorally Aligned Financial Asset Recommendations’ introduces RAG-FLARKO, a framework that addresses these challenges through efficient, multi-stage retrieval of relevant information from both personal transaction data and market knowledge graphs. The approach improves recommendation quality, particularly for smaller, more resource-efficient models. Could this framework unlock truly scalable and grounded financial AI solutions for a wider range of applications and users?

The Limitations of Conventional Recommendation Systems

Conventional recommendation systems, while prevalent, frequently deliver underwhelming results due to a limited grasp of the complexities driving consumer choices. These systems often rely on simplistic algorithms that prioritize popularity or basic collaborative filtering – suggesting items purchased by similar users – without accounting for the dynamic interplay between individual preferences and broader market trends. This approach neglects crucial contextual factors like seasonality, current events, or even subtle shifts in consumer sentiment. Consequently, recommendations can feel impersonal, irrelevant, or simply miss the mark, failing to capitalize on opportunities to suggest truly compelling products or services. The resulting suboptimal experiences not only diminish user engagement but also represent a lost potential for both consumers and businesses seeking meaningful connections.

Current recommendation algorithms frequently operate with a fragmented understanding of user needs, often treating past purchases and present conditions as separate data streams. While a user’s transaction history provides valuable insight into long-term preferences, it fails to account for immediate influences like trending products, seasonal changes, or even temporary stock limitations. Conversely, systems relying solely on real-time market data can appear impersonal and miss opportunities to suggest items aligned with established user tastes. The challenge lies in developing methods that seamlessly synthesize these disparate information sources – a holistic approach capable of discerning not just what a user has bought, but why, and how current circumstances might modify those preferences for truly relevant recommendations. Effectively bridging this gap requires sophisticated models capable of weighting these factors dynamically, moving beyond simple correlations to achieve a more nuanced and predictive capability.

Constructing Context: A Multi-Stage Retrieval Pipeline

The system employs a Multi-Stage Retrieval pipeline to generate focused subgraphs for analysis. This process integrates data from two distinct knowledge graphs: a Personal KG containing user-specific transactional data, and a Market KG housing broader market information. By sequentially querying these graphs, the pipeline constructs a subgraph that combines individual user behavior with current market context. This staged approach allows for the efficient retrieval of relevant information, enabling a more informed and contextualized analysis than would be possible with a single, monolithic graph query. The resulting subgraph serves as the foundation for subsequent processing and reasoning.

Personal Transaction Retrieval initiates the multi-stage process by querying the Personal Knowledge Graph (Personal KG) to access a user’s complete transaction history. This retrieval is executed using the SPARQL query language, enabling precise data selection based on defined criteria within the graph structure. The Personal KG stores a detailed record of all user interactions, including purchase details, viewed items, and associated metadata. Utilizing SPARQL allows for efficient traversal of these relationships, extracting relevant transaction data to form the initial component of the contextual subgraph. This data is subsequently combined with market information to provide a more comprehensive understanding of the user’s needs and preferences.

Following personal transaction retrieval, the system performs market retrieval via SPARQL queries directed at the Market Knowledge Graph. This process integrates current market data into the constructed subgraph, specifically utilizing the TenWeekPriceSummary data element. The TenWeekPriceSummary provides a record of price fluctuations over the preceding ten weeks, allowing the system to contextualize personal transactions with relevant market performance indicators. This enrichment is crucial for providing a comprehensive view of user activity within the broader market landscape and supports downstream analysis requiring current market conditions.
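The two retrieval stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: plain in-memory list filters stand in for the SPARQL queries, the triples and the predicate names (bought, tenWeekPriceSummary) are hypothetical, and the sketch assumes the market stage is keyed on the assets found in the personal stage, a join condition the text above does not spell out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical toy graphs; identifiers and predicates are illustrative only.
PERSONAL_KG = [
    Triple("user:42", "bought", "asset:ACME"),
    Triple("user:42", "bought", "asset:GLOBEX"),
    Triple("user:7", "bought", "asset:INITECH"),
]
MARKET_KG = [
    Triple("asset:ACME", "tenWeekPriceSummary", "up 4.1%"),
    Triple("asset:GLOBEX", "tenWeekPriceSummary", "down 1.3%"),
    Triple("asset:INITECH", "tenWeekPriceSummary", "flat"),
]


def retrieve_personal(user_id, personal_kg):
    """Stage 1: select the transaction triples that belong to one user."""
    return [t for t in personal_kg if t.subject == user_id]


def retrieve_market(assets, market_kg):
    """Stage 2: select ten-week price summaries for the given assets."""
    return [
        t for t in market_kg
        if t.subject in assets and t.predicate == "tenWeekPriceSummary"
    ]


def build_subgraph(user_id, personal_kg, market_kg):
    """Run both stages in sequence and merge the results into one subgraph."""
    personal = retrieve_personal(user_id, personal_kg)
    assets = {t.obj for t in personal}  # assumed join key between the stages
    market = retrieve_market(assets, market_kg)
    return personal + market


subgraph = build_subgraph("user:42", PERSONAL_KG, MARKET_KG)
for triple in subgraph:
    print(triple)
```

The point of the staging is that the second query only touches the part of the Market KG that the first stage made relevant, so the resulting subgraph stays small enough to hand to a language model.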

The RAG-FLARKO pipeline utilizes a multi-stage retrieval process to enhance information access.

RAG-FLARKO: Augmenting Recommendations with Knowledge and Generation

RAG-FLARKO extends the existing FLARKO framework by integrating retrieval-augmented generation (RAG) with structured knowledge graph reasoning. This approach combines the benefits of both methodologies: RAG enables the incorporation of external, relevant information into the generation process, while knowledge graph reasoning allows for inferential steps based on relationships defined within a structured knowledge base. Specifically, RAG-FLARKO retrieves a subgraph from a knowledge graph, and this subgraph is then included as context when prompting a large language model. This combined approach aims to improve the quality and accuracy of generated recommendations by grounding them in both factual information and logical inference derived from the knowledge graph.

The RAG-FLARKO framework enhances asset recommendations by incorporating a retrieved knowledge graph subgraph directly into the context window of large language models, specifically Qwen3-0.6B and Qwen3-1.7B. This process allows the LLM to base its recommendations not only on its pre-trained knowledge but also on the structured relationships and entities present in the retrieved subgraph. By providing this additional, focused information, RAG-FLARKO facilitates the generation of more informed recommendations that are directly relevant to the user’s query and the specific assets represented in the knowledge graph.
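To make the idea of placing a subgraph in the context window concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the prompt wording, the triple serialization format, and the function names are hypothetical, and a character budget is used as a crude stand-in for the model's token limit.

```python
# Hypothetical prompt template; the paper's actual prompt wording is not given here.
PROMPT_TEMPLATE = (
    "You are a financial recommendation assistant.\n"
    "Knowledge graph facts:\n{facts}\n\n"
    "Recommend {k} assets for this customer and briefly justify each."
)


def serialize_triples(triples):
    """Render each (subject, predicate, object) triple as one text line."""
    return [f"({s}, {p}, {o})" for s, p, o in triples]


def fit_to_budget(lines, max_chars):
    """Keep whole lines, in order, until the character budget is used up."""
    kept, used = [], 0
    for line in lines:
        cost = len(line) + 1  # +1 for the newline that joins the lines
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return kept


def build_prompt(triples, k=3, max_chars=2000):
    """Insert the serialized subgraph into the prompt, respecting the budget."""
    facts = fit_to_budget(serialize_triples(triples), max_chars)
    return PROMPT_TEMPLATE.format(facts="\n".join(facts), k=k)


# Illustrative subgraph, not real data.
subgraph = [
    ("user:42", "bought", "asset:ACME"),
    ("asset:ACME", "tenWeekPriceSummary", "up 4.1%"),
]
prompt = build_prompt(subgraph)
print(prompt)
```

Truncating at line boundaries keeps every fact that reaches the model intact, which matters more for small models such as Qwen3-0.6B that have little capacity to recover from half a triple.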

Model quantization techniques were investigated to mitigate the computational demands of large language models (LLMs) used within the RAG-FLARKO framework. This involved reducing the precision of the model’s weights and activations, typically from 16-bit floating point to 8-bit integer or even lower. By representing model parameters with fewer bits, both memory footprint and computational requirements for operations like matrix multiplication are decreased. Evaluations focused on determining quantization levels that minimize performance degradation (that is, preserve recommendation quality) while maximizing reductions in latency and resource usage. The goal was to enable deployment of the RAG-FLARKO system on hardware with limited resources without substantial loss of accuracy.
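The precision reduction described above can be illustrated with a small sketch of symmetric per-tensor int8 quantization. This is a generic textbook scheme chosen for illustration; the article does not state which quantization method or library the study used, and plain Python floats stand in for 16-bit weights.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights, not taken from any real model.
weights = [0.42, -1.27, 0.003, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now occupies one byte instead of two, at the cost of a rounding error of at most half the scale. Very small weights, such as 0.003 here, collapse to zero, which is one source of the quality degradation that the evaluation had to weigh against the savings.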

Quantifying System Performance: Metrics for Precision and Profit

The performance of the RAG-FLARKO recommendation system is rigorously evaluated through a suite of precision-focused metrics. Hits@3 measures whether a relevant item appears within the top three recommendations, directly assessing retrieval accuracy. Pref@3 gauges user preference alignment by determining if the recommended items are favored over others. Crucially, Prof@3 extends this evaluation to include profitability, quantifying the economic value of the recommended choices. By considering these three dimensions (relevance, preference, and profit), researchers gain a comprehensive understanding of RAG-FLARKO’s ability to deliver not just accurate, but also desirable and financially beneficial recommendations.

The metric Comb@3 represents a nuanced evaluation of recommendation quality by synthesizing both user preference and economic viability. Rather than assessing recommendations solely on whether they match expressed desires, or simply focusing on potential revenue, Comb@3 considers both simultaneously. Specifically, it measures the proportion of times a recommendation aligns with a user’s stated preferences and contributes positively to a defined profitability metric within the top three suggestions. This combined assessment offers a more holistic understanding of a recommendation system’s performance, moving beyond isolated measures to reflect real-world success where both user satisfaction and financial outcomes are crucial. By integrating these factors, Comb@3 provides a robust indicator of a system’s ability to deliver genuinely valuable and effective recommendations.
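One plausible reading of these metrics is sketched below in Python. The exact definitions are assumptions: here a user counts toward Pref@3 if any of the top three assets is one they prefer, toward Prof@3 if any is profitable, and toward Comb@3 only if a single asset in the top three is both. The data are invented for illustration, and the standard error is the usual sample standard deviation divided by the square root of the number of users.

```python
from math import sqrt
from statistics import mean, stdev


def indicators_at_k(recs, preferred, profitable, k=3):
    """Return per-user 0/1 indicator lists for Pref@k, Prof@k, and Comb@k."""
    pref, prof, comb = [], [], []
    for user, ranked in recs.items():
        top_k = ranked[:k]
        liked = preferred.get(user, set())
        pref.append(int(any(a in liked for a in top_k)))
        prof.append(int(any(a in profitable for a in top_k)))
        # Comb requires one asset that is both preferred and profitable.
        comb.append(int(any(a in liked and a in profitable for a in top_k)))
    return pref, prof, comb


def mean_and_se(values):
    """Mean and standard error of the mean (needs at least two values)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Invented example data: ranked recommendations per user.
recs = {
    "u1": ["A", "B", "C"],
    "u2": ["D", "E", "F"],
    "u3": ["A", "E", "G"],
    "u4": ["H", "I", "J"],
}
preferred = {"u1": {"A"}, "u2": {"D"}, "u3": {"G"}, "u4": {"Z"}}
profitable = {"A", "E"}

pref, prof, comb = indicators_at_k(recs, preferred, profitable)
pref_score, pref_se = mean_and_se(pref)
prof_score, prof_se = mean_and_se(prof)
comb_score, comb_se = mean_and_se(comb)
print(pref_score, prof_score, comb_score, comb_se)
```

Under these assumed definitions, users u2 and u3 each receive a preferred asset and a profitable asset, but never the same one, so Pref@3 and Prof@3 both come out at 0.75 while Comb@3 is only 0.25. That gap is why a combined metric says more than the two separate scores.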

Evaluations reveal that the RAG-FLARKO system achieves notably higher Comb@3 scores when contrasted with established baseline methodologies. This composite metric, which synthesizes both preference alignment and profitability, indicates a substantial performance gain, suggesting the system not only recommends items users favor but also those that yield positive outcomes. The improvement is particularly pronounced when utilizing the Qwen3-0.6B model, demonstrating a synergistic effect between the retrieval-augmented generation framework and the capabilities of this specific language model. These findings collectively highlight RAG-FLARKO’s potential to optimize recommendation systems for both user satisfaction and tangible benefits, offering a holistic advancement over current approaches.

The pursuit of robust financial recommendations, as detailed in this work, necessitates a rigor often absent in applied systems. One finds resonance with Vinton Cerf’s assertion that “The internet is not just about technology; it’s about people.” While this paper focuses on the technical architecture of RAG-FLARKO, a multi-stage retrieval system leveraging knowledge graphs, it implicitly acknowledges this human element. The framework’s emphasis on behavioral alignment isn’t merely an optimization; it’s an acknowledgement that successful financial tools must align with individual needs and preferences. Optimization without understanding the underlying behavioral context, much like unanalyzed code, risks building a technically proficient but ultimately flawed system. The efficiency gained by injecting relevant knowledge into smaller LLMs is commendable, but only meaningful if the knowledge itself is pertinent to the user’s financial profile.

What’s Next?

The presented work, while demonstrating improvements in retrieval-augmented generation for financial recommendations, subtly highlights a persistent tension. Efficiency gains achieved through multi-stage retrieval and optimization for smaller models are, fundamentally, a concession. They address computational limitations, not the inherent probabilistic uncertainty embedded within both market data and behavioral prediction. The pursuit of ‘behavioral alignment’ remains a heuristic – a practical approximation of a truly predictive model of human financial irrationality. One anticipates future efforts will grapple with the question of how much approximation is permissible before the recommendations devolve into statistically-flavored noise.

A critical path forward lies not merely in scaling knowledge graphs or refining retrieval mechanisms, but in demanding greater mathematical rigor. The current reliance on Large Language Models as black boxes, even when ‘augmented,’ is intellectually unsatisfying. Future research should prioritize methods for incorporating explicit probabilistic reasoning and verifiable constraints into the recommendation process. This may necessitate a move beyond purely data-driven approaches, toward hybrid systems that combine the strengths of symbolic AI with the pattern-recognition capabilities of neural networks.

Ultimately, the true test of this line of inquiry will not be measured by benchmark performance on contrived datasets, but by the demonstrable robustness of the resulting recommendations in the face of genuine market volatility and unpredictable human behavior. A solution that functions adequately under ideal conditions is, regrettably, not a solution at all.

Original article: https://arxiv.org/pdf/2511.11583.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
