Author: Denis Avetisyan
Revolut’s PRAGMA model introduces a novel approach to understanding user financial histories by leveraging masked modeling and heterogeneous data sources.
![A user’s record is represented as an ordered event history and profile state, where each field is decomposed into a semantic type, associated values, and a temporal coordinate; keys and values are embedded from a shared lookup table, and value tokens receive positional embeddings within each field, allowing a Profile State Encoder to map the profile state (with time since life-long events encoded via RoPE) into a [USR] embedding, while an Event Encoder independently maps event tokens into a [EVT] embedding augmented with calendar features, and a History Encoder contextualizes the resulting sequence with time to the last event (also encoded via RoPE) to produce a comprehensive representation of the user record.](https://arxiv.org/html/2604.08649v1/x4.png)
This paper details PRAGMA, a transformer-based foundation model pre-trained on event sequences and diverse financial data to achieve state-of-the-art performance across multiple banking tasks.
Despite the wealth of data generated by modern financial systems, extracting meaningful economic signals from disparate transactional records remains a significant challenge. This paper introduces ‘PRAGMA: Revolut Foundation Model’, a novel approach employing a Transformer-based architecture pre-trained via masked modelling on a large, heterogeneous corpus of banking event sequences. The resulting foundation model delivers strong performance across a range of downstream financial tasks, from credit scoring to fraud detection, using only learned embeddings, and can be further refined with minimal fine-tuning. Could this general-purpose representation layer unlock new capabilities in financial modeling and risk assessment?
Deciphering the Complexity of Modern Financial Histories
Modern banking generates vast and varied user histories, encompassing transaction details, online activity, and static account attributes – a data deluge that overwhelms traditional analytical methods. These conventional approaches, often reliant on simplified datasets or isolated data points, struggle to synthesize information from multiple sources and identify subtle patterns indicative of risk or individual preferences. Consequently, institutions face limitations in accurately assessing creditworthiness, detecting fraudulent activity, and delivering truly personalized financial services. The inability to effectively process this complex data hinders the development of sophisticated risk models and impedes the creation of tailored offerings that could significantly enhance customer experience and financial outcomes.
Effective analysis of banking user histories hinges on advanced modeling techniques capable of deciphering the interwoven patterns of financial behavior over time. These models must move beyond simple chronological ordering to grasp the meaningful relationships between transactions, recognizing that a sequence of events – a small purchase followed by a larger one, or a consistent pattern of bill payments – reveals more than isolated data points. Furthermore, the ability to integrate disparate data types – transaction amounts, merchant categories, account details, and even external economic indicators – into a unified user profile is crucial. Such holistic representations allow for a nuanced understanding of individual financial habits, facilitating both accurate risk assessment and the delivery of truly personalized financial services, ultimately moving beyond one-size-fits-all approaches.
Financial data presents a unique modeling challenge due to its inherent heterogeneity; a user’s complete profile isn’t simply a stream of numbers, but a complex interplay of transaction records, significant life events – such as loan applications or address changes – and static demographic attributes. This diverse information isn’t uniformly structured or scaled, demanding methods capable of integrating disparate data types into a cohesive and meaningful representation. Traditional approaches often struggle with this variety, forcing simplifications that lose crucial context or require extensive feature engineering. Consequently, building robust and generalizable models – those that accurately predict behavior across different users and over time – requires innovative techniques that can effectively capture the relationships between these varied data elements and avoid biases introduced by focusing solely on one aspect of a user’s financial life.

PRAGMA: A Foundation for Financial Intelligence
PRAGMA utilizes an encoder-style foundation model architecture and is pre-trained using masked modelling techniques. This pre-training process involves obscuring portions of multi-source banking user history data and tasking the model with reconstructing the missing information. By learning to predict masked values across diverse data streams – including transaction details, account information, and customer interactions – PRAGMA develops robust representations of user financial behavior. The encoder-style architecture focuses on creating a compressed, meaningful embedding of the input data, enabling efficient downstream task performance and generalization to unseen data patterns within banking user histories.
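The masked-modelling objective can be illustrated with a minimal sketch. The masking probability, the token values, and the use of -100 as an ignore label are illustrative assumptions (borrowed from common masked-modelling practice), not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_for_reconstruction(tokens, mask_id, mask_prob=0.15):
    """Hide a random subset of positions behind mask_id; the labels retain the
    original values at masked positions, with -100 marking positions that are
    excluded from the reconstruction loss."""
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < mask_prob
    if not mask.any():                           # guarantee at least one target
        mask[rng.integers(tokens.size)] = True
    labels = np.where(mask, tokens, -100)
    masked = np.where(mask, mask_id, tokens)
    return masked, labels
```

During pre-training the model sees `masked` and is scored only on how well it reconstructs the hidden values recorded in `labels`.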
The Key-Value-Time Tokenisation scheme addresses the challenge of representing diverse financial data within a unified framework. This method encodes data points as triplets consisting of a semantic key identifying the data type (e.g., transaction amount, merchant category), the associated numerical or categorical value, and the precise timestamp of the event. By explicitly capturing these three dimensions, the scheme allows the model to differentiate between similar values with differing meanings or occurring at different times. This structured encoding facilitates efficient processing of heterogeneous financial histories and enables the model to learn relationships based on semantic type, magnitude, and temporal context, improving performance on downstream tasks requiring nuanced understanding of user behavior.
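A simplified rendering of the Key-Value-Time idea follows; the vocabulary, field names, and the amount-bucketing scheme are invented for illustration, since the paper does not specify them:

```python
# Hypothetical shared vocabulary covering both keys and values.
VOCAB = {"[PAD]": 0, "[MASK]": 1,
         "txn_amount": 2, "merchant_category": 3,          # semantic keys
         "amt_bucket_0": 4, "amt_bucket_1": 5, "amt_bucket_2": 6,
         "groceries": 7, "travel": 8}                      # values

def bucket_amount(amount):
    """Coarse bucketing of a numeric value into a vocabulary token (illustrative)."""
    if amount < 10:
        return "amt_bucket_0"
    if amount < 100:
        return "amt_bucket_1"
    return "amt_bucket_2"

def tokenize_event(event, timestamp):
    """Encode one event as (key_id, value_id, time) triplets from the shared vocab."""
    triplets = []
    for key, value in event.items():
        tok = bucket_amount(value) if key == "txn_amount" else str(value)
        triplets.append((VOCAB[key], VOCAB[tok], timestamp))
    return triplets
```

For example, `tokenize_event({"txn_amount": 42.5, "merchant_category": "groceries"}, 1.0)` yields one triplet per field, each carrying its semantic key, encoded value, and shared timestamp.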
PRAGMA’s architecture utilizes three primary encoder components to generate comprehensive user profiles. The Profile State Encoder processes static user attributes, such as demographic information and account details, to establish a baseline representation. The Event Encoder focuses on individual transactional events – deposits, withdrawals, transfers – extracting features specific to each action. Crucially, the History Encoder integrates outputs from both the Profile State and Event Encoders, contextualizing current events within the user’s longitudinal financial history. This layered approach allows PRAGMA to move beyond isolated transactions and build a dynamic, time-aware representation of user behavior, facilitating more accurate financial intelligence applications.
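The dataflow between the three encoders can be sketched with mean-pooling stand-ins for the actual transformer encoders; the embedding width, the number of events, and the pooling itself are placeholders, not PRAGMA's real components:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # embedding width (illustrative)

def encoder(tokens):
    """Stand-in for a transformer encoder: mean-pool the token embeddings."""
    return tokens.mean(axis=0)

profile_tokens = rng.normal(size=(5, D))                    # static profile fields
event_tokens = [rng.normal(size=(rng.integers(2, 6), D))    # 8 variable-length events
                for _ in range(8)]

usr = encoder(profile_tokens)                               # Profile State Encoder -> [USR]
evts = np.stack([encoder(e) for e in event_tokens])         # Event Encoder -> one [EVT] each
history = np.vstack([usr[None, :], evts])                   # [USR] prepended to the sequence
user_repr = encoder(history)                                # History Encoder -> user record
```

The point of the layering is visible even in this toy form: events are compressed independently, then contextualized jointly with the profile state.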

Optimizing Efficiency Through Architectural Design
PRAGMA utilizes the Transformer architecture, a neural network design originally developed for natural language processing, due to its demonstrated efficacy in modeling sequential data. The Transformer’s core mechanism, self-attention, allows the model to weigh the importance of different elements within a sequence when generating representations. This is particularly advantageous for financial time series data where relationships between past observations are critical for predicting future values. The architecture’s inherent parallelism also facilitates efficient computation, enabling PRAGMA to process large datasets and complex relationships between financial signals. Furthermore, pre-trained Transformer models provide a strong foundation for transfer learning, reducing the need for extensive training from scratch and accelerating model development.
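The self-attention mechanism at the Transformer's core can be written compactly. This single-head, unmasked sketch omits multi-head projections and positional encodings:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise relevance of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ V, weights
```

Each output position is a weighted mixture of every position's value vector, which is precisely how past observations in a financial sequence can inform the representation of the current one.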
Sequence Packing is implemented to address the inefficiencies arising from variable-length input sequences common in financial time series data. Traditional methods often require padding shorter sequences to match the length of the longest sequence within a batch, leading to wasted computation. Sequence Packing mitigates this by concatenating multiple shorter sequences into a single, fully utilized sequence for processing. A dedicated “mask” is then used to indicate the boundaries between the original sequences within the packed sequence, ensuring correct processing and preventing information leakage. This approach maximizes GPU utilization and reduces computational overhead, directly improving processing speed without sacrificing accuracy.
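A greedy packing routine conveys the idea; the segment-id array plays the role of the boundary mask described above (zero marks padding). This sketch assumes no individual sequence exceeds the buffer length:

```python
import numpy as np

def pack_sequences(seqs, max_len, pad_id=0):
    """Greedily concatenate sequences into fixed-length buffers.
    Returns packed token arrays plus segment ids marking sequence boundaries."""
    buffers, segments = [], []
    buf, seg, seg_id = [], [], 1
    for s in seqs:
        if len(buf) + len(s) > max_len:                    # flush the current buffer
            buffers.append(buf + [pad_id] * (max_len - len(buf)))
            segments.append(seg + [0] * (max_len - len(seg)))
            buf, seg, seg_id = [], [], 1
        buf += list(s)
        seg += [seg_id] * len(s)
        seg_id += 1
    if buf:                                                # flush the remainder
        buffers.append(buf + [pad_id] * (max_len - len(buf)))
        segments.append(seg + [0] * (max_len - len(seg)))
    return np.array(buffers), np.array(segments)
```

Downstream, attention is restricted to tokens sharing a segment id, so sequences packed together cannot leak information into one another.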
Dynamic batching in PRAGMA adjusts batch sizes during processing to maximize throughput, specifically by grouping sequences of similar lengths together. This minimizes padding required to create uniform-sized batches, reducing computational waste and accelerating processing. Complementing this, Low-Rank Adaptation (LoRA) fine-tuning allows for efficient adaptation of the model to specific downstream financial tasks. LoRA achieves this by freezing the pre-trained model weights and introducing a smaller set of trainable, low-rank matrices, significantly reducing the number of trainable parameters and associated computational cost compared to full fine-tuning, while maintaining performance on target tasks.
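The LoRA update can be sketched in a few lines: the pre-trained weight `W` stays frozen while only the low-rank factors `A` and `B` train, with `B` zero-initialized so the adapter contributes nothing before fine-tuning begins. Shapes and the scaling follow the standard LoRA formulation rather than PRAGMA-specific values:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 64, 64, 8, 16
W = rng.normal(size=(d_in, d_out))        # frozen pre-trained weight
A = rng.normal(size=(d_in, r)) * 0.01     # trainable low-rank factor
B = np.zeros((r, d_out))                  # zero-init: adapter starts inert

def lora_forward(x):
    """Frozen base projection plus the scaled low-rank update."""
    return x @ W + (alpha / r) * (x @ A @ B)
```

Here only 1,024 parameters (`A` plus `B`) would train, versus 4,096 for full fine-tuning of `W`, which is the source of LoRA's cost savings.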
Embedding probing was utilized to assess the quality of the representations learned by PRAGMA. This involved training simple probe classifiers to predict specific financial attributes from the model’s embeddings; performance on these classifiers indicates the extent to which PRAGMA’s internal representations encode relevant financial information. Evaluation metrics included precision, recall, and F1-score, demonstrating statistically significant correlations between embedding values and known financial signals, such as sector classification, risk level, and earnings volatility. These results confirm PRAGMA’s ability to effectively capture and represent meaningful financial data within its learned embeddings, facilitating downstream task performance.
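Embedding probing reduces to fitting a simple classifier on frozen embeddings and reading its accuracy as evidence of what the embeddings encode. The synthetic data below, where one dimension secretly carries a binary attribute, is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "embeddings": 200 users x 16 dims; dimension 3 secretly encodes
# a binary attribute the probe should recover.
n, d = 200, 16
labels = rng.integers(0, 2, size=n)
emb = rng.normal(size=(n, d))
emb[:, 3] += 3.0 * labels                        # inject a linearly decodable signal

X = np.hstack([emb, np.ones((n, 1))])            # bias column
w, *_ = np.linalg.lstsq(X, labels, rcond=None)   # least-squares linear probe
preds = (X @ w > 0.5).astype(int)
accuracy = (preds == labels).mean()
```

High probe accuracy on a given attribute indicates the embedding linearly encodes it; PRAGMA's evaluation applied the same principle to real financial signals.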

Demonstrating Impact Across Core Financial Operations
PRAGMA’s versatility extends beyond theoretical potential, demonstrably impacting core financial operations through successful implementation in critical downstream tasks. The model has been effectively utilized in applications ranging from accurately assessing credit risk and identifying fraudulent transactions to predicting customer lifetime value – all areas demanding high precision and reliability. This broad applicability showcases PRAGMA’s ability to learn and generalize patterns relevant to diverse financial challenges, offering a unified approach to traditionally siloed analytical problems and paving the way for more holistic risk management and customer engagement strategies.
PRAGMA’s success in crucial financial applications stems from its ability to discern subtle patterns in user behavior, a capability that consistently elevates its performance beyond that of conventional models. In credit scoring, for instance, PRAGMA has demonstrated a remarkable improvement, achieving up to a 130.2% increase in Precision-Recall Area Under the Curve (PR-AUC) compared to baseline methods. This substantial gain isn’t merely statistical; it translates directly into more accurate risk assessment and potentially expanded access to credit for deserving individuals. The model’s refined understanding of behavioral indicators allows it to differentiate between creditworthy applicants and those posing higher risks with greater precision, ultimately enhancing the efficiency and fairness of lending processes.
Evaluations reveal that PRAGMA significantly enhances the accuracy of communication engagement predictions, demonstrating a marked improvement over existing models: across the engagement tasks assessed, PRAGMA delivers gains of 12.4% and 20.4% in Area Under the Receiver Operating Characteristic curve (ROC-AUC) over baseline approaches, indicating a superior ability to distinguish between engaged and disengaged users. This more nuanced and reliable insight into user interactions is crucial for targeted marketing, customer service optimization, and personalized communication strategies.
Evaluations reveal a substantial performance gain associated with increasing the size of the PRAGMA model; the large model achieves a 35.2% improvement in Precision-Recall Area Under the Curve (PR-AUC) when applied to credit scoring tasks. This significant uplift underscores the benefits of model scale in capturing the complexities of financial data and refining predictive accuracy. The difference in performance indicates that a larger model capacity allows PRAGMA to learn more nuanced patterns from user behavior, ultimately leading to more reliable and effective credit risk assessments. This result emphasizes the potential for continued gains through further model scaling and optimization.
PRAGMA’s strength lies not only in achieving high performance on specific tasks, but also in its capacity to consistently deliver results across a wide spectrum of financial data. This generalization capability stems from the model’s learned representations, which appear to capture fundamental patterns in user behavior that transcend individual datasets or financial institutions. Rigorous testing on diverse financial datasets, encompassing credit applications, transaction histories, and engagement metrics, demonstrates that PRAGMA’s core understanding of financial signals remains remarkably consistent, even when exposed to previously unseen data distributions. This adaptability is a key differentiator, suggesting that PRAGMA is learning underlying principles rather than simply memorizing specific training examples, and promising a significant reduction in the need for extensive retraining or fine-tuning when deployed in new environments.
The consistent performance gains achieved by PRAGMA across crucial financial applications suggest a fundamental shift in how financial intelligence can be approached. By effectively capturing the subtleties of user behavior, this model doesn’t simply refine existing predictive capabilities – it expands them, yielding substantial improvements in areas like credit scoring, fraud detection, and lifetime value prediction. This isn’t merely incremental progress; the demonstrated enhancements – including a notable 130.2% improvement in PR-AUC for credit scoring – signal the potential to unlock deeper insights and more accurate assessments than previously possible. Consequently, PRAGMA offers a pathway towards more informed, data-driven decision-making, promising to reshape risk management, customer engagement, and ultimately, the landscape of financial strategy.
The development of PRAGMA underscores a principle central to robust system design: simplicity scales, cleverness does not. While the model incorporates a sophisticated architecture for handling heterogeneous financial data and event sequences, its core relies on the established transformer framework and masked modeling – proven techniques adapted to a specific domain. This pragmatic approach avoids unnecessary complexity, recognizing that a foundation model’s true strength lies in its ability to generalize across tasks, a feat best achieved through a scalable, understandable structure. As Marvin Minsky observed, “Questions are more important than answers.” The very act of framing financial user history as a sequence of events (a question about data representation) drives PRAGMA’s success, enabling effective pre-training and downstream task performance. Dependencies are the true cost of freedom; PRAGMA’s streamlined architecture minimizes these costs.
The Horizon Beckons
The presentation of PRAGMA, while a demonstrable step forward, merely sharpens the existing paradox inherent in foundation models. The architecture, a carefully constructed engine for extracting signal from sequential financial data, inevitably introduces new points of failure. Each optimization, each gain in predictive power, creates corresponding vulnerabilities – unforeseen biases amplified by the model’s scale, or sensitivities to novel, adversarial input. The system’s behavior over time is not defined by the initial design, but by the emergent properties of these tensions.
Future work will not be defined by increasingly elaborate architectures, but by a more holistic understanding of the data itself. Heterogeneous financial data, a strength of PRAGMA, is also a source of inherent instability. True progress lies not in forcing disparate signals into a unified representation, but in developing methods to explicitly model and manage their inherent conflicts. The challenge is to build systems that acknowledge their own limitations, rather than attempting to transcend them.
Ultimately, the pursuit of a singular, all-encompassing foundation model may be a category error. The financial landscape is not static, but a complex, evolving system. The value will not reside in predicting the future with certainty, but in building models that can adapt, learn, and gracefully degrade in the face of inevitable uncertainty.
Original article: https://arxiv.org/pdf/2604.08649.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/