Adapting AI to Mortgages: A New Approach to Financial Language Models

Author: Denis Avetisyan


Researchers have developed a novel framework for fine-tuning large language models to excel in the complex domain of mortgage finance, blending specialized knowledge with general instruction-following skills.

Residual learning forms the architectural foundation for Track 1, enabling the propagation of information through deep networks by adding the input of each layer to its output, effectively creating shortcut connections.

The MortgageLLM framework employs residual instruction transfer, alignment tuning, and a task-specific routing mechanism via a mixture of experts to achieve domain adaptation.

While large language models excel in general domains, adapting them to specialized fields like mortgage finance presents a challenge in balancing domain expertise with reliable instruction following. This paper introduces ‘MortgageLLM: Domain-Adaptive Pretraining with Residual Instruction Transfer, Alignment Tuning, and Task-Specific Routing’, a novel dual-track framework employing instruction residuals and a self-routing mixture-of-experts architecture to overcome this limitation. Our approach achieves significant performance gains on mortgage-specific benchmarks, demonstrating a substantial improvement in both conversational Q&A and structured task completion. Could this dual-track specialization strategy offer a broadly applicable solution for effectively adapting LLMs across diverse, knowledge-intensive domains?


The Necessary Specialization of Language

Large language models, though remarkably versatile across a spectrum of tasks, often fall short when applied to highly specialized fields like mortgage finance. These models, trained on vast general datasets, lack the nuanced understanding of industry-specific terminology, regulations, and workflows crucial for accurate and insightful performance. Consequently, achieving true peak performance demands a process of adaptation, tailoring the model’s knowledge base to the intricacies of the target domain. This isn’t merely a matter of adding a glossary; it requires a recalibration of the model’s understanding, allowing it to not only recognize specialized terms but also to interpret their meaning within the complex context of mortgage finance, ultimately leading to more reliable and relevant outputs.

The inherent generality of large language models, while enabling broad applicability, frequently results in suboptimal performance when confronted with the nuances of specialized domains. These models, trained on vast corpora of general text, often lack the specific terminology, contextual understanding, and intricate knowledge bases required for accurate and insightful responses within fields like finance, medicine, or law. Consequently, direct application without adaptation often yields outputs that, while grammatically correct, are semantically imprecise, lack crucial details, or fail to capture the subtleties vital for informed decision-making. This deficiency underscores the necessity for targeted adaptation strategies to imbue these powerful tools with the domain-specific expertise needed to truly excel.

The process of tailoring large language models (LLMs) to function effectively within specialized fields presents substantial computational challenges. Domain adaptation isn’t merely a matter of retraining; it demands significant processing power, expansive datasets comprised of domain-specific terminology, and prolonged training times – resources often beyond the reach of many organizations. Consequently, research is increasingly focused on developing efficient adaptation techniques, such as parameter-efficient fine-tuning and knowledge distillation, which aim to minimize computational overhead while maximizing performance gains. These methods seek to transfer relevant knowledge from the general-purpose model to a specialized one without the need for extensive retraining of all parameters, offering a pathway to broader accessibility and deployment of LLMs in niche applications. The pursuit of these streamlined approaches is critical for unlocking the full potential of LLMs across a diverse range of industries and research areas.

Higher scores in the mortgage evaluation indicate a more favorable outcome.

Cultivating Domain Expertise: Adaptable Techniques

Continued pretraining represents a method of adapting a large language model (LLM), such as Meta-LLaMA-3.1-8B, to a specific domain by further training it on a corpus of unlabeled text from that domain. This process builds upon the general knowledge already embedded in the base LLM, exposing it to the vocabulary, syntax, and common patterns of the target domain. Unlike fine-tuning, which typically requires labeled datasets, continued pretraining leverages readily available unlabeled data, making it a scalable approach for knowledge injection. The model adjusts its internal parameters to better predict the statistical properties of the domain-specific text, effectively internalizing domain expertise without explicit task supervision.
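The objective used in continued pretraining is the same next-token prediction as in the original pretraining, only computed over domain text. As a rough illustration (not the actual training code, which would run on batched tensors inside a framework), the per-sequence loss can be sketched in NumPy:

```python
import numpy as np

def causal_lm_loss(logits, token_ids):
    """Average next-token cross-entropy: position t predicts token t+1."""
    shifted = logits[:-1]                # predictions for targets 1..T-1
    targets = token_ids[1:]
    # numerically stable log-softmax
    z = shifted - shifted.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()
```

A model with uniform logits scores log(vocab_size) on this loss; continued pretraining drives the loss down on domain text, which is what "internalizing domain expertise" means operationally.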

Supervised fine-tuning is a process of further training a pretrained language model using a dataset of labeled examples specific to the target domain. This technique involves providing the model with input data paired with the correct, known output, allowing it to adjust its internal parameters to minimize prediction error on those labeled instances. The resulting model exhibits improved performance on tasks relevant to the labeled data, as the fine-tuning process optimizes the model’s ability to map inputs to the correct outputs within that specific domain. The quality and size of the labeled dataset are critical factors influencing the degree of performance enhancement achieved through supervised fine-tuning.

Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address the computational limitations of full fine-tuning by freezing the pretrained model weights and introducing a smaller number of trainable parameters. LoRA achieves this by approximating weight updates with low-rank matrices, significantly reducing the number of parameters requiring gradient calculation and storage. This reduction in trainable parameters lowers both the memory footprint and computational cost, enabling adaptation on resource-constrained hardware and facilitating more frequent experimentation. Consequently, PEFT techniques democratize access to LLM adaptation, allowing researchers and practitioners with limited resources to effectively tailor models to specific domains and tasks.
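The core of LoRA is compact enough to state directly: the frozen weight W is augmented with a trainable low-rank product BA, scaled by alpha/r. A minimal sketch (the shapes and alpha/r scaling follow the LoRA formulation; everything else here is illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, r=8):
    """y = x W^T + (alpha/r) * x A^T B^T.

    W: (out, in)  frozen pretrained weight
    A: (r, in)    trainable, typically Gaussian-initialized
    B: (out, r)   trainable, initialized to zero
    """
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

Because B starts at zero, the adapted model is exactly the base model at step 0, and only r*(in+out) parameters per adapted layer are trained instead of in*out.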

Combining continued pretraining, supervised fine-tuning, and parameter-efficient techniques like LoRA yields a Domain-Adapted Model with demonstrably improved performance. Specifically, application of these methods to a Mortgage Multiple Choice Question (MCQ) dataset resulted in an accuracy of 64.1%, representing a 16.3 percentage point increase over the 47.8% accuracy achieved by the initial MLM v1 model. This improvement indicates a substantial gain in the model’s ability to generalize and accurately respond to domain-specific queries following adaptation.

Direct fine-tuning, as implemented in Track 2, involves a straightforward architectural flow for model adaptation.

Aligning with Intent: Refining Model Behavior

Direct Preference Optimization (DPO) fine-tunes the Domain-Adapted Model on human preference data without reinforcement learning. Unlike reinforcement learning from human feedback (RLHF), which fits a separate reward model and then optimizes the policy against it, DPO reframes alignment as a supervised learning problem: a loss over pairwise comparisons, in which human raters indicate which of two model outputs they prefer for a given prompt, directly increases the likelihood of preferred responses relative to dispreferred ones. This simplifies the training pipeline and improves stability compared to traditional RLHF, yielding a model more closely aligned with desired characteristics and human expectations.
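The DPO objective reduces to a logistic loss on a margin between the chosen and rejected responses, where each response is scored by its policy log-probability relative to a frozen reference model. A sketch of the per-pair loss (the log-sigmoid form and beta temperature follow the DPO formulation; inputs are assumed to be summed response log-probs):

```python
import numpy as np

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # log(1 + e^{-margin}) computed stably; equals -log sigmoid(margin)
    return np.logaddexp(0.0, -margin)
```

When policy and reference agree the margin is zero and the loss is log 2; the loss falls as the policy shifts relative probability mass toward the preferred response.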

Instruction residual techniques transfer instruction-following ability from a general instruction-tuned model into the domain-adapted model without a second round of instruction tuning. The residual is computed in weight space: subtracting a base model’s parameters from those of its instruction-tuned counterpart isolates the parameter delta responsible for instruction following, and adding that delta to the domain-adapted weights grafts the behavior onto the new model. This approach avoids catastrophic forgetting of instruction-following capability during domain adaptation and accelerates deployment on the new domain.
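Under the weight-space reading of instruction residuals, the transfer is plain tensor arithmetic over matching checkpoints. A sketch under that assumption (tensor names and the dict-of-arrays representation are hypothetical; real checkpoints are framework state dicts):

```python
import numpy as np

def apply_instruction_residual(domain, base, instruct):
    """theta_final = theta_domain + (theta_instruct - theta_base), per tensor.

    All three checkpoints must share the same architecture and tensor names.
    """
    return {name: domain[name] + (instruct[name] - base[name])
            for name in domain}
```

The design assumption is that the instruction-following "skill" lives in a roughly additive direction of weight space, so the same delta remains meaningful after the base weights have drifted during domain-adaptive pretraining.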

BERTScore was utilized as a key metric to quantitatively assess the semantic similarity between the model’s generated outputs and human-provided reference texts. This metric calculates precision, recall, and F1-score based on contextual embeddings from BERT, providing a robust evaluation of textual overlap beyond simple lexical matching. In comparative testing, our domain-adapted model achieved the highest BERTScore among all evaluated models, demonstrating superior alignment with desired output characteristics. Detailed results, including score distributions and comparative analyses, are visually represented in Figure 3, confirming the effectiveness of the implemented alignment strategies.
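BERTScore's greedy matching is simple to state: cosine similarities between contextual token embeddings, with precision as the mean best match per candidate token and recall as the mean best match per reference token. A minimal sketch, omitting the IDF weighting and baseline rescaling the full metric supports:

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """Greedy-match F1 over L2-normalized contextual token embeddings."""
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                       # (n_cand, n_ref) cosine similarities
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)
```

Because matching happens in embedding space, a paraphrase that shares no surface tokens with the reference can still score highly, which is what makes the metric more robust than lexical overlap.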

Subject matter experts (SMEs) evaluated the refined model’s outputs and indicated a 92.9% preference rate compared to baseline models. This assessment was conducted by presenting SMEs with paired outputs – one from the refined model and one from a baseline – and asking them to select the preferred response. The resulting preference rate demonstrates a statistically significant improvement in output quality and a stronger alignment with expected user preferences as judged by qualified evaluators. This metric provides quantitative evidence of the effectiveness of the model refinement process in achieving desired behavioral characteristics.

Scaling Intelligence: Efficient and Robust Inference

The Domain-Adapted Model benefits from a Dual-Expert Architecture, a design that moves beyond the limitations of single, generalized language models. This architecture establishes two distinct expert models, each specializing in a particular range of tasks or query types – for example, one expert might excel at summarization while the other focuses on question answering. By distributing the workload in this manner, the system achieves greater efficiency and accuracy; rather than forcing a single model to handle all requests, the Dual-Expert system leverages specialized knowledge for optimal performance. This approach not only streamlines processing but also allows for more nuanced and contextually relevant responses, ultimately enhancing the user experience and broadening the scope of applicable tasks.

The system’s architecture incorporates a self-routing mechanism designed to optimize query processing by dynamically assigning each request to the most qualified expert model. Rather than relying on a single, generalized model, the system intelligently assesses incoming queries and directs them to the specialized expert best equipped to handle the specific task or information request. This selective routing not only enhances the speed and accuracy of responses, but also improves overall efficiency by preventing unnecessary computations from less relevant models. The self-routing system continuously learns and adapts, refining its query assignment strategy to maximize performance and ensure that each request receives the most appropriate and effective processing path, ultimately leading to a more responsive and accurate user experience.
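Mechanically, self-routing is a dispatch step in front of the two experts: a lightweight classification pass labels the incoming query, and the labeled expert answers it. A toy sketch of that control flow (the keyword classifier and lambda experts are stand-ins for illustration, not the paper's implementation):

```python
def route(query, classify, experts):
    """Send the query to whichever expert the classifier names."""
    return experts[classify(query)](query)

# Stand-in classifier: structured-task keywords go to the task expert.
def toy_classify(query):
    structured = ("extract", "summarize", "classify")
    return "task" if any(w in query.lower() for w in structured) else "chat"

# Stand-in experts; in the real system these are the two specialized models.
experts = {
    "chat": lambda q: f"[chat expert] {q}",
    "task": lambda q: f"[task expert] {q}",
}
```

The same pattern holds when the router is itself a model pass rather than a keyword list: the cost of one cheap classification buys the accuracy of a specialist on every request.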

The pursuit of rapid and cost-effective deployment of large language models is significantly aided by vLLM, an open-source library meticulously engineered for LLM serving. Unlike traditional methods that often suffer from memory inefficiencies and suboptimal throughput, vLLM employs PagedAttention, a novel attention algorithm that dramatically reduces memory usage by only storing the necessary key-value states. This innovation, coupled with continuous batching of incoming requests and optimized CUDA kernels, enables substantially higher throughput and lower latency. Consequently, vLLM facilitates the serving of models with greater efficiency, allowing for more users to be served concurrently and reducing the overall cost of inference, paving the way for broader accessibility and real-time applications of powerful language models.

Recent innovations in language model architecture demonstrably enhance both the practical application and inherent safety of domain-adapted systems. By optimizing for scalability and responsiveness, these advancements allow for more efficient processing of complex queries and a faster turnaround for users. Critically, this progress extends to security, with the implemented architecture exhibiting a 66.4% improvement in resistance to prompt injection attacks – where malicious instructions are subtly embedded within user input – and an 80.7% increase in defense against the generation of harmful malware. This dual focus on performance and security positions these domain-adapted models as increasingly viable and trustworthy tools for a range of applications, moving beyond theoretical potential toward robust real-world deployment.

The pursuit of effective domain adaptation, as demonstrated in this work with MortgageLLM, echoes a fundamental principle of communication. Claude Shannon observed, “The most important thing in communication is to convey information accurately – not necessarily to convey it completely.” This sentiment applies directly to the model’s dual-track learning approach. Rather than attempting to instill exhaustive mortgage knowledge, the system prioritizes a balanced transfer of instruction-following skills alongside domain expertise. The architecture, with its self-routing mixture-of-experts, embodies this efficiency, selectively applying knowledge to ensure clarity and prevent overburdening the model with superfluous detail. This refined approach mirrors Shannon’s emphasis on signal-to-noise ratio – maximizing relevant information while minimizing extraneous complexity.

What’s Next?

The pursuit of domain adaptation often feels like chasing a receding horizon. This work, while promising, only clarifies the shape of the challenge. True generalization remains elusive. The mixture-of-experts architecture, a clever solution, introduces its own complexity. Every complexity needs an alibi. Future work must rigorously assess the cost of this increased model size and computational demand.

A critical limitation lies in the reliance on instruction tuning. Instructions age, principles don’t. The mortgage domain, like all regulated spaces, is subject to constant change. Models trained on today’s regulations will falter tomorrow. Research should explore methods for continuous learning, models that adapt not just to new data, but to evolving rules. This demands a shift from static training to dynamic refinement.

Ultimately, the value proposition isn’t simply improved performance on benchmark tasks. It’s about building trust. A model that appears knowledgeable is not the same as one that is reliable. Focus must turn to interpretability and robustness, ensuring these systems don’t merely mimic expertise, but genuinely possess it. Abstractions age, principles don’t.


Original article: https://arxiv.org/pdf/2511.21101.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-11-29 21:22