Merging Minds: Boosting Language AI for Finance and Thai

Author: Denis Avetisyan


A new approach combines open-source models to deliver enhanced performance in both Thai language understanding and financial domain expertise.

This paper details THaLLE-ThaiLLM, a model merging strategy leveraging Low-Rank Adaptation to improve open-source large language models for specialized applications.

Despite the growing potential of large language models (LLMs) across specialized domains, organizations often face a trade-off between deploying numerous single-capability models and incurring the substantial costs of training a universal one. This report, ‘THaLLE-ThaiLLM: Domain-Specialized Small LLMs for Finance and Thai — Technical Report’, explores model merging as a resource-efficient alternative, demonstrating that combining open-source LLMs can effectively enhance both Thai language proficiency and financial expertise. Our experiments reveal significant performance gains across multiple benchmarks through merging strategies, indicating a viable path toward creating high-performing, multi-capability LLMs without extensive retraining. Could this approach unlock wider access to powerful, domain-specific language models for industries with unique linguistic and regulatory needs?


The Imperative of Thai-Specific Language Models

Large Language Models, despite achieving remarkable success in English, frequently encounter difficulties when applied to languages with different linguistic structures, and Thai presents a particularly notable challenge. This disparity stems from several factors, including the relative scarcity of Thai text data used in pre-training these models, the complexities of the Thai writing system – which lacks explicit word boundaries – and the unique morphological and syntactic features of the language. Consequently, existing LLMs often exhibit reduced accuracy, fluency, and contextual understanding when processing Thai, hindering their effectiveness in tasks such as machine translation, text summarization, and sentiment analysis. This performance gap underscores the critical need for language-specific adaptations and the development of models specifically trained on, and optimized for, the nuances of the Thai language.

Despite the remarkable abilities of leading closed-source Large Language Models such as GPT-4 and Gemini, their inherent limitations pose challenges for localized applications, particularly within the nuances of the Thai language and cultural context. These models operate as “black boxes,” offering little insight into their decision-making processes and hindering efforts to adapt them to specific local requirements. This lack of transparency prevents researchers and developers from effectively fine-tuning the models for optimal performance in Thai, addressing unique linguistic features, and mitigating potential biases. Furthermore, the restricted access and customization options impede innovation, as stakeholders are unable to modify the models to suit specialized tasks or integrate them into tailored solutions for Thai-speaking communities. Consequently, a reliance on these closed systems can stifle progress and limit the potential benefits of LLMs for local contexts.

The current landscape of large language models presents significant hurdles for Thai language processing, prompting a crucial need for dedicated, openly available resources. Existing, proprietary models, while exhibiting general intelligence, often struggle with the nuances of Thai grammar, cultural context, and idiomatic expressions, hindering their effectiveness in local applications. Developing robust, open-source Thai LLMs addresses these limitations by enabling researchers and developers to tailor models to specific needs – from improving machine translation and chatbot accuracy to preserving and promoting the Thai language itself. This open approach fosters innovation, allows for community-driven improvements, and ensures broader accessibility, empowering a wider range of individuals and organizations to leverage the power of artificial intelligence within the Thai linguistic sphere.

Building Open-Source Foundations: A Pragmatic Approach

The ThaiLLM Initiative strategically utilizes openly available Large Language Models (LLMs) – specifically Qwen and LLaMA – to accelerate the development of Thai language processing capabilities. This approach avoids the substantial costs and restrictions often associated with proprietary models and allows for greater customization and community contribution. By building upon these existing architectures, the initiative focuses resources on adapting and enhancing the models with Thai-specific data and expertise, rather than constructing an LLM from the ground up. This foundation enables the creation of models tailored to the nuances of the Thai language, promoting accessibility and innovation within the local AI ecosystem.

ThaiLLM-8B is a foundational language model built from the Qwen3-8B-Base checkpoint. To adapt the base model to Thai, the team employed Continued Pre-Training (CPT): further next-token training on a large corpus of Thai text. CPT lets the model absorb the grammar, syntax, and vocabulary of Thai without requiring labeled data. The resulting ThaiLLM-8B serves as a general-purpose foundation for subsequent fine-tuning and adaptation to downstream tasks requiring Thai language understanding and generation.
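For a concrete picture, the sketch below shows what a CPT run of this kind might look like using the Hugging Face transformers and datasets libraries. The corpus path, sequence length, and hyperparameters are illustrative assumptions, not the initiative’s published configuration.

```python
# Sketch of Continued Pre-Training (CPT): plain next-token prediction on raw
# Thai text. Corpus path and hyperparameters are illustrative placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "Qwen/Qwen3-8B-Base"      # starting checkpoint named in the report
CORPUS = "path/to/thai_corpus.jsonl"   # hypothetical raw Thai text, one doc per line

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

raw = load_dataset("json", data_files=CORPUS, split="train")

def tokenize(batch):
    # No labels needed: the collator below derives them from the input ids.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="thaillm-8b-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=64,  # large effective batch, typical for CPT
        learning_rate=1e-5,              # low LR guards against catastrophic forgetting
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The conservatively low learning rate is the usual precaution in CPT: it lets the model acquire Thai while limiting erosion of the base checkpoint’s existing capabilities.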

THaLLE-Finance-8B is a specialized language model derived from the ThaiLLM-8B foundation and tailored for applications in the financial sector. It was created with Supervised Fine-Tuning (SFT), which trains the base model on a curated, labeled dataset of financial texts such as financial reports, news articles, and, potentially, regulatory filings, enabling the model to better understand and generate text in financial contexts. This fine-tuning optimizes the model’s performance on tasks such as sentiment analysis of financial news, extraction of key information from financial documents, and generation of financial reports and summaries.
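Because the report highlights Low-Rank Adaptation, a plausible (though unconfirmed) implementation of this SFT stage would train LoRA adapters rather than all 8B parameters. The sketch below uses the peft library; the dataset format, prompt template, and hyperparameters are assumptions for illustration.

```python
# Sketch of LoRA-based SFT on financial instruction data. Dataset format,
# prompt template, and hyperparameters are assumptions for illustration.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "thaillm-8b-cpt"             # the CPT checkpoint from the previous step
DATA = "path/to/finance_sft.jsonl"  # hypothetical {"prompt", "response"} pairs

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapters on the attention projections instead of
# updating all 8B parameters, keeping the fine-tune cheap and easy to merge.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

def format_and_tokenize(ex):
    text = (f"### Instruction:\n{ex['prompt']}\n"
            f"### Response:\n{ex['response']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=2048)

train = load_dataset("json", data_files=DATA, split="train").map(
    format_and_tokenize, remove_columns=["prompt", "response"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="thalle-finance-8b-lora",
                           per_device_train_batch_size=2,
                           learning_rate=2e-4, num_train_epochs=2, bf16=True),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```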

Rigorous Validation: Quantifying Thai LLM Performance

The ThaiLLM Initiative utilizes a multi-faceted benchmarking approach to evaluate large language model (LLM) performance across critical dimensions. This includes ThaiSafetyBench, designed to assess the model’s adherence to safety guidelines and potential for generating harmful outputs; IFEval-TH, which specifically measures the consistency and coherence of generated Thai language; and Flare CFA, a benchmark focused on evaluating financial reasoning capabilities. These benchmarks provide quantifiable metrics for assessing model strengths and weaknesses, enabling iterative improvement and responsible deployment of Thai LLMs.
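As a rough illustration of how such exam-style benchmarks are typically scored, the following sketch grades multiple-choice items by exact match on the model’s chosen option letter. The checkpoint name, prompt template, and answer-extraction rule are illustrative assumptions, not the benchmarks’ official harnesses.

```python
# Sketch of a multiple-choice evaluation loop. Checkpoint name, prompt
# template, and letter-extraction rule are illustrative assumptions.
import re
from transformers import pipeline

generate = pipeline("text-generation", model="thaillm-8b-instruct")  # hypothetical path

def score(items):
    """items: list of {"question": str, "choices": [str, ...], "answer": "A".."D"}"""
    correct = 0
    for it in items:
        options = "\n".join(f"{letter}. {c}"
                            for letter, c in zip("ABCD", it["choices"]))
        prompt = f"{it['question']}\n{options}\nAnswer with a single letter:"
        reply = generate(prompt, max_new_tokens=8,
                         return_full_text=False)[0]["generated_text"]
        match = re.search(r"[ABCD]", reply)  # first option letter in the reply
        if match and match.group() == it["answer"]:
            correct += 1
    return correct / len(items)
```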

IFEval-TH is a benchmark designed specifically to assess the consistency of Thai language output from large language models. Evaluations on this benchmark yielded high scores for both ThaiLLM-8B-Instruct (0.994) and THaLLE-0.2-ThaiLLM-8B-fa (0.982). These scores indicate a strong capability for generating coherent and consistent Thai responses, suggesting the models maintain contextual relevance and avoid contradictory statements in their output. The high IFEval-TH performance helps validate the models’ proficiency in nuanced Thai language generation.

Performance evaluations demonstrate that merging models, specifically utilizing THaLLE-0.2-ThaiLLM-8B-fa, results in quantifiable improvements in benchmark testing. On the O-NET exam, a standardized Thai national exam, model merging yielded a 12.6% performance increase. Similarly, on the Flare CFA exam, designed to assess financial reasoning capabilities, the merged model achieved a 5.7% improvement over baseline performance. These results indicate that combining models can enhance performance across diverse evaluation metrics relevant to Thai language processing and reasoning.
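The report does not spell out its exact merge recipe in this summary, but the simplest and most common baseline is linear interpolation of weights between two same-architecture checkpoints, sketched below with placeholder model names.

```python
# Sketch of linear weight interpolation between two same-architecture
# checkpoints. Model names are placeholders; alpha is tuned on validation data.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("model_a", torch_dtype=torch.bfloat16)
b = AutoModelForCausalLM.from_pretrained("model_b", torch_dtype=torch.bfloat16)

alpha = 0.5
merged = a.state_dict()
for name, w_b in b.state_dict().items():
    # Convex combination of corresponding weights; shapes must match exactly.
    merged[name] = (1 - alpha) * merged[name] + alpha * w_b

a.load_state_dict(merged)
a.save_pretrained("merged-model")
```

In practice, the mixing weight alpha is swept over a grid and chosen by validation performance on the target benchmarks rather than fixed at 0.5.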

Expanding the Horizon: Real-World Impact and Future Trajectories

The emergence of THaLLE-Finance-8B signals a significant advancement in the application of large language models to specialized domains within Thailand. This bespoke model, trained on financial data, showcases the potential to automate complex tasks such as generating detailed reports and providing preliminary investment advice. By focusing on the nuances of the Thai financial landscape, it moves beyond general-purpose LLMs, offering a level of precision and relevance previously unattainable. The development isn’t simply about automation; it’s about democratizing access to financial insights and potentially reshaping how financial services are delivered within the Thai market, offering tailored solutions informed by sophisticated data analysis and natural language processing.

A significant leap in performance was achieved through the merging of two large language models: THaLLE-0.2-ThaiLLM-8B-fa was combined with a base Qwen3-8B model, resulting in a remarkable 40% improvement on the challenging Thai Investment Consultant (IC) exam. This technique highlights the potential of intelligently combining specialized models to amplify their capabilities, effectively transferring knowledge and refining expertise in a targeted domain. The success demonstrates that, rather than solely relying on scaling model size, strategically merging models pre-trained on relevant datasets can unlock substantial gains in performance and accuracy, opening avenues for more sophisticated and reliable financial applications within the Thai market.
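Given the report’s emphasis on Low-Rank Adaptation, one plausible mechanic, shown below as a hedged sketch rather than the authors’ confirmed procedure, is to compress the fine-tune’s weight delta to a low-rank approximation before adding it back to the base. The rank budget and model identifiers are illustrative.

```python
# Sketch of low-rank delta merging: compress (tuned - base) to rank r via SVD,
# then add the compressed delta back to the base. Model identifiers are
# placeholders, and this is one plausible formulation, not the report's recipe.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B",
                                            torch_dtype=torch.float32)
tuned = AutoModelForCausalLM.from_pretrained("THaLLE-0.2-ThaiLLM-8B-fa",
                                             torch_dtype=torch.float32)

r = 16  # rank budget, analogous to a LoRA adapter's rank
state = base.state_dict()
for name, w_tuned in tuned.state_dict().items():
    delta = w_tuned - state[name]
    if delta.ndim == 2:  # low-rank factorization only applies to weight matrices
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        delta = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]
    state[name] = state[name] + delta

base.load_state_dict(state)
base.save_pretrained("merged-lora-delta")
```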

The openly accessible nature of THaLLE-Finance-8B and its foundational models is designed to accelerate progress through collaborative development and widespread adoption. This commitment to open-source principles allows researchers and developers to freely build upon existing work, customizing and refining the models for increasingly specialized applications. Recent evaluations demonstrate the tangible benefits of this approach; merging THaLLE-0.2-ThaiLLM-8B-fa with the base model resulted in a significant performance boost, evidenced by O-NET scores of 0.707 (M3) and 0.623 (M6), a clear improvement over Qwen3-8B. This enhanced capability signals the potential for broader impact, fostering innovation and enabling the creation of novel solutions across various sectors.

The pursuit of enhanced language models, as demonstrated by THaLLE-ThaiLLM, echoes a fundamental principle of mathematical elegance: refinement through focused application. This work leverages model merging, a computationally efficient technique, to instill both Thai language proficiency and financial domain expertise into open-source LLMs. This approach isn’t merely about achieving incremental gains; it’s about constructing a demonstrably correct solution for a specific problem space. As Blaise Pascal observed, “The eloquence of the tongue consists not in its power to persuade, but in its ability to prove.” Similarly, THaLLE-ThaiLLM doesn’t simply aim for improved performance; it proves its efficacy through rigorous evaluation benchmarks, establishing a verifiable improvement in specialized language processing.

Future Directions

The demonstrated efficacy of model merging, while promising, merely shifts the computational burden. The true challenge lies not in achieving incremental gains through parameter adjustment, but in fundamentally rethinking the architecture of these large language models. Current approaches treat language as a black box, optimized for superficial pattern matching. A more rigorous path demands a deeper integration of linguistic principles and financial modeling – a move toward provable correctness, rather than probabilistic approximation.

Evaluation benchmarks, as presently constructed, remain a significant source of potential self-deception. Achieving high scores on curated datasets does not guarantee genuine understanding or reliable performance in real-world financial applications. Future work must prioritize the development of adversarial tests and stress-case scenarios designed to expose the inherent limitations of these models. The focus should be on identifying what these models do not know, not simply confirming what they appear to know.

Further exploration of low-rank adaptation (LoRA) is warranted, but only if coupled with a theoretical understanding of its limitations. The assumption that a low-dimensional subspace can adequately capture the nuances of financial language and Thai linguistic structure requires careful scrutiny. Optimization without analysis is, after all, a fool’s errand. The path forward necessitates a return to first principles – a mathematical elegance that transcends empirical observation.


Original article: https://arxiv.org/pdf/2601.04597.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
