Spotting DeFi Scams: A New Framework for Rug Pull Detection

Author: Denis Avetisyan

Researchers have developed a novel system that combines blockchain data with off-chain intelligence to identify and predict fraudulent ‘rug pull’ schemes in decentralized finance.

Figure: Magnified receiver operating characteristic (ROC) curves showing the trade-off between true positive rate and false positive rate for the classification model.

This work introduces a leakage-resistant framework fusing on-chain analytics with temporally aligned OSINT signals, leveraging a transformer-based model (TabPFN) for improved accuracy and calibrated predictions.

Despite the promise of decentralized finance, smart contract ecosystems remain vulnerable to fraudulent schemes like rug pulls, where project creators abscond with investor funds. This paper introduces the ‘LROO Rug Pull Detector: A Leakage-Resistant Framework Based on On-Chain and OSINT Signals’, a novel approach that integrates blockchain transaction data with external intelligence from social media and search trends to predict malicious intent. By constructing a temporally aligned dataset and employing the TabPFN transformer model, we demonstrate improved detection accuracy and calibrated risk assessment while mitigating the critical issue of temporal data leakage. Can this leakage-resilient framework provide a foundation for more robust and reliable security systems in the rapidly evolving landscape of decentralized finance?

Decoding the DeFi Mirage: Unmasking Rug Pulls

Decentralized Exchanges (DEXs), while revolutionizing financial access, have unfortunately become prime targets for malicious actors employing “rug pulls.” These schemes, a growing threat within the Decentralized Finance (DeFi) landscape, involve developers abandoning a project and absconding with investors’ funds. Typically, this occurs after artificially inflating the token’s price through marketing and liquidity mining incentives, luring unsuspecting individuals into a false sense of security. Once substantial investment is secured, the developers remove the liquidity – essentially emptying the trading pool – leaving investors with worthless tokens and significant financial losses. The anonymity and rapid deployment capabilities inherent in DeFi contribute to the prevalence of these scams, making it increasingly difficult for investors to distinguish legitimate projects from carefully constructed traps.

Conventional fraud detection systems, designed for centralized finance, are proving inadequate in the swiftly evolving landscape of Decentralized Finance (DeFi). These systems typically rely on established identities, credit histories, and regulatory oversight – elements largely absent in the pseudonymous and permissionless world of DeFi. The speed at which new projects launch – sometimes within hours – overwhelms traditional vetting processes, while the complex smart contract interactions and liquidity pool mechanics present unique analytical challenges. Consequently, malicious actors exploit these vulnerabilities, deploying sophisticated ‘rug pull’ schemes that can drain millions of dollars from unsuspecting investors before conventional fraud detection can even begin to assess the risk. This inherent mismatch between existing security protocols and the dynamic nature of DeFi creates a significant and growing vulnerability for participants.

Detecting DeFi rug pulls demands a comprehensive analytical strategy, moving beyond simple code audits. Researchers are increasingly focused on integrating blockchain transaction data – examining liquidity pool activity, smart contract ownership, and token distribution patterns – with off-chain information sources. This includes social media sentiment analysis, developer reputation assessments, and monitoring of centralized exchange listings. A truly robust system correlates on-chain anomalies – such as sudden liquidity withdrawals or hidden contract functionalities – with external signals of potential fraud, like anonymous team members or aggressive, unsubstantiated marketing claims. This multi-faceted approach aims to establish a risk score for each project, allowing investors to differentiate between legitimate ventures and deliberately deceptive schemes before committing capital.

Beyond the Surface: A Multimodal Lens on DeFi Risk

The analytical framework utilizes multimodal modeling by integrating data from on-chain sources with off-chain open-source intelligence (OSINT). On-chain analytics focuses on quantifiable blockchain interactions, including transaction volumes, token distribution, and liquidity pool metrics – specifically, changes in total value locked (TVL) and liquidity provider (LP) behavior. Complementary OSINT data encompasses publicly available information such as social media engagement, developer activity on platforms like GitHub, and search trends identified through tools like Google Trends. This combined approach allows for a more holistic assessment than relying solely on either on-chain or off-chain data in isolation, providing a broader contextual understanding of project health and potential risk factors.

Combining on-chain data with off-chain signals provides a more robust project risk assessment. Analysis of social media activity – including sentiment, follower growth, and engagement rates – complements on-chain metrics such as transaction counts and token distribution. Google Trends data, measuring search volume for project keywords, offers insight into public interest and potential hype cycles. Correlating these external indicators with on-chain behavior allows for the identification of discrepancies and anomalies that may signal increased risk, such as disproportionate marketing spend versus actual network usage, or a sudden spike in search interest unrelated to development milestones. This integrated approach yields a more complete and nuanced risk profile than relying solely on blockchain data.
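To make the idea of a discrepancy between marketing attention and actual usage concrete, here is a minimal sketch in Python. It is not the paper's feature pipeline: the `DailySnapshot` fields, the equal weighting of the two attention signals, and the `hype_usage_gap` function are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DailySnapshot:
    """One day of signals for a token (all field names are illustrative)."""
    day: int                 # days since pool creation
    tx_count: int            # on-chain: number of swaps that day
    search_interest: float   # OSINT: search-trend index on a 0-100 scale
    social_mentions: int     # OSINT: posts mentioning the token that day


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new; 0.0 when old is zero."""
    return 0.0 if old == 0 else (new - old) / old


def hype_usage_gap(history: list[DailySnapshot]) -> float:
    """Growth in off-chain attention minus growth in on-chain usage.

    A large positive value means attention is rising much faster than
    usage, which is the kind of mismatch described in the text above.
    """
    first, last = history[0], history[-1]
    # Assumption: the two attention signals are weighted equally.
    attention_growth = (
        pct_change(first.search_interest, last.search_interest)
        + pct_change(first.social_mentions, last.social_mentions)
    ) / 2
    usage_growth = pct_change(first.tx_count, last.tx_count)
    return attention_growth - usage_growth


# Made-up example: attention quadruples while usage grows by 10%.
history = [
    DailySnapshot(day=0, tx_count=100, search_interest=10.0, social_mentions=50),
    DailySnapshot(day=1, tx_count=110, search_interest=40.0, social_mentions=200),
]
gap = hype_usage_gap(history)
```

In a real system a value like `gap` would be one column among many in the tabular input to the classifier, not a verdict on its own.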

The system’s efficacy rests on establishing temporal causality between predictive indicators and rug pull events. This necessitates analyzing data sequences to identify signals that consistently precede malicious activity, rather than simply detecting anomalies after a rug pull has begun. The methodology focuses on identifying leading indicators – changes in on-chain metrics combined with off-chain signals – that statistically predate project abandonment or liquidity withdrawal. This proactive approach contrasts with reactive fraud detection systems, which primarily flag suspicious activity post-event, and allows for potential intervention or user alerts prior to financial loss.

The Ghost in the Machine: Eliminating Temporal Leakage

Temporal data leakage in predictive modeling arises when information from the future, relative to the prediction target, is incorporated into the training dataset. This can occur through various mechanisms, such as including data points with timestamps reflecting events that haven’t yet occurred at the time the prediction is intended to be made. The consequence is an artificially inflated assessment of model performance during training and validation, as the model effectively “cheats” by having access to information it wouldn’t realistically possess during deployment. This leads to unreliable predictions and poor generalization to unseen, real-world data, where future information is unavailable, ultimately undermining the practical utility of the model.
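A small Python sketch can show the difference between a split that respects time and one that does not. The row layout, the `launch_ts` field, and both helper functions are assumptions made for this example, not the paper's code.

```python
def chronological_split(rows, train_fraction=0.8):
    """Sort by timestamp and cut once, so every test row is later than
    every training row."""
    ordered = sorted(rows, key=lambda r: r["launch_ts"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def has_temporal_leak(train, test):
    """True if any training row is later than the earliest test row."""
    latest_train = max(r["launch_ts"] for r in train)
    earliest_test = min(r["launch_ts"] for r in test)
    return latest_train > earliest_test


# Ten hypothetical tokens launched one hour apart.
rows = [{"token": f"T{i}", "launch_ts": 1_700_000_000 + i * 3600} for i in range(10)]

# Time-aware split: train on the first eight launches, test on the last two.
train, test = chronological_split(rows)

# Time-blind split (standing in for a random shuffle): alternate rows.
naive_train, naive_test = rows[0::2], rows[1::2]

leak_free = not has_temporal_leak(train, test)
naive_leaks = has_temporal_leak(naive_train, naive_test)
```

With the time-blind split the model trains on projects launched after some of the projects it is later tested on, which is exactly the situation that inflates validation scores.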

The Leakage-Resistant Framework was designed to address the problem of temporal data leakage in predictive modeling. This was achieved through meticulous dataset construction, ensuring that all features used for training were derived from data available strictly before any liquidity withdrawal event occurred. This methodology prevented the inadvertent inclusion of future information that could artificially inflate model performance during training and lead to inaccurate predictions when deployed on live data. The hand-labeled dataset was specifically curated to only incorporate historical data points, effectively isolating the model to information that would have been accessible at the time of prediction in a real-world scenario, thereby increasing the robustness and reliability of the resulting predictions.
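The cutoff rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` type, the event kinds, and the three aggregate features are hypothetical, and the article does not say how the cutoff is chosen for projects that never withdraw liquidity.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A simplified pool event (hypothetical schema)."""
    ts: int        # unix timestamp
    kind: str      # "swap", "add_liquidity", or "remove_liquidity"
    amount: float


def features_before(events, cutoff_ts):
    """Aggregate only events strictly earlier than cutoff_ts."""
    visible = [e for e in events if e.ts < cutoff_ts]
    return {
        "n_swaps": sum(1 for e in visible if e.kind == "swap"),
        "liquidity_added": sum(e.amount for e in visible if e.kind == "add_liquidity"),
        "n_events": len(visible),
    }


events = [
    Event(100, "add_liquidity", 50.0),
    Event(200, "swap", 1.5),
    Event(300, "swap", 2.0),
    Event(400, "remove_liquidity", 50.0),  # the withdrawal itself
    Event(500, "swap", 0.1),               # happens after the withdrawal
]

# The cutoff is the first liquidity withdrawal; it and everything after
# it are excluded from the features.
withdrawal_ts = min(e.ts for e in events if e.kind == "remove_liquidity")
feats = features_before(events, withdrawal_ts)
```

The strict inequality matters: using `<=` would let the withdrawal event itself leak into the features.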

The Leakage-Resistant Framework was implemented using both TabPFN and Real-TabPFN, which are deep learning models specifically designed for tabular data, alongside more traditional machine learning algorithms for comparative analysis. XGBoost and LightGBM were selected as baseline models due to their widespread use and established performance in similar predictive tasks. This allowed for a direct assessment of the framework’s impact when paired with advanced architectures, quantifying the performance gains achieved through the prevention of temporal leakage relative to established methods. Model performance was evaluated across multiple metrics to ensure a comprehensive understanding of predictive accuracy, calibration, and robustness.

Evaluation of the Leakage-Resistant Framework demonstrated a peak accuracy of 98% in early rug-pull detection when tested on unseen data. This represents a substantial improvement over existing tools, with the LROO and FORTA systems achieving approximately 60% accuracy in the same testing environment. Furthermore, the TabPFN model yielded a ROC AUC of 0.997 and a PR AUC of 0.997, indicating strong discrimination and precision. Calibration metrics also favored the developed approach, with the TabPFN model achieving the lowest Brier Score and LogLoss values compared to benchmark algorithms, suggesting a more reliable estimation of prediction probabilities.
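For readers unfamiliar with the metrics named above, the following Python sketch computes the Brier score, log loss, and ROC AUC from their standard definitions. The labels and probabilities are made-up toy values, not results from the paper, and the pairwise AUC computation is chosen for clarity over speed.

```python
import math


def brier_score(y_true, p):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)


def log_loss(y_true, p, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for yi, pi in zip(y_true, p):
        pi = min(max(pi, eps), 1 - eps)  # clip to avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y_true)


def roc_auc(y_true, p):
    """Chance that a random positive outscores a random negative; ties count half."""
    pos = [pi for yi, pi in zip(y_true, p) if yi == 1]
    neg = [pi for yi, pi in zip(y_true, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = rug pull, 0 = legitimate project.
y_true = [1, 1, 0, 0]
p_hat = [0.9, 0.6, 0.4, 0.1]

brier = brier_score(y_true, p_hat)
ll = log_loss(y_true, p_hat)
auc = roc_auc(y_true, p_hat)
```

In this toy case the ranking is perfect (AUC of 1.0) while the Brier score and log loss are still above zero, which illustrates why discrimination and calibration are reported separately.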

Beyond Detection: A Proactive Shield for DeFi’s Future

The decentralized finance (DeFi) space, while innovative, remains vulnerable to malicious actors employing schemes like rug pulls – where developers abandon a project and abscond with investor funds. This research addresses this critical security gap by offering a proactive detection tool designed to safeguard both investors and DeFi platforms. The system doesn’t simply react to fraud after it occurs; instead, it analyzes project characteristics before significant investment takes place, identifying patterns and anomalies indicative of potentially fraudulent intent. By providing an early warning system, this work aims to foster greater trust and stability within the DeFi ecosystem, allowing for more informed decision-making and mitigating the substantial financial risks currently associated with these emerging technologies. It empowers stakeholders to confidently navigate the DeFi landscape and protect their assets from increasingly sophisticated scams.

A novel approach to identifying malicious decentralized finance (DeFi) projects leverages multimodal modeling within a leakage-resistant framework. This system doesn’t rely on a single data source; instead, it integrates insights from smart contract code analysis, on-chain transaction patterns, and token distribution metrics, alongside the off-chain OSINT signals described earlier. By combining these diverse data streams, the model achieves a more comprehensive understanding of project risk. Crucially, the leakage-resistant design restricts every feature to information available before the event being predicted, so reported performance is not inflated by future data and is more likely to hold up on live projects. The result is a stronger defense against rug pulls and other fraudulent schemes, and a meaningful step toward proactive security for the rapidly evolving DeFi ecosystem.

This research transcends the immediate problem of rug pull detection within decentralized finance. The developed methodology establishes a framework capable of identifying broader anomalous behaviors across the DeFi ecosystem. By focusing on deviations from established norms rather than specific attack signatures, the system can potentially flag a variety of security threats, including sophisticated exploits and previously unseen vulnerabilities. This generalizable approach offers a proactive security layer, moving beyond reactive measures and enabling platforms to assess risk holistically. Consequently, the methodology isn’t limited to safeguarding against fraudulent project launches, but also contributes to the overall stability and trustworthiness of the decentralized finance landscape by bolstering resilience against a wide range of potential attacks and unusual on-chain activity.

A comprehensive evaluation of Decentralized Finance (DeFi) project risk benefits significantly from a layered analytical approach. Beyond simply tracking transaction patterns, detailed examination of smart contract code via Ethereum Virtual Machine (EVM) Bytecode Analysis reveals hidden functionalities and potential vulnerabilities. This technique, when combined with the observation of on-chain behavioral patterns – such as liquidity provision and token swapping – provides a nuanced understanding of project activity. Crucially, assessing token concentration – the distribution of tokens among holders – highlights the potential for manipulation or sudden liquidity drains. Integrating these three distinct data streams allows for a far more refined risk assessment, moving beyond superficial indicators to identify genuinely malicious projects or those susceptible to exploitation, ultimately bolstering investor protection and platform security.
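Token concentration can be quantified in several ways. The Python sketch below shows two common ones, the share held by the largest holders and the Herfindahl index. The article does not state which measure the framework uses, so both functions and the sample balances are illustrative assumptions.

```python
def top_k_share(balances, k=10):
    """Fraction of total supply held by the k largest holders."""
    total = sum(balances)
    if total == 0:
        return 0.0
    return sum(sorted(balances, reverse=True)[:k]) / total


def herfindahl_index(balances):
    """Sum of squared holder shares: near 0 when supply is spread evenly
    across many holders, 1.0 when a single address holds everything."""
    total = sum(balances)
    if total == 0:
        return 0.0
    return sum((b / total) ** 2 for b in balances)


# Hypothetical holder balances: one address controls 70% of supply.
balances = [700, 100, 100, 50, 50]

largest_holder_share = top_k_share(balances, k=1)
hhi = herfindahl_index(balances)
```

A high value on either measure signals that a few addresses could drain liquidity or crash the price on their own, which is why concentration sits alongside bytecode and behavioral features in the layered assessment.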

The pursuit of identifying malicious intent within decentralized finance, as detailed in this framework, echoes a fundamental tenet of systems analysis. This study doesn’t merely seek to prevent rug pulls; it actively probes the boundaries of predictability, leveraging both on-chain data and OSINT signals to anticipate deception. As John McCarthy aptly stated, “The question of what ought to be is best answered not by consulting the emotions but by consulting the facts.” The research meticulously examines temporal data leakage, the evaluation flaw that lets a model appear more accurate than it really is, and builds a model to detect anomalies. It’s a process of reverse-engineering trust, dissecting the architecture of deception to reveal the underlying connections, not unlike testing the limits of any complex system to understand its true behavior.

Beyond the Signal: Charting Future Exploits

The presented framework represents, at best, a localized exploit of comprehension regarding the phenomenon of rug pulls. It successfully fuses on-chain and OSINT data, but assumes this fusion is exhaustive. The inherent limitation lies in the very definition of ‘signal’ – what quantifiable metric truly encapsulates malicious intent? Future work must actively seek out the noise, the seemingly irrelevant data points that, when properly contextualized, reveal the pre-emptive indicators of deception. The current reliance on behavioral metrics risks becoming a game of cat and mouse, where actors adapt to evade detection based on known parameters.

A more profound challenge is the evaluation methodology itself. Leakage-resistant evaluation is commendable, yet it implicitly acknowledges the inevitability of temporal leakage. The question isn’t merely if information leaks, but how to build systems that degrade gracefully – that don’t offer false positives based on anticipated, rather than observed, behavior. The goal should shift from prediction to probabilistic assessment, understanding that certainty is an illusion in a system predicated on human agency.

Ultimately, this work highlights a fundamental truth: fraud isn’t a technical problem; it’s a consequence of incentive structures. The most effective ‘detector’ isn’t an algorithm, but a system that aligns incentives, making deception less profitable than cooperation. Until that shift occurs, the search for the perfect rug pull detector will remain a perpetual, and ultimately futile, exercise.

Original article: https://arxiv.org/pdf/2603.11324.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
