Beyond Ratios: Reviving Bankruptcy Prediction with Data Analysis

Author: Denis Avetisyan

A new approach to Altman’s classic model leverages compositional data analysis and machine learning to enhance the accuracy of financial distress prediction.

Adapting the Altman model using log-ratio transformations and machine learning improves predictive performance, particularly sensitivity, when assessing financial ratios.

Conventional financial ratio analysis for bankruptcy prediction is often hampered by distributional problems such as outliers and non-normality. This study, ‘Adapting Altman’s bankruptcy prediction model to the compositional data methodology’, addresses these limitations by applying compositional data analysis and log-ratio transformations to the classical Altman model. The results indicate that this approach, particularly when combined with machine learning algorithms such as random forests and logistic regression, improves predictive performance, and sensitivity in particular, compared with traditional methods. Could compositional data analysis offer a more robust framework for early warning systems in financial risk management?

The Illusion of Prediction: Why Traditional Models Fail

The ability to accurately forecast corporate bankruptcy matters for broader economic health, as widespread failures can trigger systemic risk and financial contagion. However, conventional bankruptcy prediction models frequently falter when confronted with the intricacies of modern financial data. These methods, often reliant on established accounting ratios and statistical techniques, struggle to discern subtle yet critical shifts in a company’s financial composition, changes that can signal impending distress long before they become apparent in summary figures. The increasing complexity of financial instruments, coupled with the sheer volume of available data, strains these traditional approaches, leading to both false positives and, more dangerously, false negatives in predicting which firms are most vulnerable to collapse.

Conventional bankruptcy prediction frequently leans on established financial ratios – metrics like debt-to-equity or current ratios – yet these can offer a deceptively stable picture of a company’s true health. These ratios often fail to detect crucial compositional changes within a firm’s assets and liabilities; a company might maintain a seemingly healthy ratio while strategically shifting towards riskier, less liquid assets, or accumulating hidden obligations. This subtle deterioration isn’t necessarily reflected in aggregate figures, masking increasing financial vulnerability. Researchers find that analyzing the changes in these compositional elements – the specific types of assets and liabilities, rather than just their total values – reveals earlier warning signs of distress. Essentially, a firm can appear solvent based on standard ratios while simultaneously undergoing internal shifts that dramatically increase its risk of insolvency, highlighting the limitations of relying solely on these traditional metrics.
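A small worked example may make the point concrete. The sketch below uses two invented firms (the figures and item names are hypothetical, not taken from the study) that share the same current ratio while holding very different mixes of current assets.

```python
# Hypothetical balance-sheet figures (illustrative only, not from the study).
firm_a = {"cash": 40.0, "receivables": 40.0, "inventory": 20.0,
          "current_liabilities": 50.0}
firm_b = {"cash": 5.0, "receivables": 25.0, "inventory": 70.0,
          "current_liabilities": 50.0}

CURRENT_ASSET_ITEMS = ("cash", "receivables", "inventory")


def current_ratio(firm):
    """Current assets divided by current liabilities."""
    current_assets = sum(firm[item] for item in CURRENT_ASSET_ITEMS)
    return current_assets / firm["current_liabilities"]


def asset_shares(firm):
    """Share of each item in total current assets (the shares sum to 1)."""
    total = sum(firm[item] for item in CURRENT_ASSET_ITEMS)
    return {item: firm[item] / total for item in CURRENT_ASSET_ITEMS}


for name, firm in (("A", firm_a), ("B", firm_b)):
    # Same aggregate ratio, different composition underneath it.
    print(name, current_ratio(firm), asset_shares(firm))
```

Both firms report a current ratio of 2.0, yet firm B holds most of its current assets as inventory rather than cash. The aggregate ratio alone cannot distinguish the two; the shares can.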

Traditional bankruptcy prediction models frequently stumble when confronted with the messy reality of financial data, largely due to inherent assumptions about its distribution. These models often rely on statistical techniques that demand data conform to a normal distribution – a bell curve – and are highly susceptible to the influence of outliers, or extreme values. However, financial datasets rarely exhibit perfect normality; instead, they often feature skewed distributions and disproportionately large values stemming from unusual transactions or accounting practices. Consequently, a single anomalous data point – a large, unexpected loss, for example – can significantly distort the model’s predictions, leading to inaccurate assessments of a company’s financial health and potentially masking true signs of distress. This sensitivity undermines the reliability of these established methods, highlighting the need for more robust techniques capable of handling non-normal data and mitigating the impact of extreme values.
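To illustrate why raw ratios are prone to extreme values, the following sketch compares a handful of invented numerator/denominator pairs (hypothetical numbers, chosen only for illustration) before and after taking logarithms. It is a toy demonstration of the general idea, not the study's procedure.

```python
import math

# Hypothetical (numerator, denominator) pairs; the last firm has a tiny denominator.
pairs = [(50.0, 100.0), (100.0, 100.0), (200.0, 100.0), (100.0, 0.5)]

ratios = [x / y for x, y in pairs]
log_ratios = [math.log(x / y) for x, y in pairs]

# Raw ratios are bounded below by 0 but unbounded above, so a small
# denominator produces a value that dwarfs the rest of the sample.
print("ratios:    ", ratios)

# On the log scale the same firm is still the largest, but far less extreme,
# and halving (50/100) and doubling (200/100) sit symmetrically around 0.
print("log-ratios:", [round(v, 3) for v in log_ratios])
```

On the raw scale, halving and doubling are not mirror images (0.5 versus 2.0 around 1), which contributes to the skewness described above. On the log scale they are equal in size and opposite in sign.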

Beyond Ratios: Seeing the Composition of Risk

Traditional financial analysis often assesses ratios – such as debt-to-equity, current ratio, and profit margin – in isolation, treating each as an independent metric. However, these ratios are built from the same underlying accounting items, which are parts of a financial whole, so their values are inherently linked. Compositional Data Methodology (CoDa) addresses this by recognizing that the relevant information lies not in the absolute magnitude of any single figure, but in the relative sizes of the parts. Instead of interpreting a 2:1 debt-to-equity ratio in isolation, CoDa places it alongside the other relationships among the firm’s accounting items, such as those behind asset turnover and return on assets. Because a change in one part alters its relationship to every other part, this holistic approach avoids the misinterpretations that arise from analyzing ratios as independent variables. The method’s power stems from modelling the entire compositional vector, capturing interdependencies often missed by conventional techniques.

Traditional financial ratio analysis often treats each ratio as an independent variable, neglecting that the ratios are computed from shared accounting items that carry only relative information, much like proportions of a whole. This can lead to spurious correlations and inaccurate interpretations, because a change in one item moves every ratio in which it appears. Compositional Data Methodology (CoDa) directly addresses this by recognizing the interdependencies within the system; a change in one component alters its relationship to all the others. Furthermore, standard statistical techniques applied to ratios assume an additive error structure, which is inappropriate given their proportional nature; CoDa, through transformations like log-ratios, converts the data to an additive scale suitable for standard statistical modelling while preserving the relational information between components. This allows for a more accurate representation of financial performance and risk by modelling the composition rather than the absolute values of the ratios.

Log-ratio transformations are needed because standard statistical methods assume variables that can vary freely on an additive scale, a condition violated by compositional data, whose values are interdependent and carry only relative information. Specifically, pairwise log-ratios – calculated as $\log(x_i / x_j)$ for each pair of components $i$ and $j$ within a composition – express each component relative to another. Each log-ratio can take any real value and is symmetric, since $\log(x_i / x_j) = -\log(x_j / x_i)$, which avoids the spurious correlations introduced by analyzing raw ratios directly and allows standard multivariate techniques such as principal component analysis and regression to be applied. Working in log-ratio space brings the data closer to the assumptions of many statistical models and enables meaningful inference from compositional datasets.
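The transformation described above can be sketched in a few lines. The function name `pairwise_log_ratios`, the item names, and the figures below are all hypothetical and chosen for illustration; the study's actual choice of accounting items and pairs is not reproduced here. The sketch also assumes strictly positive parts, since a logarithm is undefined for zero or negative values.

```python
import math
from itertools import combinations


def pairwise_log_ratios(parts):
    """Return log(x_i / x_j) for every pair i < j of strictly positive parts."""
    if any(value <= 0 for value in parts.values()):
        raise ValueError("log-ratios need strictly positive parts")
    return {
        f"ln({a}/{b})": math.log(parts[a] / parts[b])
        for a, b in combinations(parts, 2)
    }


# Hypothetical accounting items for one firm (illustrative only).
firm = {"current_assets": 120.0, "fixed_assets": 280.0, "liabilities": 250.0}

# The same firm reported in different units (for example, thousands).
scaled = {name: 1000.0 * value for name, value in firm.items()}

original = pairwise_log_ratios(firm)
rescaled = pairwise_log_ratios(scaled)

for key in original:
    # Rescaling every part by the same factor leaves each log-ratio unchanged.
    print(key, round(original[key], 4), round(rescaled[key], 4))
```

Because each log-ratio depends only on the relative sizes of two parts, the firm's overall scale drops out, which is the sense in which the features describe composition rather than magnitude.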

Validating the Approach: Machine Learning as a Lens

To evaluate the predictive capabilities of compositional data analysis in financial risk assessment, three supervised machine learning algorithms – Logistic Regression, K-Nearest Neighbors, and Random Forests – were implemented. These models were trained and tested using two distinct sets of predictors: standard financial ratios calculated from financial statement items, and compositionally transformed (log-ratio) versions of the same information. This comparative design allows a quantitative assessment of whether leveraging the compositional structure of financial data improves model performance relative to traditional ratio analysis. Each model was then tested on its ability to classify firms as bankrupt or non-bankrupt.

Application of Logistic Regression, K-Nearest Neighbors, and Random Forests to financial ratios revealed that a compositional data approach improved predictive performance compared to analyses using standard ratios. This improvement is attributed to the compositional method’s ability to account for the interdependence within the ratio data, avoiding distortions caused by treating each ratio in isolation. Specifically, models trained on compositionally transformed ratios performed better on the evaluation metrics, with the gain concentrated in the identification of positive (distressed) cases.

Model performance was evaluated using Sensitivity (the share of bankrupt firms correctly flagged), Specificity (the share of healthy firms correctly cleared), and Balanced Accuracy (the mean of the two). Results indicate that compositional Random Forests and Logistic Regression achieved the highest Balanced Accuracy scores. Importantly, these models maintained Specificity levels comparable to those obtained using standard financial ratios, while demonstrating markedly improved Sensitivity, that is, a greater ability to correctly identify positive cases. This suggests that leveraging compositional data enhances the predictive power of these machine learning models, particularly in catching firms in distress without sacrificing the ability to avoid false positives.
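For readers less familiar with these three metrics, a minimal sketch of how they are computed from predicted and true labels follows. The label vectors are invented for illustration and do not reproduce any result from the study; the sketch assumes label 1 marks a bankrupt firm and that both classes are present.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp), treating label 1 as the positive (bankrupt) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    """Share of truly bankrupt firms that the model flags."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Share of healthy firms that the model correctly clears."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sensitivity(y_true, y_pred) + specificity(y_true, y_pred)) / 2


# Hypothetical labels: 4 bankrupt firms (1) and 6 healthy firms (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

print(sensitivity(y_true, y_pred))
print(specificity(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
```

Balanced accuracy is a natural headline metric here because bankrupt firms are typically a small minority: a model that labels every firm healthy would score high plain accuracy but only 0.5 balanced accuracy.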

Beyond Prediction: The Emergence of Robust Financial Understanding

Bankruptcy prediction can be significantly improved by treating industry sector data, as represented by NACE codes, not as categorical variables but as components of a compositional dataset. This approach, leveraging compositional data analysis, acknowledges that industry representation isn’t an isolated characteristic, but rather a part of a whole, influencing the interpretation of financial ratios. Traditional models often overlook these interdependencies, potentially leading to misclassification errors. By analyzing the proportions of activity within different industry sectors, rather than simply identifying a primary sector, predictive models gain a more nuanced understanding of a company’s risk profile. Consequently, credit risk assessments become more informed, enabling lenders and investors to make better-supported decisions and potentially reducing financial losses associated with corporate insolvency.

The integration of NACE (Nomenclature statistique des activités économiques dans la Communauté européenne) codes, representing a company’s primary economic activity, significantly refines bankruptcy prediction models when paired with traditional financial ratio analysis. This approach moves beyond purely quantitative assessments by incorporating crucial contextual information about a firm’s operational sector. Studies demonstrate that NACE codes act as powerful predictors, capturing industry-specific vulnerabilities often missed by generic financial metrics; for example, a downturn affecting a specific sector will be more readily identified. Consequently, models incorporating NACE data exhibit enhanced robustness and improved accuracy in forecasting financial distress, allowing for more nuanced and reliable credit risk assessments than those relying solely on financial ratios.

The analytical framework, successfully applied to bankruptcy prediction, holds considerable promise for broader applications within financial analysis. Researchers are increasingly interested in adapting compositional data techniques to the complex challenge of fraud detection, where identifying subtle shifts in transactional patterns – akin to changes in industry sector composition – could prove crucial. Furthermore, the methodology offers a refined approach to portfolio optimization, potentially allowing for more accurate assessments of asset correlations and diversification benefits beyond traditional ratio analysis. This expansion of scope signifies a move towards a more holistic understanding of financial data, treating economic activity not simply as isolated figures, but as interconnected components of a larger, dynamic system, ultimately leading to more robust and insightful financial modeling.

The pursuit of predictive accuracy in financial modeling, as demonstrated by this adaptation of Altman’s model, reveals a fundamental truth about complex systems. Stability and order emerge from the bottom up, shaped by the interplay of compositional data and machine learning algorithms. Rather than imposing a rigid, top-down structure based on traditional ratios, this research allows the data itself to reveal patterns indicative of financial distress. As Paul Feyerabend observed, “Anything goes.” This suggests that methodological pluralism – embracing diverse analytical techniques like compositional data analysis – isn’t merely a matter of choice, but a necessity for truly understanding and predicting complex phenomena. The enhanced sensitivity achieved highlights how acknowledging the inherent interconnectedness within financial data improves forecasting capabilities, challenging the illusion of control offered by conventional models.

Where Do We Go From Here?

The observed gains in predictive power through compositional data analysis aren’t surprising, merely an acknowledgement of inherent relationships within financial data. Robustness doesn’t emerge from meticulously crafted models; it arises from respecting the underlying geometry of the system. The Altman model, for decades treated as a prescriptive tool, reveals itself as a snapshot – a correlation, not a causation. The true signal wasn’t in the ratios themselves, but in how those ratios related to the whole.

Future work needn’t focus on refining prediction accuracy, a pursuit ultimately limited by the inherent noise of complex systems. Instead, attention should shift toward understanding the emergent properties revealed by these techniques. Can similar approaches illuminate the mechanisms of financial distress, beyond simply forecasting its arrival? The challenge lies in moving from prediction to diagnosis: identifying the specific compositional imbalances that indicate vulnerability.

Ultimately, the field will benefit from embracing the notion that small interactions create monumental shifts. Focusing solely on the aggregate obscures the subtle, yet critical, local rules that govern financial health. It’s not about finding the perfect equation, but about mapping the landscape from which both stability and collapse emerge.

Original article: https://arxiv.org/pdf/2603.24215.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
