Author: Denis Avetisyan
Researchers have developed a scalable method to build comprehensive firm-to-firm production networks from publicly available data, offering unprecedented insight into economic dependencies.

This work details an algorithm leveraging gravity models and Markov chains to reconstruct large-scale production networks from input-output tables and firm size distributions, enabling analysis of network scalability and economic shocks.
Understanding the complex interdependencies within modern economies requires detailed knowledge of firm-to-firm relationships, yet reconstructing these networks at scale remains a significant challenge. This paper, ‘Reconstructing Large Scale Production Networks’, introduces a scalable algorithm leveraging firm size and sectoral input-output flows to generate weighted production networks comprising millions of firms. The approach utilizes a gravity model and Markov chain techniques to create realistic network topologies while preserving key economic constraints. Could this methodology provide new insights into economic resilience and the propagation of shocks across global supply chains?
Unraveling the Web: Mapping Economic Interdependence
The intricate web of modern economies presents a significant challenge to traditional analytical methods. Contemporary production isn’t characterized by simple supply chains, but rather by deeply nested networks where firms rely on components and services sourced from numerous other firms, often across international borders. Existing economic models, frequently built on assumptions of isolated entities and linear production flows, struggle to capture this complexity. Consequently, these models often fail to accurately represent the ripple effects of disruptions, such as factory closures or trade restrictions, and offer limited insight into the true vulnerability of economic systems. Reconstructing these production networks demands new approaches capable of handling vast datasets and revealing the hidden interdependencies that define modern economic reality, moving beyond aggregated statistics to focus on firm-to-firm connections.
Conventional economic modeling frequently employs simplification to manage computational complexity, yet these reductions can inadvertently conceal the intricate web of relationships defining modern production. By focusing on aggregated sectors and assuming limited linkages between firms, these models often fail to capture the ripple effects of disruptions or policy changes. This approach overlooks the reality that businesses are deeply embedded in supply chains, relying on a multitude of specialized inputs from diverse sources – a single component failure or trade restriction can cascade through multiple industries. Consequently, predictions derived from overly simplified models may underestimate systemic risk and misguide interventions, highlighting the need for more granular and interconnected representations of economic activity to accurately assess vulnerability and promote resilience.
The ability to accurately map global production networks is increasingly critical for anticipating and mitigating economic disruptions. These networks, characterized by intricate layers of suppliers and consumers, amplify the effects of even localized shocks – a factory closure in one region can quickly cascade into shortages and price increases elsewhere. Consequently, detailed reconstructions of these interdependencies allow economists and policymakers to model potential impacts with greater precision, moving beyond simplified assumptions that often underestimate systemic risk. This enhanced predictive capability is not merely academic; it directly informs strategic decisions regarding supply chain resilience, trade policy, and targeted interventions designed to buffer economies against unforeseen events, ultimately fostering greater stability in an interconnected world.
Constructing the Network Backbone: A Data-Driven Approach
The construction of the inter-firm network relies on quantifying sectoral flows using data from the World Input-Output Table (WIOT). The WIOT provides a detailed matrix of transactions between various economic sectors, representing the flow of goods and services between them. These flows are expressed in monetary units, allowing for the precise measurement of inter-sectoral linkages at a global scale. Specifically, the magnitude of flows from sector $i$ to sector $j$ determines the initial weight assigned to potential connections between firms operating within those sectors, serving as the foundational data for network topology.
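As a concrete illustration of this first step, the sketch below normalizes a small input-output flow matrix into sector-pair connection weights. The sector labels and flow values are hypothetical stand-ins, not figures from the actual WIOT release:

```python
import numpy as np

# Hypothetical 3-sector input-output flow matrix (monetary units);
# entry Z[i, j] is the value of goods flowing from sector i to sector j.
sectors = ["agriculture", "manufacturing", "services"]
Z = np.array([
    [10.0, 120.0,  30.0],
    [40.0,  80.0, 200.0],
    [20.0,  60.0,  90.0],
])

# Normalize so that weights over all sector pairs sum to one; these
# normalized flows seed the initial weight of firm-to-firm links
# between any pair of sectors.
W = Z / Z.sum()

for i, src in enumerate(sectors):
    for j, dst in enumerate(sectors):
        print(f"{src:>13} -> {dst:<13} weight {W[i, j]:.4f}")
```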
The Gravity Model, applied to inter-firm relationships, posits that the probability of a connection between two firms is directly proportional to the product of their respective sizes – typically measured by sectoral output or employment – and inversely proportional to the economic distance separating them. Formally, the connection probability $P_{ij}$ between firms $i$ and $j$ can be expressed as $P_{ij} \propto \frac{S_i \cdot S_j}{d_{ij}}$, where $S_i$ and $S_j$ represent the sizes of firms $i$ and $j$, and $d_{ij}$ denotes the distance between them, with the WIOT sectoral flows setting the overall connection propensity for each sector pair. This model, borrowed from physics, assumes that larger firms and shorter distances increase the likelihood of a direct relationship, effectively simulating a ‘gravitational’ pull between economic actors.
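A minimal sketch of this connection rule follows, assuming heavy-tailed firm sizes and a generic distance matrix; all values are illustrative, and the final rescaling into probabilities is one simple choice among several:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # toy number of firms
S = rng.pareto(1.5, size=n) + 1.0        # firm sizes (heavy-tailed, illustrative)
d = rng.uniform(1.0, 10.0, size=(n, n))  # generic "economic distance"
d = (d + d.T) / 2.0                      # symmetrize

# Gravity kernel: attraction grows with the size product and
# decays with distance.
G = np.outer(S, S) / d
np.fill_diagonal(G, 0.0)                 # no self-links

# Rescale into valid link probabilities in [0, 1].
P = G / G.max()
print(np.round(P, 3))
```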
The Bernoulli Ensemble is utilized to establish a binary network backbone by assigning a connection probability, $p$, to each potential link between sectors. This ensemble generates multiple network realizations, each representing a possible network structure based on the calculated sectoral flows. The probability, $p$, is not uniform; it is derived from the normalized magnitude of inter-sectoral flows, ensuring that stronger economic relationships have a higher likelihood of forming a connection in the backbone. By averaging across these realizations, a stable and robust network structure is created, minimizing the impact of individual flow fluctuations and providing a representative foundation for subsequent analysis. This process effectively filters noise and highlights the dominant, systemic connections within the economic network.
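The sampling-and-averaging logic can be sketched as follows; the stand-in probability matrix is illustrative, and the retention threshold is an arbitrary choice for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
P = rng.uniform(0.0, 0.5, size=(n, n))   # stand-in link probabilities
np.fill_diagonal(P, 0.0)

# Draw many independent Bernoulli realizations of the binary backbone:
# each potential edge (i, j) is present with probability P[i, j].
n_draws = 1000
realizations = rng.random((n_draws, n, n)) < P

# Averaging across realizations recovers a stable backbone: edge
# frequencies converge to P, damping single-draw fluctuations.
frequency = realizations.mean(axis=0)
backbone = frequency > 0.25              # keep dominant connections only
print(backbone.astype(int))
```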
Markov Regularity is applied post-network construction to ensure analytical tractability by enforcing specific graph-theoretic properties. This process involves iteratively adjusting connection probabilities to guarantee irreducibility – meaning any node can reach any other node within the network – and aperiodicity, which confirms the absence of cyclical patterns in reachability. Specifically, the algorithm targets nodes with limited out-degree or those forming strong cycles, modifying connection weights to satisfy the Markovian property: the probability of transitioning from any state depends only on the current state, not on the path taken. These adjustments are critical for validating the network’s suitability for analytical techniques such as PageRank or community detection, as they prevent the algorithms from becoming trapped in specific subgraphs or exhibiting non-stationary behavior.
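A minimal check of these two Markov properties on a directed backbone, using networkx; the toy graph below is my illustration, and the paper’s own iterative regularization procedure is not reproduced here:

```python
import networkx as nx

# Toy directed backbone: a 4-cycle plus a chord, so every node can
# reach every other node, and cycles of length 3 and 4 coexist.
G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)])

# Irreducibility: the chain can move between any pair of states.
print("irreducible:", nx.is_strongly_connected(G))

# Aperiodicity: the gcd of all cycle lengths is 1
# (here guaranteed by the coexisting 3- and 4-cycles).
print("aperiodic:  ", nx.is_aperiodic(G))
```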
Extending the Network: From Firm to Factory Floor
The Factory Network Extension represents a shift from analyzing inter-firm relationships to a more granular examination of internal production networks. This involves modeling multiple geographically distributed production units – factories, warehouses, and distribution centers – as nodes within a network. Prior research typically focused on connections between companies; this extension analyzes the network within a single organization, acknowledging that production isn’t a singular entity but a distributed system. This approach allows for the identification of logistical bottlenecks, dependencies between facilities, and the impact of geographical distance on supply chain efficiency, moving beyond simple firm-level connectivity to a spatially-aware representation of production processes.
Geographic distance between factories is calculated using the Haversine formula, which determines the great-circle distance between two points on a sphere given their longitudes and latitudes. This formula accounts for the Earth’s curvature and provides a more accurate distance measurement than Euclidean distance, particularly for geographically dispersed production facilities. The calculation, expressed as $d = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta\phi}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\Delta\lambda}{2}\right)}\right)$, where $\phi$ denotes latitude, $\lambda$ longitude, and $R$ the Earth’s radius, requires coordinates for each factory and yields distance in the same units as $R$. These distances are then incorporated into the network analysis, allowing for the assessment of spatial relationships and the identification of geographically concentrated or isolated production units.
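A direct translation of the formula into code, with illustrative coordinates (roughly Detroit and Stuttgart):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Illustrative factory pair: approximate Detroit and Stuttgart coordinates.
print(f"{haversine_km(42.33, -83.05, 48.78, 9.18):.0f} km")
```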
Tarjan’s Algorithm is employed to identify Strongly Connected Components (SCCs) within the factory network. An SCC is a sub-graph where every vertex is reachable from every other vertex within that component. This is achieved through a depth-first search that tracks “low-link” values, representing the earliest visited node reachable from a given node and its descendants. Nodes with matching low-link and discovery times are designated as the root of an SCC, allowing for the efficient segmentation of the network into cohesive units. Identifying SCCs is crucial for understanding network resilience; a factory within a strongly connected component is less vulnerable to disruption as alternative pathways for resource flow exist. The algorithm’s $O(V + E)$ time complexity, where V is the number of vertices (factories) and E is the number of edges (connections), ensures scalability for large industrial networks.
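In practice the algorithm need not be hand-rolled: networkx ships a Tarjan-based SCC routine with the same $O(V + E)$ bound. The toy factory graph below is illustrative:

```python
import networkx as nx

# Toy factory network: nodes are production units, directed edges
# are material flows between them.
G = nx.DiGraph([
    ("plant_a", "plant_b"), ("plant_b", "plant_c"), ("plant_c", "plant_a"),
    ("plant_c", "warehouse"), ("warehouse", "distribution"),
])

# The three plants form a cycle and hence one SCC; the warehouse and
# distribution center fall out as singleton components.
for scc in nx.strongly_connected_components(G):
    print(sorted(scc))
```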
Network edge weights are optimized using CPLEX, a mathematical programming solver, in conjunction with a Minimum Energy Weighting (MEW) approach. MEW aims to assign weights that reflect both the underlying network structure and the computational demands of analysis; higher weights indicate stronger relationships between factories but can increase solution complexity. The optimization process minimizes a cost function incorporating edge weights and network connectivity, thereby balancing the desire for a structurally representative network with the need for analytical tractability. Specifically, CPLEX iteratively adjusts edge weights subject to constraints derived from the network topology, achieving a weighted network suitable for subsequent analysis, such as identifying critical production pathways or assessing supply chain resilience. The resulting weights facilitate efficient computation without sacrificing the essential characteristics of the factory network.
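The paper relies on CPLEX, which is proprietary; as a license-free stand-in, the sketch below assumes “minimum energy” means the smallest squared-weight solution consistent with linear flow constraints, which the Moore-Penrose pseudoinverse delivers directly. The constraint matrix and targets are illustrative, and a full MEW formulation would carry additional constraints:

```python
import numpy as np

# Three candidate edges; two linear constraints tie their weights to
# observed aggregate flows (e.g., row sums of a sectoral flow table).
# A[k, e] = 1 if edge e contributes to constraint k.
A = np.array([
    [1.0, 1.0, 0.0],   # edges 0 and 1 must jointly carry 10 units
    [0.0, 1.0, 1.0],   # edges 1 and 2 must jointly carry 6 units
])
b = np.array([10.0, 6.0])

# Minimum-energy solution: minimize ||w||^2 subject to A w = b.
# The pseudoinverse returns exactly this minimum-norm solution.
w = np.linalg.pinv(A) @ b
print("weights:", np.round(w, 3))
print("constraints satisfied:", np.allclose(A @ w, b))
```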
Grounding Reality: Data Foundations and Network Fidelity
The foundation of a realistic economic network lies in accurately representing the distribution of firm sizes within the economy. Data from the Small Business Administration (SBA) serves as a critical input, providing the necessary parameters for calibrating the Gravity Model – a widely used tool for understanding inter-firm linkages. This model posits that the strength of a connection between two firms is proportional to their sizes, but only yields meaningful results when those sizes reflect actual economic realities. By leveraging SBA data to ensure the simulated firm size distribution mirrors the observed distribution, researchers can build a network that isn’t merely a theoretical construct, but a plausible representation of real-world production relationships, significantly enhancing the validity of simulations and shock analyses.
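A sketch of this calibration step, assuming, hypothetically, that the SBA size counts are summarized by a Pareto tail; a real calibration would estimate the exponent from the published size bins rather than fix it by hand:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical Pareto tail exponent standing in for a fit to SBA
# firm-size counts.
alpha = 1.2
n_firms = 100_000
sizes = rng.pareto(alpha, size=n_firms) + 1.0  # sizes >= 1 employee

# Sanity check: a heavy-tailed draw should mirror the SBA pattern of
# many small firms and a few very large ones.
print("median size:", np.median(sizes).round(1))
print("max size:   ", sizes.max().round(0))
print("share of firms under 20 employees:", (sizes < 20).mean().round(3))
```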
The reconstruction of a realistic production network relies not only on firm-level data, but also on a detailed understanding of how sectors interact. Complementary data from the Bureau of Economic Analysis (BEA) provides crucial insights into these inter-industry flows, detailing the value of goods and services exchanged between different sectors of the economy. This information serves as a vital validation step, allowing researchers to assess the accuracy of the network’s sectoral linkages and refine the reconstruction process. By comparing the modeled flows within the network to the actual flows reported by the BEA, discrepancies can be identified and corrected, resulting in a more robust and reliable representation of the complex relationships between businesses. The integration of BEA data significantly enhances the network’s fidelity, moving beyond simple firm counts to capture the nuanced dependencies that define modern production systems.
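A validation along these lines might aggregate modeled firm-to-firm flows up to the sector level and compare them against a BEA-style reference matrix. Both matrices below are fabricated placeholders:

```python
import numpy as np

# Placeholder sector-level flow matrices: one aggregated from the
# reconstructed firm network, one from BEA-style input-output data.
modeled   = np.array([[10., 120.,  30.], [40., 80., 200.], [20., 60., 90.]])
reference = np.array([[12., 115.,  28.], [38., 85., 195.], [22., 58., 92.]])

# Relative error per sector pair flags the linkages needing correction.
rel_error = np.abs(modeled - reference) / reference
print("mean relative error:", rel_error.mean().round(3))
print("worst sector pair:  ",
      np.unravel_index(rel_error.argmax(), rel_error.shape))
```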
Traditional economic models often rely on substantial simplifications of complex firm interactions, limiting their capacity to accurately predict responses to real-world disturbances. This research addresses this challenge by grounding network reconstruction in extensive, real-world data. By leveraging detailed information on firm sizes and sectoral flows, the methodology creates a production network far exceeding the scale and realism of prior work. Consequently, simulations built upon this network offer a significantly improved ability to model the propagation of economic shocks, providing insights into how disruptions, such as supply chain bottlenecks or shifts in demand, ripple through the economy and impact individual firms. This data-driven approach allows for a more nuanced understanding of economic resilience and vulnerability, moving beyond the limitations of highly abstracted representations.
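To make the shock-propagation idea concrete, here is a minimal downstream-diffusion sketch on a toy weighted network. It is a deliberately simplified stand-in for the paper’s simulations, not a reproduction of them:

```python
import numpy as np

# Toy weighted adjacency: W[i, j] is the share of firm j's inputs
# sourced from firm i (each column sums to less than 1, so the
# iteration below converges).
W = np.array([
    [0.0, 0.5, 0.2],
    [0.3, 0.0, 0.4],
    [0.1, 0.2, 0.0],
])

# Initial shock: firm 0 loses 30% of its output capacity.
shock = np.array([0.3, 0.0, 0.0])

# Propagate downstream: each firm's loss is its own shock plus the
# input-weighted sum of its suppliers' losses, capped at total loss.
loss = shock.copy()
for _ in range(20):
    loss = np.minimum(1.0, shock + W.T @ loss)
print("steady-state output loss per firm:", np.round(loss, 3))
```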
The culmination of this research is a reconstructed production network encompassing over 5.4 million firms and 130 million links – currently the largest of its kind. This scale represents a significant advancement beyond prior efforts, notably exceeding the $10^5$-firm network constructed by Ialongo et al. (2024). Crucially, the algorithm underpinning this network achieves linear computational complexity – denoted as $O(N)$ – when employing firm binning techniques, ensuring scalability and efficient processing even with the inclusion of millions of entities. This algorithmic efficiency, combined with the network’s unprecedented size, positions it as a robust platform for investigating complex economic phenomena and simulating the propagation of shocks through modern supply chains.
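The linear-complexity claim rests on binning: rather than evaluating all $O(N^2)$ firm pairs, firms are grouped into size bins and gravity probabilities are computed once per bin pair. The sketch below illustrates that idea; the logarithmic binning scheme is my illustration, not necessarily the paper’s exact procedure:

```python
import numpy as np

rng = np.random.default_rng(7)
n_firms = 1_000_000
sizes = rng.pareto(1.2, size=n_firms) + 1.0

# Assign each firm to one of a fixed number of logarithmic size bins:
# a single O(N) pass, independent of the number of firm pairs.
n_bins = 32
edges = np.logspace(np.log10(sizes.min()), np.log10(sizes.max()), n_bins + 1)
bin_of = np.clip(np.digitize(sizes, edges) - 1, 0, n_bins - 1)

# Per-bin statistics replace per-firm-pair computation: gravity
# probabilities are then evaluated once per (bin, bin) pair, i.e.
# 32 x 32 values instead of ~10^12 firm pairs.
counts = np.bincount(bin_of, minlength=n_bins)
mean_size = (np.bincount(bin_of, weights=sizes, minlength=n_bins)
             / np.maximum(counts, 1))
print("non-empty bins:", (counts > 0).sum())
print("largest bin holds", counts.max(), "firms")
```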

The presented methodology for reconstructing production networks acknowledges inherent uncertainty – a crucial element often glossed over in economic modeling. The algorithm doesn’t claim to know the network perfectly, but rather builds a probabilistic representation based on available data and iteratively refines it. This mirrors a scientific approach to uncovering complex systems. As Isaac Newton observed, “I do not know what I may seem to the world, but to myself I seem to be a boy playing on the seashore.” The algorithm, like a diligent observer, collects data – the ‘pebbles’ washed ashore – and attempts to build a coherent picture, understanding that the ‘ocean’ of the economic landscape is vast and fundamentally unknowable with absolute certainty. The iterative refinement process, grounded in Markov Chains, directly addresses the need to constantly test and disprove assumptions, aligning with a rational, uncertainty-driven approach to network scalability.
What’s Next?
The capacity to reconstruct production networks at scale, as demonstrated, is less a solution than a sharpening of the questions. The algorithm itself is, after all, merely a formalized accounting – a sophisticated way to trace the inevitable dissipation of value. The true challenge lies not in building the network, but in acknowledging the inherent inaccuracies within it. Each reconstructed link represents a probability, not a certainty, and the aggregate effect of these uncertainties remains a significant, and largely unaddressed, source of error. Data isn’t the goal – it’s a mirror of human error.
Future work will undoubtedly focus on integrating additional data modalities – trade in services, for example, or the increasingly vital flow of information. However, a more fruitful avenue may lie in embracing the messiness of the real world. Current models tend to privilege neat, quantifiable relationships. Yet, the tacit knowledge embedded within firms, the informal agreements, the sheer unpredictability of human behavior – these remain stubbornly resistant to formalization. Even what we can’t measure still matters – it’s just harder to model.
Ultimately, the value of these reconstructed networks will be judged not by their fidelity to some abstract “true” state, but by their utility in anticipating systemic vulnerabilities. The capacity to simulate shocks is impressive, but the simulation is only as good as the willingness to accept that the most dangerous failures are, by definition, those that haven’t yet been imagined.
Original article: https://arxiv.org/pdf/2512.02362.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/