Author: Denis Avetisyan
Researchers are using AI-powered workflows to generate realistic synthetic tweet datasets, overcoming the challenges of accessing real-time social media data during crises.

This work details the design and evaluation of an agentic workflow leveraging large language models for generating crisis-related synthetic tweet datasets to facilitate damage assessment and crisis informatics research.
Access to real-time social media data is increasingly constrained, hindering critical research in crisis informatics and the development of effective AI-driven disaster response systems. This paper presents ‘Design and evaluation of an agentic workflow for crisis-related synthetic tweet datasets’, detailing a novel approach to generating realistic, labeled social media data using large language models and an iterative agentic workflow. We demonstrate that this method produces synthetic tweet datasets suitable for evaluating AI systems on tasks such as damage assessment and geolocalization, offering a scalable alternative to costly and limited real-world data curation. Could this approach unlock new possibilities for proactively training and evaluating crisis response technologies across diverse scenarios and societal contexts?
The Fragility of Immediate Knowledge
In the immediate aftermath of a disaster, the speed and precision of damage assessment directly correlate with the effectiveness of rescue efforts and resource allocation. Historically, these evaluations have relied on physical inspections and aerial surveys – processes that, while thorough, are inherently slow and demand significant logistical support. This traditional approach often struggles to keep pace with the evolving needs on the ground, creating delays in delivering aid to those most affected. The limitations become particularly acute in large-scale events or geographically challenging areas, where accessing impacted regions can be hindered, and the sheer volume of damage overwhelms available assessment teams. Consequently, a critical need exists for more agile and efficient methods capable of providing a near real-time understanding of the situation, enabling responders to prioritize interventions and maximize their impact.
The proliferation of social media platforms during times of crisis generates an unprecedented influx of real-time information, presenting a double-edged sword for damage assessment. While platforms like Twitter and Facebook become immediate sources of on-the-ground reports – potentially offering insights into affected areas before traditional methods can be deployed – the sheer volume of data poses significant analytical hurdles. Distinguishing credible reports of damage from misinformation, irrelevant content, and the ‘noise’ of general communication requires sophisticated automated tools and algorithms. Effectively harnessing this data stream demands not only the capacity to collect and process vast quantities of text, images, and videos, but also to validate information and extract meaningful insights with speed and accuracy – a challenge that continues to drive innovation in the fields of artificial intelligence and crisis informatics.

Reconstructing Reality: An Agentic Approach
The Agentic Workflow is a system designed to automatically produce synthetic data for crisis event simulation. It functions through the interaction of multiple software agents, each with a specific role in the data generation process. This approach moves beyond simple data augmentation or static dataset creation by enabling a dynamic and iterative refinement of the synthetic data. The workflow aims to create realistic and labeled tweets, mimicking the characteristics of social media communication during actual crisis events. This generated data can then be utilized for training and evaluating machine learning models designed for crisis response, disaster management, or social media analysis, without reliance on potentially sensitive or limited real-world data.
The synthetic tweet generation process leverages Large Language Models (LLMs) to create realistic crisis-related data. Specifically, the workflow employs models including gemma-3-1b-it, Qwen-3-0.6B, and Llama-3.2-1B-Instruct. These LLMs are utilized to produce the textual content of the synthetic tweets, with the aim of simulating authentic social media communication during crisis events. The selection of these models balances computational efficiency with the capacity to generate diverse and contextually appropriate text, forming the core of the data creation pipeline.
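As a rough illustration, a single generation step with one of these small instruct models might look like the sketch below. The model repository ID, prompt wording, and sampling settings are assumptions for the example, not the paper's exact configuration; it requires a recent transformers release and access to the model weights.

```python
# Sketch: one generation step with a small instruct model (illustrative settings).
# Requires a recent `transformers` release and access to the model weights.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed Hugging Face repo ID
)

messages = [
    {"role": "system",
     "content": "You write short, realistic tweets posted during an earthquake."},
    {"role": "user",
     "content": "Write one tweet reporting moderate building damage, "
                "mentioning an explicit location. Max 280 characters."},
]

out = generator(messages, max_new_tokens=60, do_sample=True, temperature=0.9)
# With chat-style input, the pipeline returns the full conversation;
# the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```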
The synthetic tweet generation process incorporates a Tweet Evaluator to assess data quality based on three criteria: Location Correctness, Damage Level Correctness, and Textual Diversity. This evaluator iteratively filters generated tweets, providing feedback to refine subsequent outputs. Through three rounds of evaluation and refinement within the agentic workflow, acceptance rates – defined as the percentage of tweets passing all three compliance checks – reached up to 50%. This indicates that, following iterative feedback, approximately half of the generated tweets were deemed sufficiently realistic and accurate based on the defined evaluation metrics.
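A minimal sketch of that generate-evaluate-refine loop is shown below; the three boolean checks and the round/acceptance bookkeeping are stand-ins for the paper's actual evaluator prompts and thresholds.

```python
# Sketch of the generate-evaluate-refine loop. The three checks mirror the
# criteria above; `generate` and `evaluate` stand in for the LLM-backed agents.
from dataclasses import dataclass

@dataclass
class TweetCheck:
    location_ok: bool
    damage_level_ok: bool
    diverse_ok: bool

    def passed(self) -> bool:
        return self.location_ok and self.damage_level_ok and self.diverse_ok

def run_workflow(generate, evaluate, n_tweets=100, rounds=3):
    """Keep tweets that pass all checks; feed failures back to the generator."""
    accepted, feedback = [], []
    for _ in range(rounds):
        needed = n_tweets - len(accepted)
        if needed <= 0:
            break
        candidates = generate(needed, feedback)           # list[str]
        results = [(t, evaluate(t)) for t in candidates]  # list[(str, TweetCheck)]
        accepted += [t for t, c in results if c.passed()]
        feedback = [(t, c) for t, c in results if not c.passed()]
    return accepted, len(accepted) / n_tweets             # tweets, acceptance rate
```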

Validating the Simulated: A Controlled Environment
The Synthetic Tweet Dataset was specifically designed to facilitate research in two key areas: post-earthquake damage assessment and the wider field of Crisis Informatics. For damage assessment, the dataset provides a scalable source of labeled data for training and evaluating machine learning models intended to automatically categorize the severity of damage reported in social media following a seismic event. In Crisis Informatics, the dataset serves as a controlled environment for investigating information propagation patterns, identifying emerging needs, and testing the efficacy of crisis communication strategies, all without the ethical and logistical constraints of utilizing real-time, user-generated data from active disasters. This controlled environment allows for repeatable experiments and systematic analysis of crisis-related information flows.
The Synthetic Tweet Dataset facilitates accurate Damage Level Prediction, a critical component of post-disaster response prioritization. Damage severity is consistently labeled using the Modified Mercalli Intensity (MMI) Scale, allowing for standardized assessment and comparison. Evaluation of the synthetic dataset demonstrates damage level prediction accuracy ranging from 97.96% to 98.85%, indicating a high degree of fidelity in replicating real-world damage reporting patterns and providing a robust resource for developing and testing damage assessment algorithms.
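For illustration, a coarse mapping from MMI bands to damage labels, together with a simple accuracy computation, could look like the sketch below; the band boundaries and label names are assumptions made for this example, not the paper's exact labeling scheme.

```python
# Illustrative MMI-band-to-damage-label mapping and accuracy check.
# Band boundaries and label names are assumptions made for this sketch.
def mmi_to_damage_level(mmi: int) -> str:
    if mmi <= 4:
        return "none_or_slight"
    if mmi <= 6:
        return "light"
    if mmi <= 8:
        return "moderate_to_heavy"
    return "severe"

def damage_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of tweets whose predicted damage label matches the dataset label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

print(mmi_to_damage_level(7))                                                 # 'moderate_to_heavy'
print(damage_accuracy(["light", "severe"], ["light", "moderate_to_heavy"]))   # 0.5
```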
Geolocalization is integral to post-earthquake damage assessment, enabling the identification of affected geographic areas. Utilizing libraries such as spaCy to process textual data, the synthetic dataset demonstrates a geolocalization accuracy ranging from 72.56% to 95.76%. By comparison, real-world data achieves a higher geolocalization accuracy of 91.36% to 99.14%. While the synthetic data exhibits a slight reduction in accuracy, it still provides a viable dataset for developing and testing damage assessment algorithms, particularly when real-world data is limited or unavailable.
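A minimal version of that extraction step, assuming spaCy's off-the-shelf English model and treating GPE/LOC entities as location mentions, might look like this:

```python
# Minimal geolocalization sketch: spaCy NER over tweet text, keeping place
# entities (GPE = countries/cities/states, LOC = other locations).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_places(tweet: str) -> list[str]:
    doc = nlp(tweet)
    return [ent.text for ent in doc.ents if ent.label_ in {"GPE", "LOC"}]

print(extract_places("Major cracks in the overpass on the east side of Springfield after the quake."))
# e.g. ['Springfield']  (output depends on the NER model)
```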

Toward a Resilient Future: Augmenting the Real
Post-earthquake damage assessment is often hampered by a critical shortage of immediately usable data, delaying relief efforts and hindering accurate impact analysis. To address this, researchers are leveraging synthetic datasets generated from sophisticated models to substantially enhance existing, limited real-world information. This augmentation isn’t about replacing actual data, but bolstering it – providing a richer, more comprehensive picture of affected areas even when ground truth is scarce. By training algorithms on a combination of real and synthetic examples, damage assessment can proceed at a significantly faster pace, and with improved precision. This approach allows for quicker identification of critical infrastructure failures, more effective resource allocation, and ultimately, a more responsive and impactful disaster relief operation.
The system’s capacity for continuous learning hinges on the Feedback Augmenter, a crucial component within its agentic workflow. This mechanism doesn’t simply generate synthetic crisis-related tweets; it actively evaluates their quality based on simulated responses and expert feedback. This evaluation isn’t a one-time check, but an iterative process where the Tweet Generator is refined with each cycle. By analyzing the effectiveness of generated content – gauging factors like believability and information density – the system adjusts its parameters to produce increasingly realistic and useful synthetic data. Consequently, the training datasets become more robust and representative of actual crisis communications, significantly improving the performance of downstream applications designed for rapid damage assessment and emergency response coordination. This self-improving cycle ensures the system adapts and becomes more effective over time, without relying solely on limited real-world examples.
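One plausible, simplified reading of such a feedback loop is sketched below; the prompt wording and the (tweet, reason) feedback format are illustrative assumptions rather than the paper's implementation.

```python
# Hedged sketch of a feedback-augmentation step: rejected tweets and the
# evaluator's reasons are folded into the next generation prompt.
def augment_prompt(base_prompt: str, rejected: list[tuple[str, str]]) -> str:
    """rejected: (tweet, reason) pairs produced by the evaluator."""
    if not rejected:
        return base_prompt
    notes = "\n".join(f'- Rejected: "{tweet}" -> {reason}' for tweet, reason in rejected[:5])
    return base_prompt + "\n\nAvoid the problems seen in these earlier drafts:\n" + notes

next_prompt = augment_prompt(
    "Write one realistic tweet reporting earthquake damage with an explicit location.",
    [("Everything is shaking!!", "no location mentioned; damage level unclear")],
)
print(next_prompt)
```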
The innovative data generation framework transcends the specific challenge of post-earthquake damage assessment, presenting a versatile tool applicable to numerous crisis scenarios. By adapting the agentic workflow and Tweet Generator, researchers can simulate data reflective of diverse events – from hurricanes and wildfires to industrial accidents and public health emergencies. This scalability stems from the system’s capacity to refine its simulation parameters based on feedback, effectively learning the characteristics of different crises and producing increasingly realistic training data. Consequently, emergency response teams and AI developers gain access to a robust and adaptable resource, enabling improved preparedness and more effective AI models across a broad spectrum of potential disasters, even for low-frequency, high-impact events where real-world data is scarce.

The pursuit of scalable experimentation, as detailed in the article, mirrors a fundamental truth about complex systems. Any improvement to a damage assessment workflow built upon synthetic data generation ages faster than anticipated, demanding constant refinement. This inherent decay is not a failure, but rather a characteristic of existence within time’s arrow. Andrey Kolmogorov observed, “The most important thing in science is not to be afraid of making mistakes.” This principle resonates deeply; the agentic workflow, while innovative, will inevitably require iterative adjustments, each one a necessary response to the system’s natural tendency toward entropy. The workflow’s adaptability is its strength, acknowledging that perfect stability is an illusion, and progress lies in gracefully navigating the inevitable process of change.
The Long View
This work establishes a method for fabricating crisis-related data, a necessary step given the inevitable decay of access to real-time social media streams. Logging is the system’s chronicle, and the current trajectory suggests a future where reconstructing past information environments relies increasingly on such synthetic recreations. However, the fidelity of these simulations remains a critical, and largely unaddressed, concern. The agentic workflow itself is merely a snapshot, a moment on the timeline, and its long-term utility depends on adapting to the evolving capabilities (and biases) of the underlying large language models.
The present study sidesteps the question of ‘ground truth’ – an understandable concession given the inherent ambiguity of crisis events. Yet, ignoring this fundamental problem risks building evaluation metrics on shifting sands. Future iterations must grapple with defining, and measuring, the ‘believability’ of synthetic crises, perhaps by focusing not on replicating specific events, but on preserving the statistical properties of information flow during times of disruption.
Ultimately, the success of this approach, and of similar efforts, will not be judged by its ability to mimic the past, but by its capacity to anticipate the future. Crisis informatics, like all systems, is subject to entropy. The challenge lies in designing workflows that age gracefully, acknowledging that perfect reconstruction is an illusion, and that the most valuable data may be that which reveals the limits of our simulations.
Original article: https://arxiv.org/pdf/2603.13625.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/