Tracking the Pulse of Protest: How AI Can Decode Social Movements

Author: Denis Avetisyan


New research introduces a computational tool that analyzes online conversations surrounding movements like #MeToo and #BlackLivesMatter, offering a deeper understanding of how public discourse evolves across different platforms.

Social movements, as measured by online discourse surrounding #MeToo and Black Lives Matter, exhibit fluctuating periods of heightened visibility (distinguished by a threshold of $\mu + 2\sigma$) that reveal the ephemeral nature of public attention and the difficulty of sustaining momentum beyond transient spikes in conversation.

This study details the development of SMART, a system leveraging natural language processing and time series analysis to track event-driven shifts in social movement discourse.

While understanding public discourse is crucial for reporting on impactful social movements, traditional analysis often lacks the granularity to connect shifting sentiments with external events. This research addresses this gap with ‘SMART: A Social Movement Analysis & Reasoning Tool with Case Studies on #MeToo and #BlackLivesMatter’, a system designed to track and forecast emotional responses within online social movement discussions. Through analysis of a novel 2.7M+ post dataset, we demonstrate that SMART effectively detects platform-specific discourse shifts surrounding key political events like the 2024 U.S. election, revealing the limitations of relying solely on volume measurements. Can nuanced, predictive analysis of online discourse fundamentally reshape how journalists cover and contextualize critical social issues?


The Echoes of Disquiet: Mapping the Landscape of Social Flux

Addressing contemporary global challenges – from climate change and political polarization to public health crises and economic inequality – increasingly demands a nuanced understanding of evolving social movements. However, traditional analytical approaches often fall short, frequently relying on siloed data sources and reacting to events after they unfold. This fragmented methodology hinders the ability to proactively identify emerging concerns, assess the underlying drivers of social change, and formulate effective interventions. Consequently, responses tend to be piecemeal and lack the systemic perspective necessary to address the complex interplay of factors shaping public discourse and collective action, limiting opportunities for preventative strategies and informed policy decisions.

A truly comprehensive understanding of evolving social landscapes necessitates the aggregation and analysis of data from a multitude of online platforms – social media, news outlets, blogs, forums, and more. This demand, however, quickly overwhelms traditional analytical methods, requiring a robust and scalable system capable of processing immense volumes of text in real-time. Such a system must not only handle the sheer quantity of information, but also effectively categorize, filter, and interpret diverse linguistic styles, emerging trends, and subtle shifts in public sentiment. The development of these systems represents a significant technological challenge, pushing the boundaries of natural language processing, machine learning, and data storage capabilities to unlock actionable insights from the constant stream of digital communication.

Current methodologies for analyzing public opinion often fail to capture the subtle evolutions in societal conversations, creating significant challenges for those seeking to understand and respond to emerging trends. Traditional approaches frequently rely on broad categorizations or keyword analysis, overlooking the critical context and semantic shifts that reveal deeper changes in public sentiment. This inability to discern nuance hinders proactive interventions, as responses are often formulated after a shift in discourse has solidified, rather than anticipating it. Consequently, policy-making becomes reactive, struggling to address the underlying causes of social change and potentially exacerbating existing tensions. A more sensitive and comprehensive system is therefore necessary, one capable of identifying not just what is being said, but how it is being said, and what those shifts signify for broader social dynamics.

The cumulative distribution of documents across filtering layers ($L_0$ through $L_8$) reveals distinct patterns for #MeToo News (purple), #MeToo Reddit (pink), BLM News (teal), and BLM Reddit (cyan), with annotations at layer $L_5$ highlighting the document sets used for analysis.

SMART: An Ecosystem for Real-Time Discourse Analysis

The SMART system functions as a data aggregation platform for monitoring social movements aligned with the Sustainable Development Goals (SDGs). It achieves this by collecting data from a variety of publicly available online sources, including traditional news media outlets and social media platforms such as Reddit. This multi-source approach aims to provide a comprehensive view of discourse related to SDG-supporting movements, overcoming the limitations of relying on single information streams. Data is continuously acquired and integrated to enable real-time tracking of movement activity, framing, and public perception. The system is designed to be scalable, allowing for the incorporation of additional data sources and the analysis of a growing volume of online content.

Multi-Layer Filtering within the SMART system operates as a sequential process to identify and isolate online discourse related to social movements. Initial layers utilize broad keyword searches and source restrictions to capture a large volume of potentially relevant data. Subsequent layers apply increasingly specific criteria, including topic modeling, hashtag analysis, and the exclusion of unrelated content, to reduce noise and enhance precision. This progressive refinement ensures that the system focuses on content demonstrably linked to social movement activity, minimizing the inclusion of tangential or irrelevant online conversations. The filtering process is parameterized to allow adjustment of sensitivity and recall based on specific research objectives and data characteristics.
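
The sequential refinement described above can be sketched as a chain of predicates, each applied to the survivors of the previous layer. This is a minimal illustration only: the three layers, their names, and their thresholds are hypothetical stand-ins, since the article does not list SMART's actual filtering criteria.

```python
from typing import Callable, Iterable

Doc = dict  # each document is a dict with at least a "text" field

# Hypothetical layers, ordered from broad to specific. The real SMART
# criteria (topic models, hashtag rules, exclusions) are not given here.
def has_keyword(doc: Doc) -> bool:
    text = doc["text"].lower()
    return "metoo" in text or "me too" in text

def from_allowed_source(doc: Doc) -> bool:
    return doc.get("source") in {"news", "reddit"}

def long_enough(doc: Doc) -> bool:
    return len(doc["text"].split()) >= 5

LAYERS: list[tuple[str, Callable[[Doc], bool]]] = [
    ("L1_keyword", has_keyword),
    ("L2_source", from_allowed_source),
    ("L3_length", long_enough),
]

def run_filters(docs: Iterable[Doc], layers=LAYERS):
    """Apply layers in order; return the survivors and per-layer counts."""
    current = list(docs)
    counts = [("L0_raw", len(current))]
    for name, keep in layers:
        current = [d for d in current if keep(d)]
        counts.append((name, len(current)))
    return current, counts
```

The per-layer counts returned by `run_filters` are the kind of quantity a cumulative-distribution plot across filtering layers would be built from.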

Data acquisition for the SMART system relies on automated tools to continuously collect relevant online content. The WorldNewsAPI is utilized to gather current news articles from a wide range of sources, providing a broad overview of global events. Complementing this, the PRAW (Python Reddit API Wrapper) library is employed to access and process data from Reddit, specifically focusing on discussions and posts within relevant communities. These tools operate continuously, ensuring a consistent and updated stream of textual data is available for subsequent Natural Language Processing and analysis within the SMART system.
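
As a rough sketch of the Reddit side of this collection step, the snippet below searches a subreddit through PRAW and flattens each submission into a plain record. The function names, the record fields, and the default arguments are assumptions made for illustration; the article does not describe SMART's actual collection code, and the WorldNewsAPI side is omitted.

```python
from datetime import datetime, timezone

def to_record(submission) -> dict:
    """Flatten a PRAW Submission (or any object with the same attributes)."""
    return {
        "id": submission.id,
        "subreddit": str(submission.subreddit),
        "text": f"{submission.title}\n{submission.selftext}".strip(),
        "created": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
        "score": submission.score,
    }

def fetch_reddit_posts(query, subreddit="all", limit=100, **credentials):
    """Search Reddit via PRAW; needs `pip install praw` and API credentials."""
    import praw  # imported lazily so to_record() works without PRAW installed

    # credentials: client_id, client_secret, user_agent
    reddit = praw.Reddit(**credentials)
    results = reddit.subreddit(subreddit).search(query, sort="new", limit=limit)
    return [to_record(s) for s in results]
```

Keeping the normalization in `to_record` separate from the network call makes it possible to test the record shape without credentials.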

Natural Language Processing (NLP) forms the analytical core of the SMART system, employing several techniques to interpret online discourse. Specifically, the system uses Keyword Extraction to identify central themes and topics within text data, enabling the categorization of content related to social movements. Complementing this, Emotion Analysis, a finer-grained relative of sentiment analysis, estimates the affective tone expressed in online conversations across emotional categories such as anger, fear, joy, and sadness rather than a single positive, negative, or neutral label. These NLP-derived insights (thematic prevalence and emotional tone) are then used to characterize the nature and intensity of online discussions surrounding Sustainable Development Goal-related movements, providing quantifiable metrics for tracking and understanding public engagement.
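
To make the keyword-extraction step concrete, here is a small TF-IDF ranking written in plain Python. TF-IDF is used here as a common baseline, not as the method SMART is confirmed to use; the stopword list and tokenizer are deliberately tiny and hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for", "at"}

def tokenize(text: str) -> list[str]:
    """Lowercase, keep word-like tokens (including hashtags), drop stopwords."""
    return [t for t in re.findall(r"[a-z#']+", text.lower()) if t not in STOPWORDS]

def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)  # smoothed IDF
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results
```

Terms that are frequent in one document but rare across the collection rise to the top, which is the behavior a theme-identification step needs.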

The SMART system combines data collection from news and Reddit, multi-layer filtering, and NLP-based analysis to track social movement discourse.

Echoes of the Past: Uncovering Historical Trends and Emotional Landscapes

Retrospective analysis within SMART involves the systematic examination of historical discourse data to identify temporal patterns associated with social movements. This process utilizes time-series data extracted from online platforms to chart the evolution of conversations, enabling researchers to observe shifts in topic focus, framing, and intensity over defined periods. By analyzing discourse volume and content before, during, and after key political events, SMART can reveal how external occurrences correlate with changes in online conversation. The methodology facilitates the identification of trends, such as escalating or diminishing interest in specific issues, the emergence of new narratives, and alterations in the emotional tone of the discourse, providing a granular understanding of how conversations unfold over time.
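
One simple way to pick out periods of heightened visibility in such a time series is the $\mu + 2\sigma$ threshold mentioned in the opening figure caption. The sketch below applies it to daily post counts; computing the mean and standard deviation over the whole series (rather than a rolling window) is an assumption, as is the function name.

```python
from statistics import mean, pstdev

def high_visibility_days(daily_counts: list[int], k: float = 2.0) -> list[int]:
    """Return indices of days whose volume exceeds mean + k * std."""
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)  # population standard deviation
    threshold = mu + k * sigma
    return [i for i, count in enumerate(daily_counts) if count > threshold]
```

With `k = 2.0`, only days well above the typical level are flagged, which matches the idea of transient spikes rather than sustained attention.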

SMART aligns its analysis of online discourse with identified key political events to assess external influences on conversation patterns. By contextualizing online activity, including metrics like discourse volume and emotional intensity, with these events, researchers can move beyond observing what is being said toward understanding why specific conversations arise or change. This approach allows for the quantification of the relationship between real-world occurrences and online engagement, enabling the assessment of how external factors may drive or suppress certain narratives and sentiments within social movements. The system identifies these key events and then analyzes the preceding and subsequent discourse to establish a temporal link and measure the magnitude of the effect.
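
The before-and-after comparison around an event can be sketched as follows. The 7-day window and the function names are assumptions; the "anticipatory" and "reactive" labels follow the article's figure captions, where higher pre-event volume is read as anticipation and higher post-event volume as reaction.

```python
from datetime import date, timedelta
from statistics import mean

def event_window_means(daily: dict[date, float], event_day: date, window: int = 7):
    """Mean of a daily metric over `window` days before and after an event.

    Days missing from `daily` are skipped; the event day itself is excluded.
    Raises statistics.StatisticsError if either side has no data.
    """
    pre_days = [event_day - timedelta(days=i) for i in range(1, window + 1)]
    post_days = [event_day + timedelta(days=i) for i in range(1, window + 1)]
    pre = [daily[d] for d in pre_days if d in daily]
    post = [daily[d] for d in post_days if d in daily]
    return mean(pre), mean(post)

def classify(pre_mean: float, post_mean: float) -> str:
    """Label an event by which side of it carries more discourse."""
    if pre_mean > post_mean:
        return "anticipatory"
    if post_mean > pre_mean:
        return "reactive"
    return "flat"
```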

Emotion Analysis within the system uses computational linguistics to quantify the emotional intensity expressed in online text data. This process assigns numerical values to different emotional categories, such as anger, fear, joy, and sadness, based on the lexical content and contextual cues present in the discourse. The resulting emotional scores are then aggregated to provide a measure of the overall emotional tone of social movement conversations. Furthermore, these emotion-derived metrics complement the measurement of Discourse Volume; periods of heightened emotional expression frequently coincide with increased online activity, allowing for a more nuanced understanding of participation and engagement than post counts alone.
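
As a toy illustration of assigning numerical values to emotion categories and then aggregating them, the sketch below uses a tiny hand-written lexicon. The lexicon, the scoring rule (lexicon hits divided by token count), and the function names are all hypothetical; a production system would more plausibly rely on a trained classifier that also uses context.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon, for illustration only.
LEXICON = {
    "furious": "anger", "outrage": "anger", "angry": "anger",
    "afraid": "fear", "scared": "fear", "threat": "fear",
    "proud": "joy", "hope": "joy", "celebrate": "joy",
    "grief": "sadness", "mourning": "sadness", "heartbroken": "sadness",
}
EMOTIONS = ("anger", "fear", "joy", "sadness")

def emotion_scores(text: str) -> dict[str, float]:
    """Share of tokens in `text` that match each emotion category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    n = max(len(tokens), 1)  # avoid division by zero on empty text
    return {e: hits[e] / n for e in EMOTIONS}

def aggregate(posts: list[str]) -> dict[str, float]:
    """Average per-post emotion scores across a set of posts."""
    per_post = [emotion_scores(p) for p in posts]
    n = max(len(per_post), 1)
    return {e: sum(s[e] for s in per_post) / n for e in EMOTIONS}
```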

ChromaDB functions as a vector database integral to SMART’s semantic search capabilities. It stores text embeddings – high-dimensional vector representations of text – generated from the analyzed discourse. These embeddings capture the semantic meaning of text, enabling the system to perform similarity searches beyond simple keyword matching. By indexing these vectors, ChromaDB facilitates efficient retrieval of relevant content, even when the search query doesn’t explicitly contain the same terms as the stored text. This is crucial for identifying nuanced connections and patterns within large datasets of social movement discourse and allows for rapid analysis of semantic trends.
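
The core operation a vector database performs, ranking stored embeddings by similarity to a query embedding, can be shown in a few lines of plain Python. `TinyVectorStore` is a hypothetical in-memory stand-in written for this illustration, not ChromaDB's API, and the embeddings passed to it are assumed to come from some external embedding model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._items = []  # list of (doc_id, embedding, text)

    def add(self, doc_id: str, embedding: list[float], text: str) -> None:
        self._items.append((doc_id, embedding, text))

    def query(self, embedding: list[float], n_results: int = 2):
        """Return the n_results most similar (doc_id, text) pairs."""
        ranked = sorted(
            self._items,
            key=lambda item: cosine(embedding, item[1]),
            reverse=True,
        )
        return [(doc_id, text) for doc_id, _, text in ranked[:n_results]]
```

Because ranking depends on vector proximity rather than shared words, two posts with no terms in common can still be retrieved together if their embeddings are close. A real vector database adds indexing so that this search does not have to scan every item.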

Analysis of discourse volume surrounding key political events demonstrates platform-specific responses. News media coverage exhibited a 78.1% increase during these periods, while concurrent activity on Reddit showed a decrease, as evidenced by a Cohen’s d of -0.51 for the #MeToo movement. Further investigation into the Black Lives Matter movement revealed a large effect size (Cohen’s d = 1.17) for increased news coverage during similar key events, indicating a substantial difference in how these platforms respond to and amplify discourse surrounding politically charged topics. These findings suggest that news media and Reddit function differently in the context of social movements, with news media demonstrating a tendency to increase coverage and Reddit exhibiting a decrease in activity during periods of heightened political awareness.
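
For readers unfamiliar with the effect sizes quoted above, Cohen's d expresses the difference between two group means in units of their pooled standard deviation. The sketch below uses the standard pooled-variance form; whether the paper uses exactly this variant is not stated in the article, and the argument names are assumptions.

```python
import math
from statistics import mean, variance

def cohens_d(event: list[float], baseline: list[float]) -> float:
    """Cohen's d with pooled sample variance.

    Positive values mean the metric is higher in `event` than in `baseline`.
    Each group needs at least two observations.
    """
    n1, n2 = len(event), len(baseline)
    pooled_var = (
        (n1 - 1) * variance(event) + (n2 - 1) * variance(baseline)
    ) / (n1 + n2 - 2)
    return (mean(event) - mean(baseline)) / math.sqrt(pooled_var)
```

Under this sign convention, a negative value such as the -0.51 reported for #MeToo on Reddit corresponds to lower activity during event periods than during baseline periods.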

Analysis of Reddit discourse surrounding the #MeToo movement, encompassing 36 identified key political events, revealed statistically significant changes in conversation volume for 20 of those events (p < 0.001). This indicates that a substantial proportion of externally defined events coincided with measurable shifts in online conversation pertaining to #MeToo on the Reddit platform. A p-value below 0.001 suggests these volume changes were unlikely to be due to random chance and are consistent with a response to the events analyzed, although significance alone does not establish that the events caused the shifts.
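
The article does not say which statistical test produced these p-values. As one assumption-light option, a two-sided permutation test on the difference in mean volume before and after an event could look like the sketch below; the function name, the number of permutations, and the seed are illustrative choices.

```python
import random

def permutation_p_value(pre, post, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled observations and counts how often the
    shuffled mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    pooled = list(pre) + list(post)
    n_pre = len(pre)

    def mean_diff(values):
        first, second = values[:n_pre], values[n_pre:]
        return abs(sum(second) / len(second) - sum(first) / len(first))

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)
```

Note that with 10,000 permutations the smallest reportable p-value is about 0.0001, so resolving thresholds like p < 0.001 reliably requires a generous permutation budget.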

Analysis of key political events reveals that discourse volume either anticipates events, with higher pre-event volume than post-event volume (blue points), or reacts to them, showing the opposite pattern (orange points), as indicated by deviation from the equality line (diagonal).

The Looming Horizon: Forecasting the Future of Social Movements

SMART’s forecasting capabilities center on a robust analytical component that synthesizes both historical data and current, real-time information to model the likely evolution of social movement discussions. This isn’t simply tracking what has been said, but rather building predictive models based on patterns identified within vast datasets of online conversation. By analyzing the language, sentiment, and network structures of past movements, alongside immediate indicators like trending topics and shifts in emotional intensity, the system attempts to anticipate future discourse – identifying emerging narratives, potential escalations, and key inflection points. The power lies in recognizing that social movements aren’t random occurrences; they follow discernible patterns that, when understood, can offer valuable foresight into their trajectory and potential impact.
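
The article does not specify the forecasting model SMART uses. Purely as a baseline for what "modeling the likely evolution" of a discourse metric can mean in the simplest case, the sketch below implements one-step-ahead simple exponential smoothing; the function name and the default `alpha` are assumptions.

```python
def exp_smooth_forecast(series: list[float], alpha: float = 0.3) -> float:
    """One-step-ahead forecast via simple exponential smoothing.

    alpha close to 1 tracks recent values; alpha close to 0 smooths heavily.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level
```

A baseline like this is mainly useful as a yardstick: any richer model that draws on language or emotional signals should be able to beat it on held-out data.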

The predictive power of SMART’s forecasting analytics extends beyond simple observation, enabling stakeholders to move from reactive responses to proactive strategies. By identifying nascent issues as they gain traction in online discourse, the system facilitates early intervention, allowing organizations and authorities to address concerns before they escalate into widespread conflict. This capability isn’t limited to crisis management; it also informs the development of targeted communication strategies designed to shape public perception and foster constructive dialogue. Essentially, stakeholders can leverage these insights to anticipate potential flashpoints, preemptively mitigate negative outcomes, and ultimately guide social movements toward more peaceful and productive resolutions, fostering a more informed and responsive approach to societal shifts.

The system analyzes shifts in both the quantity and emotional charge of online conversations surrounding a social issue to pinpoint moments poised for significant change. Increases in discourse volume, coupled with rising emotional intensity – measured through natural language processing of textual data – can signal an approaching “tipping point,” where a previously niche concern rapidly gains widespread attention and potentially escalates into broader action. This proactive identification allows stakeholders to move beyond reactive responses and instead implement targeted interventions – such as strategic communication campaigns or resource allocation – designed to shape the trajectory of the discourse and mitigate potential negative outcomes, or to amplify positive momentum where appropriate. By recognizing these critical junctures, the system facilitates a shift from simply observing social movements to actively engaging with – and potentially influencing – their development.
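
The idea of volume and emotional intensity rising together can be expressed as a joint condition on two standardized series. In this sketch, the z-score threshold of 1.5, the use of whole-series statistics, and the function names are all assumptions; the article does not state how SMART defines a candidate tipping point.

```python
from statistics import mean, pstdev

def z_scores(series: list[float]) -> list[float]:
    """Standardize a series; a constant series maps to all zeros."""
    mu, sigma = mean(series), pstdev(series)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

def candidate_tipping_points(volume, intensity, z_min: float = 1.5) -> list[int]:
    """Indices where volume AND emotional intensity are both unusually high."""
    if len(volume) != len(intensity):
        raise ValueError("series must have the same length")
    zv, zi = z_scores(volume), z_scores(intensity)
    return [
        t for t, (a, b) in enumerate(zip(zv, zi))
        if a >= z_min and b >= z_min
    ]
```

Requiring both signals to be elevated filters out days that are merely busy or merely heated, which is the distinction the paragraph above draws against volume-only measures.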

The analytical framework underpinning SMART demonstrates versatility through its application to diverse social movements, notably including #MeToo and Black Lives Matter. This isn’t simply about retrospective analysis; the system dissects the evolving discourse within these movements, mapping shifts in key themes, influential voices, and emotional resonance over time. By identifying patterns in how these movements gain traction, respond to events, and interact with opposing viewpoints, researchers can gain nuanced insights into the underlying mechanisms driving collective action. This deeper understanding extends beyond specific instances, providing a comparative lens for analyzing the lifecycle of various social movements and potentially predicting future trajectories based on shared characteristics and contextual factors. The adaptability of the system promises a more comprehensive and predictive approach to the study of social change itself.

Analysis of key political events (KPEs) reveals that anticipatory emotional responses (above the diagonal) differ from reactive ones (below) across domestic policy (green), elections (red), and foreign policy (blue), with statistically significant differences highlighted by the borders.

The development of SMART, as detailed in the research, feels less like construction and more like tending a garden. The tool doesn’t build understanding of social movements; it reveals patterns already present in the ecosystem of online discourse. It’s a humbling process, acknowledging that any attempt to quantify something as complex as #MeToo or #BlackLivesMatter is, inherently, a prophecy of simplification. As Donald Davies observed, “A system is only as good as its assumptions.” SMART’s value lies not in providing definitive answers, but in illuminating the limitations of those assumptions, forcing a recognition that volume alone doesn’t capture the nuance of a movement’s evolution across platforms.

What’s Next?

The pursuit of quantifying social movements, as demonstrated by tools like SMART, resembles less the construction of a precise instrument and more the tending of a garden. One does not build understanding; one cultivates it. The tool itself is merely the trellis. This work reveals, with increasing clarity, that the volume of discourse is a superficial metric. It is not the quantity of voices that matters, but the subtle shifts in their resonance, the patterns of agreement and divergence that bloom – or wither – in response to events. To mistake the noise for the signal is to chart the weather, not the climate.

The limitations lie not in the algorithms, but in the inherent complexity of the systems they attempt to model. A truly robust analysis demands a move beyond event detection – a focus on the relationships between events, and the way those relationships are framed and reframed across different platforms. Resilience lies not in isolating components, but in forgiveness between them – in acknowledging that every model is a simplification, and every prediction carries the seeds of its own failure.

Future work should embrace the messy, ambiguous nature of social phenomena. The goal is not to predict movements, but to understand the conditions that allow them to flourish – or fade. This requires a willingness to abandon the illusion of control, and to accept that the most valuable insights often emerge from the unexpected.


Original article: https://arxiv.org/pdf/2601.20986.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-01-31 02:38