Smarter Seismic Insights: Bridging Expert Knowledge and Machine Learning

Author: Denis Avetisyan


A new framework leverages domain expertise to improve the accuracy and interpretability of seismic event classification, even with incomplete data.

The distributions of extracted score features, conditioned on class labels within simulated environments, demonstrate how a feature summary, constructed from $\ell\ell$-based scores, characterizes variations in these settings.

This work introduces expert-guided, class-conditional goodness-of-fit scores for interpretable classification with informative missingness, demonstrated through application to seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty Organization.

Classification problems are often hampered by missing data and by the need for transparent decision-making, particularly when domain expertise must be incorporated. These challenges are addressed in ‘Expert-Guided Class-Conditional Goodness-of-Fit Scores for Interpretable Classification with Informative Missingness: An Application to Seismic Monitoring’, which introduces a novel framework that leverages expert knowledge through class-conditional models to construct interpretable goodness-of-fit features. The resulting method demonstrably improves classification accuracy, even with limited training data, and offers enhanced transparency, as shown in the context of seismic event monitoring for the Comprehensive Nuclear-Test-Ban Treaty Organization. Could this hybrid approach, combining expert guidance with data-driven learning, offer a broadly applicable solution for interpretable machine learning in other high-stakes domains?


Emergent Order from Seismic Complexity

The Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) depends on the reliable identification of seismic events to verify adherence to the treaty, but automated processing of global monitoring data, resulting in the initial Standard Event List 1 (SEL1), isn’t foolproof. While algorithms efficiently scan for signals, a significant portion of these detections require expert human review to confirm their authenticity and characteristics. This manual validation is necessary because natural earthquakes, explosions, and even industrial blasts can generate similar seismic waves, creating ambiguity for the automated systems. The need for this secondary assessment introduces potential delays in confirming events and highlights the ongoing challenge of balancing rapid detection with unwavering accuracy in a complex global monitoring network.

Currently, confirming a potential seismic event often depends on a comparison with the Late Event Bulletin (LEB), a compilation of events painstakingly reviewed and curated by human analysts. This established method, while thorough, introduces unavoidable delays as analysts must examine data after initial automatic detections. More critically, the manual nature of LEB creation inherently introduces subjectivity; differing interpretations of complex seismic signals can lead to discrepancies in event validation. This reliance on human judgment, while currently necessary, limits the speed and scalability of the validation process, particularly as the volume of globally recorded seismic data continues to increase, demanding more efficient and objective approaches to event confirmation.

Distinguishing authentic seismic events from background noise presents a significant hurdle in monitoring global activity. Seismic data is inherently complex, not simply due to the sheer volume collected by the International Monitoring System, but also because of the Earth’s natural variability and the multitude of sources that contribute to ground motion. Signals from earthquakes, explosions, and even ocean waves can overlap and interfere with one another, making it difficult to isolate and accurately characterize each event. Further complicating matters are variations in geological structures and propagation paths, which alter signals as they travel through the Earth, potentially masking or distorting their characteristics. Consequently, automated systems struggle to consistently differentiate between genuine events and spurious signals, necessitating ongoing efforts to refine algorithms and incorporate more sophisticated data analysis techniques.

Addressing the limitations of current seismic event validation necessitates a shift toward more sophisticated analytical techniques. Researchers are actively exploring methods to integrate diverse data streams – including infrasound, hydroacoustic recordings, and even atmospheric observations – to create a more holistic picture of potential events. Machine learning algorithms, trained on extensive datasets of known events and noise patterns, offer the potential to automate much of the validation process, reducing reliance on manual review and accelerating the identification of genuine signals. These innovative approaches not only promise to enhance the accuracy of event detection but also to improve the timeliness of validation, allowing for more rapid and informed decision-making regarding compliance with the Comprehensive Nuclear-Test-Ban Treaty.

Calibration plots for station-level detection models on the SEL1 dataset reveal that both Weighted Reliability Assessment (WRA) and Brier Reliability Threshold Regression (BRTR) generally align predicted detection probabilities with observed frequencies for both valid and invalid SEL1 events, though deviations from perfect calibration (dashed diagonal) are present.

Guiding the System: An Expert’s Template

The Expert-Guided Model functions as a synthetic template representing anticipated seismic waveforms, derived from established geophysical principles and historical data. This model isn’t a single waveform, but rather a parameterized system capable of generating a range of expected signals based on adjustable inputs reflecting source mechanisms, propagation paths, and local geological conditions. Observed seismic events are then evaluated by comparing their characteristics – amplitude, frequency content, arrival times – against the outputs of this model, providing a structured basis for assessment beyond simple signal detection. The model incorporates expert knowledge to define plausible signal characteristics, thereby reducing false positives and improving the reliability of event validation.

The Expert-Guided Model utilizes adjustable Model Parameters to define its behavior and facilitate adaptation to varying conditions. These parameters encompass characteristics such as expected signal amplitude, frequency content, waveform shape, and relative phase arrival times. By modifying these values, the model can be tuned to reflect the specific properties of different seismic environments – including variations in geological structure, noise levels, and source mechanisms. Furthermore, adjustments to these parameters allow the model to accommodate diverse signal characteristics arising from different event types, such as earthquakes, explosions, or induced seismicity, thereby enhancing its ability to accurately represent and validate observed seismic data.
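To make this concrete, here is a minimal Python sketch of such a parameterized template. The functional forms, parameter names (attenuation, phase_velocity_kms), and numerical values are illustrative assumptions rather than the paper's actual model; they only show how adjustable parameters translate into expected signal characteristics at a station.

```python
import numpy as np

# Minimal sketch of an expert-guided template: adjustable parameters translate
# into the signal characteristics expected at a station. Functional forms and
# numbers are illustrative assumptions, not the paper's model.

def predicted_log_amplitude(magnitude, distance_km, attenuation=1.7, offset=2.0):
    """Expected log10(a/T): grows with event magnitude, decays with distance."""
    return magnitude - attenuation * np.log10(distance_km) - offset

def predicted_arrival_time(origin_time_s, distance_km, phase_velocity_kms=8.0):
    """Expected P-phase arrival time under a constant apparent velocity."""
    return origin_time_s + distance_km / phase_velocity_kms

# Re-tuning attenuation or phase_velocity_kms adapts the template to a
# different region, noise environment, or event type.
print(predicted_log_amplitude(magnitude=4.5, distance_km=3000.0))
print(predicted_arrival_time(origin_time_s=0.0, distance_km=3000.0))
```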

Model-Fit Score Features are a set of quantifiable metrics derived from the comparison of observed seismic data with predictions generated by the Expert-Guided Model. These features include, but are not limited to, the residual error between observed and predicted waveforms, the correlation coefficient quantifying waveform similarity, and spectral mismatch metrics. Each feature is calculated for individual waveforms and then aggregated to produce an overall model-fit score. This score provides a numerical assessment of event validity, with higher scores indicating a stronger alignment between the observed data and the expected seismic signal characteristics defined by the model. These features are then used in downstream analysis for event classification and validation, allowing for a data-driven approach to evaluating seismic event authenticity.
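The following sketch illustrates how score features of this kind could be computed once observed and predicted waveforms are available on a common time grid. The feature definitions, weights, and toy data are assumptions chosen for illustration; the paper's exact formulations may differ.

```python
import numpy as np

def model_fit_features(observed, predicted):
    """Per-waveform fit features: residual RMS, correlation, spectral mismatch."""
    residual = observed - predicted
    rms_residual = float(np.sqrt(np.mean(residual ** 2)))
    correlation = float(np.corrcoef(observed, predicted)[0, 1])
    obs_spec = np.abs(np.fft.rfft(observed))
    pred_spec = np.abs(np.fft.rfft(predicted))
    spectral_mismatch = float(np.linalg.norm(obs_spec - pred_spec)
                              / np.linalg.norm(pred_spec))
    return {"rms_residual": rms_residual,
            "correlation": correlation,
            "spectral_mismatch": spectral_mismatch}

def aggregate_score(features):
    """Higher score = closer agreement with the expected signal."""
    weights = {"rms_residual": -1.0, "correlation": 1.0, "spectral_mismatch": -1.0}
    return sum(weights[name] * value for name, value in features.items())

# Toy example: a damped sinusoid template plus noise as the "observed" trace.
t = np.arange(0.0, 10.0, 0.025)
predicted = np.sin(2 * np.pi * t) * np.exp(-t / 3.0)
observed = predicted + 0.1 * np.random.default_rng(0).normal(size=t.size)
features = model_fit_features(observed, predicted)
print(features, aggregate_score(features))
```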

Traditional seismic event detection relies on exceeding predefined thresholds, often leading to both false positives and missed events. Focusing on model fit, however, shifts the emphasis from simple signal amplitude to the overall consistency between observed waveforms and an expected signal, as defined by a pre-established model incorporating prior knowledge. This approach allows for the characterization of event attributes beyond mere presence or absence, including source mechanism, location, and magnitude, even for signals that might not trigger traditional detection algorithms. Quantifying the degree of model fit provides a more robust and informative assessment, enabling discrimination between genuine seismic events and noise or non-seismic signals with similar amplitudes.

Histograms of $\log(a/T)$ and arrival-time residuals reveal distinct distributions for valid and invalid events, demonstrating the effectiveness of the expert-guided residual models in differentiating between them.

The Value of Absence: Informative Missingness

Non-detection of seismic signals at specific stations does not necessarily indicate sensor malfunction or unusable data; rather, it provides constraints on potential event locations and source characteristics. The absence of a detectable signal, given a station’s sensitivity and the expected signal strength based on preliminary location estimates, can effectively rule out certain areas as the event origin. This is particularly valuable in scenarios with sparse station coverage or when dealing with low-magnitude events. Incorporating non-detections into the analysis leverages the known spatial relationships between stations and the expected attenuation of seismic waves, ultimately improving the precision and accuracy of event localization and characterization. The pattern of non-detections, therefore, functions as an additional data point within the broader seismic analysis workflow.

The principle of informative missingness posits that the lack of a seismic signal at a given station is not random error, but potentially carries data regarding the source event and propagation path. Specifically, patterns of non-detection, when considered in relation to station geometry and known seismic velocities, can constrain the possible source location and magnitude. A station’s failure to record an event, given its expected sensitivity and the event’s characteristics, can indicate the event occurred outside its detection range, was attenuated by intervening materials, or originated from a direction creating a null in the station’s response; therefore, incorporating these non-detections into analysis improves model resolution and reduces ambiguity in event characterization.

Detection patterns, encompassing both detections and non-detections across a seismic network, are integrated directly into the model-fit assessment process. This incorporation moves beyond simply evaluating model agreement with observed arrival times; it assesses the probability of not observing a signal at stations where none was recorded. By treating non-detections as probabilistic outcomes dependent on event location, magnitude, and station sensitivity, the method effectively utilizes all available data. This approach significantly enhances the discriminatory power of the model, allowing for more accurate event localization and characterization, and reducing the potential for ambiguity when dealing with events at the limits of network detection capabilities.
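A minimal sketch of this idea, under simple distributional assumptions: detection at each station is modeled as a probability that grows with the expected amplitude relative to a station threshold, so a non-detection contributes log(1 − p_detect) to the class-conditional score rather than being discarded. The threshold, spread, and residual-scale values below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def detection_probability(expected_log_amp, station_threshold, spread=0.3):
    """P(detect) rises smoothly as the expected amplitude exceeds the threshold."""
    return norm.cdf((expected_log_amp - station_threshold) / spread)

def station_log_likelihood(detected, observed_log_amp, expected_log_amp,
                           station_threshold, residual_sigma=0.35):
    p_detect = detection_probability(expected_log_amp, station_threshold)
    if detected:
        # Detection term plus an amplitude-residual term.
        return np.log(p_detect) + norm.logpdf(
            observed_log_amp, loc=expected_log_amp, scale=residual_sigma)
    # A non-detection is informative on its own: no observed amplitude needed.
    return np.log1p(-p_detect)

# Class-conditional score: sum over the network under the "valid event" model;
# the analogous sum under an "invalid/noise" model gives a second feature.
observations = [
    (True, 1.8, 2.0, 1.0),    # detected, slightly weaker than expected
    (False, None, 0.6, 1.2),  # quiet station where only a weak signal was expected
]
score = sum(station_log_likelihood(*obs) for obs in observations)
print(score)
```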

Residual analysis assesses the discrepancy between observed seismic data and model predictions, extending beyond simply evaluating detected signals to include instances of non-detection. This process quantifies the difference between the expected signal amplitude (including zero for non-detections) and the actual observed amplitude at each station. By analyzing the distribution and patterns of these residuals, the model parameters are iteratively refined to minimize the overall residual error. Stations consistently reporting non-detections, when predicted, contribute to a more accurate assessment of the model’s predictive power and, consequently, to improved estimates of event location and magnitude; systematic deviations in residuals, even from non-detects, indicate potential model inadequacies and guide parameter adjustments.
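The sketch below illustrates residual-driven refinement in the same spirit: a single hypothetical event parameter (magnitude) is adjusted to minimize squared log-amplitude residuals over a small network, with the non-detecting station contributing a penalty only when the model predicts a signal it should have seen. The attenuation relation and numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

distances_km = np.array([800.0, 2500.0, 6000.0])
observed_log_amp = np.array([0.9, 0.1, 0.0])  # 0.0 marks the non-detecting station

def predicted_log_amp(magnitude, distance_km):
    """Hypothetical magnitude-attenuation relation for log10(a/T)."""
    return magnitude - 1.7 * np.log10(distance_km)

def total_residual(params):
    magnitude = params[0]
    predicted = predicted_log_amp(magnitude, distances_km)
    # The non-detecting station is penalized only when the model predicts a
    # signal it should have seen (predicted amplitude above zero).
    return float(np.sum((observed_log_amp - np.clip(predicted, 0.0, None)) ** 2))

fit = minimize(total_residual, x0=[6.5], method="Nelder-Mead")
print("refined magnitude estimate:", fit.x[0])
```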

Score-based representations consistently outperformed baselines across simulation scenarios, demonstrating improved performance (measured by AUROC and TNR at TPR = 0.95) with increasing training sample size $n$ and under both low and high missingness-informativeness levels $\lambda$, as indicated by consistently positive paired differences.

Towards Automated Validation: A Stronger Signal

The ability to reliably distinguish between genuine seismic events and spurious signals, such as those caused by industrial activity or local disturbances, is crucial for effective monitoring. Recent research demonstrates a substantial improvement in this discrimination through the incorporation of Model-Fit Score Features into validation workflows. These features, derived from the quality of the model’s fit to the observed data, provide a nuanced assessment beyond traditional waveform characteristics. By quantifying how well a given model explains the seismic signal, these scores serve as powerful indicators of event validity, allowing for more accurate classification. This approach consistently outperformed standard methods, suggesting that evaluating model behavior itself offers a valuable, and previously underutilized, dimension for seismic event validation and represents a key step toward fully automated analysis.
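As a rough illustration of this downstream step, the sketch below trains a logistic-regression baseline and a random forest on synthetic two-dimensional score features (fit under a "valid event" model and under a "noise" model). The data generation and feature layout are assumptions for demonstration, not the paper's experimental setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Two score features per event: fit under a "valid event" model and under a
# "noise" model; valid and invalid events occupy different regions.
valid_scores = rng.normal([2.0, -1.0], 1.0, size=(n, 2))
invalid_scores = rng.normal([-1.0, 1.5], 1.0, size=(n, 2))
X = np.vstack([valid_scores, invalid_scores])
y = np.concatenate([np.ones(n), np.zeros(n)])

baseline = LogisticRegression().fit(X, y)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("logistic accuracy:", baseline.score(X, y))
print("random forest accuracy:", forest.score(X, y))
```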

The identification of genuine seismic events amidst background noise and spurious signals is crucial for accurate monitoring, and a newly developed, expert-guided model demonstrates a significant advancement in this area. Achieving an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.925 using the LR-decomp model, the system substantially outperforms conventional machine learning techniques like Logistic Regression and Random Forest. This high AUROC score indicates the model’s exceptional ability to discriminate between valid and invalid events, suggesting a marked improvement in both sensitivity and specificity compared to standard approaches. The enhanced performance isn’t simply a marginal gain; it represents a substantial leap in the automated detection of true seismic signals, offering the potential for more reliable and efficient monitoring capabilities.

The achieved accuracy in discriminating valid seismic events translates directly into practical benefits for monitoring workflows. A True Negative Rate of 0.717 at a 95% True Positive Rate, as demonstrated by the LR-decomp model, signifies a substantial reduction in the number of events requiring time-consuming manual review. This heightened efficiency allows analysts to focus on more complex or ambiguous signals, accelerating the overall validation process and improving the speed at which critical data can be assessed. Consequently, the automated framework not only enhances the reliability of seismic monitoring but also optimizes resource allocation, ultimately contributing to a more responsive and effective global surveillance system.
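The reported operating-point metrics can be computed as in the following sketch, which evaluates AUROC and the true-negative rate at the threshold where the true-positive rate first reaches 0.95. The scores and labels here are synthetic stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
labels = np.concatenate([np.ones(300), np.zeros(300)])   # 1 = valid event
scores = np.concatenate([rng.normal(1.5, 1.0, 300),       # valid events score higher
                         rng.normal(0.0, 1.0, 300)])

auroc = roc_auc_score(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)
idx = int(np.argmax(tpr >= 0.95))   # first operating point with TPR >= 0.95
tnr_at_tpr95 = 1.0 - fpr[idx]
print(f"AUROC = {auroc:.3f}, TNR at TPR = 0.95: {tnr_at_tpr95:.3f}")
```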

A newly developed automated validation framework promises to significantly bolster the Comprehensive Nuclear-Test-Ban Treaty Organization’s (CTBTO) capacity for monitoring potential nuclear explosions. This system leverages advanced model-fit score features to discriminate between genuine seismic events and spurious signals, demonstrably improving accuracy compared to traditional methods like logistic regression and random forest algorithms. Notably, the LR-decomp model within this framework achieves an area under the receiver operating characteristic curve (AUROC) of 0.925, and exhibits superior performance, particularly when training data is scarce – a common challenge in global seismic monitoring. This enhanced capability translates directly into a reduction in the need for time-consuming manual review of events, allowing analysts to focus on critical investigations and ultimately strengthening the CTBTO’s ability to maintain a robust and efficient verification regime.

A random forest model incorporating the proposed features outperforms baseline logistic models and achieves the highest accuracy, as demonstrated by improved ROC curves on the test set.

The research details a system where localized rules – in this case, expert knowledge integrated with machine learning algorithms – give rise to a global pattern of improved seismic event classification. This mirrors the sentiment expressed by Albert Camus: “In the midst of winter, I found there was, within me, an invincible summer.” The ‘invincible summer’ represents the robust performance achieved not through overarching control, but through the synergistic interplay of local expertise and data-driven insights. The framework doesn’t impose a singular solution; instead, it allows patterns to emerge from the interaction of these components, much like how the system handles informative missingness by adapting to the specifics of each data point, fostering a more resilient and interpretable model.

Beyond the Signal

The pursuit of robust classification, even – perhaps especially – with incomplete data, reveals a recurring truth: the system’s resilience isn’t built so much as it emerges. This work, by integrating expert guidance, does not impose order, but rather nudges the system towards configurations that are more likely to reveal inherent structure. The emphasis on class-conditional goodness-of-fit is a subtle acknowledgement that ‘correctness’ isn’t a global property, but a local assessment of consistency. The CTBTO application, while valuable, serves as a proof-of-concept; the true challenge lies in generalizing this approach to domains where expert knowledge is less codified, or where the very definition of ‘signal’ is contested.

Future work will likely focus on automating the elicitation of expert knowledge – a paradoxical endeavor, given the inherently subjective nature of such insights. A more fruitful direction, however, may be to treat expert knowledge not as a definitive input, but as a constraint within a larger evolutionary process. The system learns, but within boundaries defined by human intuition. This is not control, but influence: a recognition that system structure is stronger than individual directives.

The persistent issue of informative missingness hints at a deeper point. Data is never truly ‘missing’; it is merely unobserved. The system doesn’t need to correct for missing values, but to infer them from the relationships that already exist. Robustness emerges, it cannot be designed. The goal, then, is not to build perfect classifiers, but to create systems that are gracefully adaptive, capable of revealing patterns even in the face of uncertainty and incompleteness.


Original article: https://arxiv.org/pdf/2604.14809.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
