Author: Denis Avetisyan
New research reveals that sophisticated graph-based AI models are surprisingly susceptible to extraction attacks, in which an adversary recreates their capabilities using only publicly accessible graph embeddings.

A systematic study demonstrates successful model extraction via embedding regression, enabling replication of zero-shot inference from large Graph Foundation Models without access to original training data or parameters.
Despite advances in graph machine learning and the emergence of powerful Graph Foundation Models (GFMs), the security implications of scaling these models remain largely unexplored. This paper, ‘A Systematic Study of Model Extraction Attacks on Graph Foundation Models’, presents the first systematic investigation into model extraction attacks targeting GFMs, revealing that an attacker can effectively replicate a victim model’s functionality, including zero-shot inference, by training a surrogate encoder on publicly accessible graph embeddings. Through six practical attack scenarios, we demonstrate that this extraction is achievable with minimal query access and even without contrastive pretraining data, approximating the victim model’s performance at a fraction of the original training cost. These findings raise critical questions about the deployment of large-scale graph learning systems and highlight the urgent need for robust, deployment-aware security defenses.
Beyond Euclidean Space: The Relational Imperative
Conventional machine learning algorithms often treat data points as independent entities, prioritizing individual attributes for analysis. However, many real-world datasets are fundamentally relational: their meaning and predictive power reside not just in the characteristics of individual items, but in the connections between them. Consider social networks, molecular structures, or knowledge graphs; the relationships between users, atoms, or concepts are often as crucial as, if not more crucial than, the attributes of each individual node. Traditional methods struggle to capture these complex interdependencies effectively, frequently requiring extensive feature engineering to represent relationships as additional attributes, a process that is both cumbersome and often loses vital information. This limitation hinders performance in tasks where understanding connections is paramount, creating a significant need for approaches designed to handle relational data natively.
Graph Machine Learning represents a significant departure from traditional methods by prioritizing the connections between data points, rather than treating them as isolated entities. This approach acknowledges that much of the world’s information is inherently relational – social networks, molecular structures, knowledge graphs, and transportation systems all rely on interconnectedness. By directly modeling these relationships as a graph – a collection of nodes and edges – GML algorithms can leverage the network’s structure to enhance predictive power and uncover hidden patterns. Consequently, this paradigm shift unlocks analytical capabilities previously inaccessible to conventional machine learning, allowing for tasks like link prediction, node classification, and graph clustering with improved accuracy and efficiency. The ability to reason about relationships, rather than solely attributes, positions GML as a powerful tool for tackling complex problems across diverse domains, from drug discovery to fraud detection and beyond.

Graph Neural Networks: Encoding Relational Structure
Graph Neural Networks (GNNs) represent a class of deep learning models specifically designed to operate on graph-structured data. Unlike traditional neural networks that assume data is arranged in Euclidean space, GNNs directly incorporate information about the relationships between data points, represented as edges in a graph. This allows for feature learning that considers not only the attributes of individual nodes ($x_i$) but also the network topology – how nodes are connected. The core principle involves iteratively aggregating feature information from a node’s neighbors, transforming and combining these signals to create a new node representation that encapsulates both intrinsic node properties and relational context. This process enables GNNs to learn complex patterns and dependencies inherent in graph data, which is crucial for applications where relationships are paramount.
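To make the aggregation principle concrete, the following is a minimal sketch of one message-passing layer in PyTorch, assuming a dense adjacency matrix and mean aggregation over neighbors; the class name, dimensions, and toy graph are illustrative choices rather than any specific published architecture.

```python
# A minimal message-passing layer, assuming a dense adjacency matrix `adj`
# (N x N) and node features `x` (N x F); names and sizes are illustrative.
import torch
import torch.nn as nn

class SimpleMessagePassing(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # Mean of neighbor features: row-normalize the adjacency by degree.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = (adj @ x) / deg
        # Combine each node's own features with the aggregated message.
        return torch.relu(self.linear(torch.cat([x, neigh], dim=1)))

# Toy usage: four nodes on a ring, 8-dimensional features.
x = torch.randn(4, 8)
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
h = SimpleMessagePassing(8, 16)(x, adj)   # new representations, shape (4, 16)
```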
Graph Convolutional Networks (GCNs) utilize spectral graph theory to define convolution operations directly on graph structures, effectively smoothing node features based on graph connectivity. Graph Attention Networks (GATs) introduce an attention mechanism to weight the importance of neighboring nodes during aggregation, allowing the model to focus on the most relevant features. GraphSAGE (Sample and Aggregate) employs a neighborhood sampling approach to enable inductive learning on large graphs, and supports various aggregator functions – mean, max-pooling, or LSTM – to combine information from sampled neighbors. Each method differs in how it calculates node embeddings based on the graph’s structure and node features, impacting performance on different graph-based tasks and scalability to varying graph sizes.
Together, GCNs, GATs, and GraphSAGE form the core architectures for encoding relational information in graph data. These models achieve this by iteratively propagating and aggregating feature information from a node’s neighbors, effectively capturing dependencies beyond immediate connections. The process involves a message-passing mechanism in which each node receives information from its neighbors, which is then combined with the node’s own features. The architectures differ in how these messages are constructed and aggregated; for example, GCNs use a weighted average based on graph adjacency, while GATs employ attention mechanisms to prioritize important neighbors. Repeating this aggregation over multiple layers allows the network to capture increasingly complex, multi-hop dependencies within the graph structure and to generate node embeddings that reflect both node attributes and relational context.
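To illustrate how the aggregation rule itself can change, the sketch below swaps the mean aggregator for a single-head attention mechanism in the spirit of GAT, scoring each edge and softmax-normalizing the scores over every node's neighborhood. It is a simplified illustration under the same dense-adjacency assumption as before, not a reference GAT implementation.

```python
# Illustrative single-head attention aggregation in the spirit of GAT; the
# dense adjacency, dimensions, and activation choices are simplifications.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        h = self.proj(x)                                     # (N, out_dim)
        n = h.size(0)
        # Attention logits for every ordered node pair (i, j).
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        # Keep only actual edges (plus self-loops), softmax per neighborhood.
        mask = adj + torch.eye(n)
        e = e.masked_fill(mask == 0, float('-inf'))
        alpha = torch.softmax(e, dim=1)                      # attention weights
        return alpha @ h                                     # weighted neighbor sum

# Toy usage on a small fully connected graph.
adj = torch.ones(4, 4) - torch.eye(4)
out = TinyAttentionLayer(8, 16)(torch.randn(4, 8), adj)
```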
Graph Foundation Models: Generalization Through Scale
Graph Foundation Models (GFMs) achieve generalization by pre-training on extensive graph datasets, often consisting of billions of nodes and edges. This pre-training utilizes contrastive learning, a technique in which the model learns to distinguish between similar and dissimilar graph structures or node embeddings. By maximizing agreement between different views of the same graph, created through techniques like node masking or edge perturbation, and minimizing agreement between unrelated graphs, GFMs develop robust representations. These learned representations capture the relational information present in the large-scale data, enabling the model to adapt to downstream tasks with minimal or no task-specific training. The scale of both the graph data and the model parameters is a critical factor in achieving this enhanced generalization capability, mirroring the success of large language models in natural language processing.
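For readers who prefer code, here is a minimal sketch of an InfoNCE-style contrastive objective of the kind described above, assuming two augmented views of the same graph have already been encoded into node embeddings; the temperature and tensor sizes are placeholders rather than the paper's configuration.

```python
# Hedged sketch of an InfoNCE-style contrastive loss over two augmented views.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.2):
    """z1, z2: (N, D) embeddings of the same N nodes under two graph views."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (N, N) cosine similarities
    labels = torch.arange(z1.size(0))         # positive pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

# The two views would typically come from node-feature masking or edge
# perturbation applied to the same graph before encoding.
loss = info_nce(torch.randn(32, 64), torch.randn(32, 64))
```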
Effective Graph Foundation Models (GFMs) necessitate robust graph encoders to transform graph structures into meaningful vector representations. These encoders, typically based on Graph Neural Networks (GNNs), capture node embeddings and graph-level features crucial for downstream tasks. Increasingly, GFMs integrate text encoders, such as those utilizing Transformer architectures, to facilitate multimodal understanding. This integration allows the model to process and correlate information from both graph-structured data and associated textual descriptions, improving performance in tasks requiring reasoning across multiple modalities. The combination enables GFMs to leverage textual context for enhanced node and edge feature representation, and to perform tasks like knowledge graph completion and relation extraction with greater accuracy.
Graph Foundation Models (GFMs) demonstrate the capability of zero-shot inference by generalizing learned representations from extensive graph data to unseen tasks without requiring task-specific training. This is achieved through pre-training on large-scale graphs using contrastive learning objectives, which enable the model to learn robust node and graph embeddings. Consequently, GFMs can effectively perform predictions or classifications on novel graphs and tasks simply by leveraging these pre-learned representations, evaluating input graphs against the established embedding space without any gradient updates or parameter adjustments specific to the target task. The performance on these unseen tasks is directly correlated with the scale and diversity of the pre-training data and the effectiveness of the contrastive learning methodology.
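A minimal sketch of this zero-shot procedure, under the assumption that a frozen graph encoder and a frozen text encoder share a common embedding space: node embeddings are compared against embeddings of textual class descriptions and the most similar class is returned. The tensors below are placeholders for those encoder outputs.

```python
# Zero-shot node classification by cosine similarity in a shared embedding
# space; no gradient updates or task-specific training are involved.
import torch
import torch.nn.functional as F

def zero_shot_predict(node_emb, class_emb):
    """node_emb: (N, D) from a frozen graph encoder;
       class_emb: (C, D) from a frozen text encoder over class descriptions."""
    node_emb = F.normalize(node_emb, dim=1)
    class_emb = F.normalize(class_emb, dim=1)
    scores = node_emb @ class_emb.t()          # (N, C) similarity matrix
    return scores.argmax(dim=1)                # predicted class index per node

preds = zero_shot_predict(torch.randn(100, 64), torch.randn(5, 64))
```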

Securing Graph Foundation Models: The Specter of Extraction
Graph Foundation Models (GFMs), despite their powerful capabilities, are vulnerable to model extraction attacks, which present a substantial security risk. These attacks allow adversaries to reconstruct a significant portion of the original model’s functionality by querying it multiple times. The reconstructed, or “surrogate,” model can then be used for malicious purposes, potentially bypassing access controls or replicating sensitive capabilities without requiring the substantial resources needed to train the original GFM. This poses a unique challenge as the extracted model can achieve performance comparable to the victim model, even with a drastically reduced parameter count – effectively democratizing access to advanced graph intelligence while simultaneously undermining the intellectual property and security of the original model developer.
Model extraction attacks frequently utilize embedding regression as a core technique, focusing on replicating the learned representations within a Graph Foundation Model (GFM). This process involves training a smaller, surrogate model – often another graph encoder – to predict the embeddings generated by the target GFM for a given set of nodes. By leveraging supervised learning, the attacker aims to distill the knowledge encoded in the victim model’s embedding space into a more manageable form. The effectiveness of this approach stems from the fact that these embeddings capture crucial structural and feature information about the graph, allowing the surrogate model to approximate the victim’s functionality without needing access to its internal parameters or architecture. Successful embedding regression can thus enable adversaries to reconstruct a functional proxy of the target GFM, posing a significant threat to intellectual property and data privacy.
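The core of the attack can be sketched in a few lines: query the deployed victim as a black box for embeddings, then fit a small surrogate encoder to reproduce them with a mean-squared-error loss. The surrogate architecture, the stand-in victim, and all hyperparameters below are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged sketch of model extraction via embedding regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySurrogate(nn.Module):
    """A deliberately small graph encoder standing in for a GCN/GAT surrogate."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return self.lin(x + (adj @ x) / deg)   # own features plus mean of neighbors

def train_surrogate(surrogate, victim_query, x, adj, epochs=200, lr=1e-3):
    """victim_query(x, adj) returns embeddings observed from the deployed model."""
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    with torch.no_grad():
        target = victim_query(x, adj)          # black-box queries; no parameters seen
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.mse_loss(surrogate(x, adj), target)   # regress onto victim embeddings
        loss.backward()
        opt.step()
    return surrogate

# Toy usage: the "victim" is a fixed random projection standing in for a GFM API.
W = torch.randn(8, 16)
victim = lambda x, adj: torch.tanh(x @ W)
x, adj = torch.randn(20, 8), torch.eye(20)
stolen = train_surrogate(TinySurrogate(8, 16), victim, x, adj)
```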
Recent research highlights the concerning efficiency with which a functional replica of a substantial graph foundation model (GFM) can be constructed by an attacker. Through supervised embedding regression, which trains a smaller “surrogate” graph encoder to mimic the embeddings produced by the victim, it is now possible to replicate a large GFM’s zero-shot performance with remarkably limited resources. Experiments demonstrate that training the surrogate requires a mere five minutes with a GPS architecture, and under three minutes with GAT or GCN architectures. This swift reconstruction effectively distills the victim’s knowledge into a significantly smaller network, offering a substantial advantage to potential adversaries seeking to exploit or reproduce complex graph-based intelligence.
Recent studies demonstrate the feasibility of creating substantially smaller “surrogate” graph neural networks that closely mimic the performance of much larger foundation models. Specifically, researchers have successfully trained surrogates with just 11 million parameters (GPS), 2.9 million (GAT), and 0.95 million (GCN) to replicate the capabilities of a victim model containing 128 million parameters. This significant reduction in model size is achieved through techniques like embedding regression, allowing for efficient knowledge transfer without substantial performance loss. The resulting surrogate models offer a compelling trade-off between computational cost and accuracy, potentially enabling wider accessibility and deployment of graph foundation models in resource-constrained environments.
Despite the successful reconstruction of graph foundation model functionality through embedding regression, the resulting performance degradation is remarkably small. Evaluations across diverse datasets reveal an average accuracy drop of only 0.15% for Graph Attention Networks (GAT) and less than 0.66% for Graph Convolutional Networks (GCN). This minimal loss suggests that adversaries can effectively distill a substantial portion of a large model’s knowledge into a significantly smaller surrogate model without incurring a substantial compromise in predictive power. The efficiency with which these attacks can replicate performance, achieving near-victim accuracy with models possessing a fraction of the parameters, highlights the vulnerability of current graph foundation models and underscores the need for robust defense mechanisms.
Mitigating the risk of model extraction attacks on Graph Foundation Models (GFMs) necessitates a multifaceted defense strategy. Researchers are actively exploring approaches that incorporate domain knowledge – leveraging specific characteristics of the graph data to obscure the underlying model – alongside knowledge distillation, a technique where a smaller, more robust model learns to mimic the behavior of the larger GFM. Further protections involve the application of differential privacy, adding carefully calibrated noise to the model’s outputs to prevent sensitive information leakage, and capitalizing on inherent graph properties such as homophily – the tendency of nodes to connect with similar nodes – to create models less susceptible to reconstruction. These defenses, often used in combination, aim to significantly increase the difficulty and cost for adversaries attempting to replicate the functionality of valuable GFMs, ensuring continued intellectual property protection and responsible AI development.
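As one concrete illustration of the differential-privacy-flavored defenses mentioned above, a deployed embedding API could clip and perturb its responses before releasing them. The clipping bound and noise scale in the sketch below are arbitrary; calibrating them to a formal privacy guarantee, and measuring the resulting utility cost, is the harder open problem.

```python
# Hedged sketch: perturb embeddings served by an API to blunt embedding regression.
import torch

def noisy_embedding_response(embeddings, clip_norm=1.0, noise_std=0.1):
    """Clip each embedding's L2 norm, then add Gaussian noise before release."""
    norms = embeddings.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = embeddings * (clip_norm / norms).clamp(max=1.0)
    return clipped + noise_std * torch.randn_like(clipped)

# Each query now sees a perturbed view of the victim's embedding space,
# increasing the number of queries an extraction attack would need.
protected = noisy_embedding_response(torch.randn(10, 64))
```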

The study meticulously details how Graph Foundation Models, despite their complexity, yield to replication through embedding regression – a process echoing a fundamental principle of mathematical elegance. As Ken Thompson famously stated, “Software is only ever 99% correct, and that last 1% is what keeps things interesting.” This ‘interesting’ final percentage, in the context of these models, lies in the vulnerability exposed by the attack. The attacker doesn’t need to understand the internal workings, only the observable outputs – the embeddings – to construct a functional equivalent. This mirrors a pure mathematical function; given the input and output, the underlying mechanism, while perhaps complex, becomes almost secondary. The research demonstrates that even powerful models, built on intricate architectures, are bound by the laws of mathematical determinism and can be effectively ‘reverse engineered’ through careful observation and reconstruction.
Beyond Mimicry: Charting a Course for Robust Graph Foundation Models
The demonstrated vulnerability of Graph Foundation Models to extraction via embedding regression is, predictably, not a refutation of existing theory – merely an illustration of its continued relevance. The ability to reconstruct functionality from observed outputs, even without knowledge of internal parameters, speaks to a fundamental limitation: information, once manifested in a sufficiently complete observable state, is inherently replicable. The focus, therefore, shifts from preventing extraction – a Sisyphean task – to designing models where the cost of replication approaches the cost of original construction. Simplicity, in this context, does not imply brevity of code, but rather non-contradiction and logical completeness in the model’s underlying representation.
Future work must move beyond superficial defenses, such as embedding obfuscation, which address symptoms, not causes. A fruitful avenue lies in exploring architectural constraints that intrinsically limit the expressiveness of embeddings, forcing a trade-off between zero-shot generalization and replicability. The current paradigm of simply scaling model size, while yielding impressive empirical results, offers no theoretical guarantees against this type of attack. Indeed, it may exacerbate the problem by providing a richer, more complete observable state.
Ultimately, the field requires a more rigorous mathematical understanding of what constitutes ‘knowledge’ within a graph representation. Until models are built upon provably minimal sufficient conditions for their observed behavior, the threat of functional replication will remain – a constant reminder that elegance, in the truest sense, lies not in complexity, but in the austere beauty of logical necessity.
Original article: https://arxiv.org/pdf/2511.11912.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/