Bridging the Gap: Aligning Language Models Without Sharing Data

Author: Denis Avetisyan


A new framework allows independent large language models to collaborate on inference tasks while preserving the privacy of their underlying data and weights.

A novel approach linearly aligns the hidden states of the Qwen and Llama language models, creating a hybrid system where Qwen’s encoding capabilities are leveraged with Llama’s decoding mechanism to generate coherent text without replicating the characteristics of either original model.

HELIX enables cross-silo inference via representation alignment using linear transformations and homomorphic encryption for secure computation.

Despite growing disparities in training data and architectures, large language models increasingly exhibit representational convergence. This phenomenon motivates the work ‘Secure Linear Alignment of Large Language Models’, which introduces HELIX, a privacy-preserving framework enabling cross-silo inference by aligning model representations via linear transformations. By combining this alignment with homomorphic encryption, HELIX achieves sub-second inference while safeguarding client data and models. Could this approach unlock collaborative AI applications previously constrained by security or competitive concerns, and further expand the possibilities for knowledge transfer between independently developed LLMs?


The Challenge of Unified Knowledge

Although natural language processing has seen remarkable progress, the synergistic combination of insights derived from diverse language models presents a persistent obstacle to achieving comprehensive knowledge integration. The field currently faces difficulties in effectively merging the distinct understandings each model possesses, not simply due to conflicting information, but also because of fundamental differences in how they process and represent language. This limitation prevents the creation of AI systems capable of drawing upon a truly broad base of knowledge, hindering advancements in areas requiring nuanced understanding and complex reasoning – effectively stalling the development of more adaptable and robust artificial intelligence that can synthesize information beyond the capabilities of any single model.

The seamless integration of knowledge across diverse language models is frequently hampered by fundamental architectural differences and varying approaches to tokenization. Models trained on disparate structures – some utilizing transformers, others recurrent networks, for instance – process and represent information in incompatible ways, creating barriers to direct knowledge transfer. Furthermore, inconsistencies in how text is broken down into tokens – the basic units of processing – can lead to misinterpretations and a loss of nuanced meaning when attempting to combine insights. This misalignment often results in performance degradation, as models struggle to reconcile conflicting representations and effectively leverage the strengths of their counterparts, highlighting a critical need for standardized approaches or sophisticated translation techniques.

Existing techniques for merging the capabilities of multiple language models often fall short of their potential, largely because simple approaches like ensembling treat each model as a largely undifferentiated contributor. While ensembling can offer modest gains, it fails to capitalize on the specialized knowledge or unique architectural strengths that individual models possess – a model excelling at nuanced sentiment analysis, for instance, isn’t fully utilized if its output is merely averaged with that of a model designed for factual recall. This limitation arises because current methods typically focus on combining predictions at the surface level, rather than deeply integrating the internal representations and reasoning processes that drive each model’s performance. Consequently, valuable information and distinctive capabilities are lost in translation, hindering the development of truly synergistic AI systems that can leverage the best aspects of diverse modeling approaches.

The current inability of diverse language models to seamlessly interact represents a critical bottleneck in the pursuit of genuinely robust and adaptable artificial intelligence. Without effective interoperability, each model operates as a silo of knowledge, limiting the potential for synergistic learning and comprehensive understanding. This fragmentation hinders the development of AI systems capable of generalizing beyond their training data and responding effectively to novel situations, as the combined strengths of multiple models cannot be fully realized. Consequently, progress towards AI that exhibits human-like flexibility and resilience is significantly slowed, demanding innovative solutions to bridge the gap between isolated models and unlock the power of collective intelligence.

Cosine similarity between cross-model generated text and native model outputs, measured using OpenAI’s embedding-001, indicates that high similarity correlates with coherent text generation, while low similarity suggests incoherence.

Harmonizing Representations for Knowledge Transfer

Cross-model alignment addresses the challenge of transferring knowledge between distinct machine learning models by establishing a shared representational space. This is achieved by transforming the feature embeddings generated by each model into a common vector space, allowing for direct comparison and utilization of learned features. The core principle involves mapping the high-dimensional representations – often derived from layers within neural networks – such that semantically similar concepts are located close to each other regardless of the originating model. This facilitates the application of knowledge gained in one model – such as a large language model – to another, potentially smaller or more specialized model, without requiring extensive retraining or fine-tuning of the target model. The effectiveness of this approach hinges on the ability to accurately map and preserve the semantic relationships within the original feature spaces during the transformation process.

Linear transformation techniques, such as those employing weight matrices, are frequently used to project feature spaces from a source model to a target model during cross-model alignment. These transformations aim to minimize the reconstruction error, and therefore the information loss, between the original features and their projected counterparts. The process involves learning a linear mapping f(x) = Wx + b, where x represents the source features, W is the weight matrix, b is the bias vector, and f(x) represents the projected features. Optimization strategies, including minimizing the Frobenius norm of the residual or employing techniques such as Canonical Correlation Analysis (CCA), are used to determine the optimal weight matrix W and bias b, thereby preserving the most salient information during the projection process.
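The least-squares fit described above can be sketched in a few lines. Everything in this example is synthetic: the matrices stand in for hidden states extracted from two models on the same inputs, and the dimensions are arbitrary.

```python
import numpy as np

# Toy sketch of linear alignment: learn W and b mapping source-model
# features onto target-model features by least squares. The data is
# synthetic; in practice X and Y would be hidden states collected from
# the two models on a shared set of inputs.
rng = np.random.default_rng(0)
n, d_src, d_tgt = 200, 16, 12

X = rng.normal(size=(n, d_src))            # source-model features
W_true = rng.normal(size=(d_src, d_tgt))   # hidden ground-truth map
Y = X @ W_true + 0.5                       # target-model features

# Append a ones column so the bias b is learned jointly with W.
X_aug = np.hstack([X, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W, b = coef[:-1], coef[-1]

rel_err = np.linalg.norm(X @ W + b - Y) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.1e}")
```

Because the synthetic target here really is a linear function of the source, the residual collapses to numerical noise; with real hidden states the residual quantifies how much structure a purely linear map fails to capture.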

The effectiveness of cross-model alignment, and therefore the quality of knowledge transferred, is directly correlated to the degree of representational similarity between the source and target models. Higher similarity, measured by metrics assessing the correlation of feature representations, such as cosine similarity or Canonical Correlation Analysis (CCA), indicates that the models encode information in comparable ways. This facilitates a more efficient and accurate transformation of features during alignment, minimizing the loss of semantic content. Conversely, low representational similarity requires more complex transformations and increases the risk of transferring irrelevant or distorted information, negatively impacting downstream task performance. Consequently, assessing representational similarity before alignment is a crucial step in determining the feasibility and potential efficacy of knowledge transfer between models.
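Cosine similarity, the simplest of these metrics, compares the direction of two representation vectors. The embeddings below are invented toy values, not outputs of any real model.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|), in [-1, 1].
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings of the same sentence from two models.
h_model_a = [0.9, 0.1, 0.4]
h_model_b = [0.8, 0.2, 0.5]
h_unrelated = [-0.4, 0.9, -0.1]

print(round(cosine(h_model_a, h_model_b), 3))    # high: comparable encodings
print(round(cosine(h_model_a, h_unrelated), 3))  # low: dissimilar encodings
```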

Model Stitching builds upon representational alignment by enabling dynamic and context-dependent knowledge transfer between models. Rather than a static, one-time projection of features, Model Stitching techniques allow for the selective combination of representations from multiple models based on input data. This is achieved by learning a gating mechanism or attention weights that determine the contribution of each model’s features to the final output. Consequently, Model Stitching facilitates more complex interactions, enabling models to collaboratively solve tasks by leveraging complementary strengths and adapting to varying input characteristics, exceeding the capabilities of simple linear transformations for knowledge transfer.
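The gating idea can be illustrated with a minimal sketch, assuming the two representations have already been aligned to a common dimensionality. The gate parameters here are random stand-ins for trained weights; a real stitched system would learn them end to end.

```python
import numpy as np

# Minimal stitching gate: a learned score per model decides, per input,
# how much each model's representation contributes to the output.
rng = np.random.default_rng(2)
d = 8
w_gate = rng.normal(size=(2, d))   # one scoring vector per model (toy)

def stitch(h_a, h_b):
    # Softmax over per-model scores yields convex mixing weights.
    logits = np.array([w_gate[0] @ h_a, w_gate[1] @ h_b])
    g = np.exp(logits - logits.max())
    g /= g.sum()
    return g[0] * h_a + g[1] * h_b, g

h_a = rng.normal(size=d)           # representation from model A
h_b = rng.normal(size=d)           # representation from model B
h, gate = stitch(h_a, h_b)
print("gate weights:", np.round(gate, 3))
```

Because the gate depends on the input representations themselves, different inputs can lean on different models, which is precisely what distinguishes stitching from a fixed linear projection.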

Cross-model alignment demonstrates that test loss plateaus around 4,000 training samples, indicating diminishing returns as training data exceeds this threshold.

HELIX: A Foundation for Secure Collaboration

The HELIX framework facilitates Cross-Silo Inference, a method of collaborative machine learning, without requiring the direct exchange of sensitive data between participating parties. This is achieved by enabling computations to be performed directly on encrypted data held locally by each silo. Instead of sharing raw data, each silo trains a model independently and then contributes to a collective inference process while maintaining data confidentiality. This approach mitigates privacy risks associated with centralized data collection and allows for model collaboration without compromising the security of individual datasets. The framework’s architecture is designed to support distributed inference tasks where data remains under the control of its owner throughout the entire process.

HELIX utilizes Homomorphic Encryption (HE) to enable computation directly on encrypted data, thereby preserving data confidentiality throughout the inference process. Specifically, the framework employs the CKKS scheme, a leveled HE scheme optimized for approximate computations on real or complex numbers. CKKS encrypts data into ciphertexts allowing for additions and multiplications to be performed on these ciphertexts without decryption. The result of these operations is also an encrypted ciphertext, which can then be decrypted by an authorized party to reveal the result of the computation. This avoids the need to decrypt data for processing, mitigating the risk of exposure during cross-silo inference and ensuring that sensitive information remains protected.
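CKKS itself requires a dedicated library (Microsoft SEAL, for example, implements it). As a self-contained illustration of the core idea, computing on ciphertexts without decrypting them, the sketch below uses textbook Paillier encryption, which is additively homomorphic over integers. It is a stand-in for exposition, not the scheme HELIX employs, and the toy primes are nowhere near secure.

```python
import math
import random

# Textbook Paillier (simplified variant with g = n + 1).
# Toy primes for demonstration only: NOT cryptographically secure.
p, q = 10007, 10009
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1
# mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n.
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts,
# so sums can be evaluated entirely in the encrypted domain.
c = (encrypt(42) * encrypt(58)) % n2
print(decrypt(c))  # 100
```

Paillier supports plaintext addition and multiplication by a known scalar (via ciphertext exponentiation), which is exactly what a linear layer needs; CKKS additionally supports approximate ciphertext-ciphertext multiplication over real-valued vectors, which is why HELIX adopts it.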

Secure aggregation is a cryptographic technique integrated into HELIX to facilitate the combination of model predictions computed on encrypted data without revealing individual model outputs. This process involves each participating model encrypting its prediction and submitting it to a central aggregator. The aggregator then applies a homomorphic encryption-compatible summation to these encrypted predictions, producing an encrypted aggregate. This aggregate can then be decrypted by an authorized party to obtain the combined prediction, while the individual model contributions remain confidential. The use of secure aggregation minimizes the risk of data leakage and enhances the overall privacy guarantees of cross-silo inference within the HELIX framework.
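HELIX performs this summation homomorphically, but the underlying guarantee is easiest to see in the classic pairwise-masking formulation sketched below: each pair of parties shares a random mask that one adds and the other subtracts, so the masks cancel exactly in the aggregate and the aggregator learns only the total. This is an illustrative stand-in, not HELIX's exact protocol.

```python
import random

# Pairwise-masking secure aggregation sketch. Parties i < j share a
# random mask s[i][j]; party i adds it, party j subtracts it, so every
# mask cancels in the sum. Arithmetic is modular so masked shares
# reveal nothing about individual values.
M = 2**32
values = [17, 5, 42]                 # each party's private prediction
n_parties = len(values)

# Pairwise shared masks (in practice derived via key agreement).
s = [[random.randrange(M) for _ in range(n_parties)]
     for _ in range(n_parties)]

masked = []
for i, v in enumerate(values):
    m = v
    for j in range(n_parties):
        if j > i:
            m = (m + s[i][j]) % M    # masks shared with later parties
        elif j < i:
            m = (m - s[j][i]) % M    # masks shared with earlier parties
    masked.append(m)

total = sum(masked) % M              # aggregator sees only masked shares
print(total)  # 64: masks cancel, individual values stay hidden
```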

HELIX achieves low-latency, sub-second inference speeds by strategically applying encryption only to linear operations within the neural network. This approach significantly reduces computational overhead compared to encrypting all operations, as linear layers constitute a substantial portion of the total computation but are relatively inexpensive to encrypt using Homomorphic Encryption. By focusing encryption on these linear transformations, while performing non-linear operations on decrypted data, HELIX optimizes the trade-off between privacy and performance, enabling efficient cross-model inference without incurring prohibitive latency costs. This selective encryption strategy is crucial for practical deployment in latency-sensitive applications.

Linear alignment within the HELIX framework maintains both classification accuracy and the ability to reliably identify out-of-distribution (OOD) data. This preservation of performance is achieved by focusing alignment on the linear layers of trained models, which represent the majority of parameters and contribute significantly to both tasks. Evaluations demonstrate that aligning only these linear components does not introduce significant performance degradation on standard classification benchmarks, while simultaneously retaining the capability to accurately flag inputs that deviate from the training data distribution. This is crucial for real-world applications where model robustness and the detection of novel or adversarial inputs are paramount.

Evaluation of the HELIX framework indicates a significant correlation between the compatibility of tokenizers used by independently trained models and the quality of generated text. Specifically, an Exact Token Match Rate of 0.898 was observed, indicating a high degree of overlap in tokenized sequences. Further analysis using the Jaccard Index, which measures the similarity between sets of tokens, yielded a score of 0.822, confirming substantial agreement in the vocabulary and tokenization strategies employed across the models tested. These metrics suggest that consistent tokenization is a critical factor in achieving high-quality text generation when performing cross-silo inference with HELIX.
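Both metrics can be computed directly from token sequences. The formulations below are one plausible reading of the metric names (the paper's exact definitions may differ), and the tokenizations are invented for illustration.

```python
def exact_token_match_rate(a, b):
    # Fraction of positions where the two tokenizations agree,
    # relative to the longer sequence (one possible formulation).
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def jaccard(a, b):
    # |intersection| / |union| over token *sets*, order-insensitive.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical tokenizations of one sentence by two tokenizers.
toks_a = ["The", " frame", "work", " aligns", " models"]
toks_b = ["The", " framework", " aligns", " models"]

print(round(exact_token_match_rate(toks_a, toks_b), 3))  # 0.2
print(round(jaccard(toks_a, toks_b), 3))                 # 0.5
```

The gap between the two toy scores shows why both metrics are worth reporting: a single split-point disagreement shifts every later position (hurting exact match) while leaving most of the shared vocabulary intact (keeping Jaccard high).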

Analysis within the HELIX framework demonstrates a significant degree of shared linear structure amongst independently trained models. This observation is quantitatively supported by Linear Centered Kernel Alignment (CKA) similarity scores, which ranged from 0.595 to 0.881. These values indicate a substantial overlap in the learned linear transformations across different models, suggesting that a common underlying feature space is being captured despite independent training processes and potentially differing architectures or datasets. The consistent presence of this shared structure is a key finding enabling the efficiency of cross-model inference within HELIX.
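Linear CKA can be computed directly from two feature matrices. The sketch below uses synthetic data and checks the metric's invariance to orthogonal transformations, which is why independently trained models can score highly even though their coordinate systems differ.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA on column-centered features:
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 32))
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random rotation

cka_rotated = linear_cka(X, X @ Q)              # same structure, new basis
cka_random = linear_cka(X, rng.normal(size=(100, 32)))
print(f"rotated copy: {cka_rotated:.3f}, unrelated: {cka_random:.3f}")
```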

Inference within the HELIX framework is performed using a Linear Classifier, allowing for efficient computation on the processed, encrypted data. The framework’s efficacy has been demonstrated through validation on a diverse set of benchmark datasets, including MNLI (Multi-Genre Natural Language Inference), MRPC (Microsoft Research Paraphrase Corpus), RTE (Recognizing Textual Entailment), SST-2 (Stanford Sentiment Treebank), AG News, TREC (Text REtrieval Conference), and DBpedia. This multi-dataset validation confirms the broad applicability and robustness of the linear classification approach implemented within HELIX.

A privacy-preserving protocol enables two parties to perform aligned embedding inference, where the client encrypts their data, computes an encrypted cross-covariance with the provider's data, and locally decrypts the alignment weights before encrypting and sending aligned embeddings to the provider for homomorphic classification.

Toward a Unified Understanding of Intelligence

The emerging Platonic Representation Hypothesis proposes that large language models, despite differing architectures and training data, ultimately converge on remarkably similar internal representations of concepts and knowledge. This suggests a fundamental, underlying structure to human language and thought that these models are independently discovering. If true, this convergence dramatically simplifies the challenge of AI alignment; instead of needing to individually steer countless unique model ‘personalities’, efforts can focus on influencing a relatively small set of shared, foundational understandings. Researchers theorize that identifying and refining these ‘platonic ideals’ within LLMs could allow for more robust and predictable behavior, making it easier to ensure these powerful systems consistently act in accordance with human values and intentions. This perspective offers a hopeful pathway toward building safe and beneficial artificial intelligence, shifting the focus from controlling individual models to shaping the common ground they all inhabit.

The HELIX framework introduces a novel approach to collaborative artificial intelligence development, allowing multiple parties to contribute to model training and refinement without directly exposing their sensitive datasets. This is achieved through a sophisticated system of encrypted model updates and secure aggregation techniques, ensuring that individual data remains private while still benefiting from the collective knowledge embedded within the combined model. By decentralizing the development process and prioritizing data security, HELIX not only mitigates risks associated with centralized data storage but also unlocks opportunities for broader participation and accelerated innovation in the field of AI, potentially fostering a more inclusive and rapidly evolving ecosystem for machine learning advancements.

The developed framework extends beyond simply securing AI development; it establishes a foundation for advancements in traditionally data-sensitive fields. Federated learning, where models are trained across decentralized datasets without direct data exchange, becomes markedly more viable as HELIX safeguards individual contributions. Similarly, secure data analytics, crucial for industries handling confidential information like healthcare or finance, can benefit from this collaborative environment without compromising data privacy. By enabling joint analysis and model building on distributed, encrypted data, the framework unlocks the potential for broader datasets and more robust insights, fostering innovation while upholding stringent security standards and paving the way for previously inaccessible analytical opportunities.

Continued development centers on refining both the computational cost of encryption protocols and the precision of alignment techniques within the HELIX framework. Current research investigates methods to minimize the performance overhead associated with secure multi-party computation, enabling broader accessibility and scalability. Simultaneously, efforts are underway to extend the framework’s capabilities to accommodate increasingly complex model architectures, such as those incorporating sparse activation functions or transformer variants with greater parameter counts. Successfully addressing these challenges promises to unlock the potential for collaborative AI development across a wider range of applications and datasets, ultimately fostering innovation while preserving data privacy and security.

HELIX mapping effectively aligns client embeddings with those of the data owner, as demonstrated by comparable performance to a baseline using data owner embeddings on an IMDB dataset (averaged over five model pairs).

The pursuit of alignment, as detailed in this work concerning HELIX and cross-model inference, echoes a fundamental principle of efficient thought. One strives not for convoluted complexity, but for the simplest viable representation. Andrey Kolmogorov observed, “The most interesting discoveries often occur at the intersection of different fields.” This framework, uniting cryptography and large language models, exemplifies that very intersection. HELIX achieves secure computation through linear transformations, a testament to the power of reduction. The elegance of aligning representations via this method suggests clarity is indeed the minimum viable kindness, allowing for privacy-preserving inference without sacrificing utility.

Where Do We Go From Here?

The pursuit of cross-model alignment, as demonstrated by HELIX, invariably exposes the sheer redundancy inherent in current large language model architectures. Each model, a monument to computational expense, strives toward an identical, yet ultimately diffuse, representation of knowledge. The framework offers a pathway toward leveraging this distributed effort, but it does not resolve the fundamental question: how much intelligence is truly necessary? Future work must confront the diminishing returns of scale, and prioritize methods for distilling core competencies, rather than simply accumulating parameters.

The reliance on linear transformations, while elegant in its simplicity, implies a certain rigidity in the representational spaces of these models. It raises the question: are genuinely disparate models ever truly aligned by such a constraint, or merely projected into a common, and potentially impoverished, subspace? Exploring non-linear mappings, and accepting the inevitable loss of perfect correspondence, may yield more robust, and ultimately more insightful, cross-model interactions.

Finally, the invocation of homomorphic encryption, while crucial for privacy, introduces a computational overhead that cannot be ignored. The ideal solution is not merely secure computation, but minimal secure computation. The challenge lies in identifying the irreducible core of operations necessary for alignment, and shielding only those, leaving the bulk of inference unencrypted. Such parsimony is not a limitation, but a testament to a deeper understanding.


Original article: https://arxiv.org/pdf/2603.18908.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
