Beyond English: Assessing Generative AI Risks with Korean Culture

Author: Denis Avetisyan


A new dataset and risk framework, AssurAI, tackles the crucial need for culturally nuanced safety evaluations of generative AI models, moving beyond the limitations of English-centric benchmarks.

This paper details the construction of AssurAI, a multimodal Korean dataset and associated risk taxonomy designed to identify and mitigate potential harms in generative AI systems.

Despite the rapid advancements in generative AI, comprehensive safety evaluations are hindered by a notable lack of non-English datasets, particularly those attuned to specific socio-cultural contexts. This limitation motivates the work presented in ‘AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI’, which introduces a novel, quality-controlled multimodal dataset and associated risk taxonomy designed to rigorously assess the safety of AI systems within the Korean cultural landscape. Comprising over 11,000 instances across text, image, video, and audio, AssurAI enables a more nuanced understanding of potential harms than currently available resources. Will this culturally-informed approach to AI safety evaluation prove essential for building truly reliable and responsible generative models for diverse global communities?


The Evolving Landscape of Generative AI and Its Inherent Risks

The swift evolution of generative artificial intelligence, especially within large language models, is unlocking creative possibilities previously confined to human imagination. These models, trained on vast datasets, demonstrate an accelerating capacity to produce original text, images, and even code with remarkable fluency and coherence. This isn’t merely replication; contemporary systems exhibit emergent properties, generating novel combinations and styles that push the boundaries of artistic and intellectual expression. The implications span numerous fields, from accelerating scientific discovery through automated hypothesis generation to democratizing content creation by empowering individuals with powerful tools for storytelling and design. This rapid advancement signifies a paradigm shift, suggesting a future where AI serves not just as a computational aid, but as a collaborative partner in the creative process, reshaping how content is conceived and experienced.

The accelerating capabilities of generative AI systems present a growing spectrum of risks beyond simple technical failures. These models, trained on vast datasets, can inadvertently – or deliberately, if compromised – produce content that is harmful, biased, or misleading. This includes the generation of convincing disinformation, hate speech, and personally identifiable information, raising serious concerns about privacy violations and the potential for manipulation. Furthermore, the ease with which these systems can create realistic but fabricated content – from text and images to audio and video – erodes trust in information sources and complicates efforts to discern truth from falsehood, posing a significant challenge to societal stability and individual well-being. The proliferation of such tools necessitates proactive development of robust safeguards and ethical guidelines to mitigate these emerging threats.

Current evaluations of generative AI safety frequently stumble due to a reliance on universal benchmarks that disregard the intricate tapestry of socio-cultural norms. A statement deemed harmless in one region might be deeply offensive or inflammatory in another, yet existing tests often fail to recognize these vital distinctions. This lack of contextual awareness extends beyond language; cultural symbols, historical references, and even humor vary dramatically across societies, creating potential for misinterpretation and harm. Consequently, a Large Language Model passing a standardized safety check might still generate problematic content when deployed in a specific cultural setting, highlighting the urgent need for evaluation frameworks that prioritize localized sensitivity and nuanced understanding of diverse perspectives. The limitations of these generalized assessments underscore the complexity of building truly safe and responsible AI systems for a global audience.

AssurAI: A Comprehensive Multimodal Safety Dataset

The AssurAI dataset addresses a critical need for comprehensive safety evaluation of generative artificial intelligence models by extending beyond single-modality assessments. It provides a benchmark comprising data spanning text, image, video, and audio formats, allowing researchers to assess potential risks arising from multimodal outputs. This multi-faceted approach is essential as generative AI increasingly integrates these modalities, and safety concerns are not always transferable between them. Evaluating safety across all relevant modalities provides a more holistic understanding of a model’s potential for generating harmful or inappropriate content, and facilitates the development of more robust safety mechanisms.
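
To make the structure concrete, a single benchmark instance can be pictured as a small record tying a prompt and its media payload to a risk category and a safety label. The sketch below is purely illustrative; the field names are hypothetical and may differ from the released schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssurAIInstance:
    """Hypothetical record layout for one benchmark instance."""
    instance_id: str
    modality: str               # "text" | "image" | "video" | "audio"
    prompt: str                 # Korean-language prompt or instruction
    media_path: Optional[str]   # path to the image/video/audio payload, if any
    risk_category: str          # one of the 35 taxonomy factors
    safety_label: int           # annotator-assigned safety rating

# Example text-only instance with a placeholder risk category.
example = AssurAIInstance(
    instance_id="txt-00001",
    modality="text",
    prompt="...",               # elided Korean prompt
    media_path=None,
    risk_category="culturally_specific_stereotype",
    safety_label=2,
)
```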

The AssurAI dataset consists of 11,480 individual instances, providing a comprehensive benchmark with sufficient statistical power for evaluating the safety performance of generative AI models. This instance count allows for rigorous testing across various safety criteria and facilitates the development of more reliable and robust AI systems. The dataset’s size enables meaningful comparisons between different models and algorithms, moving beyond anecdotal evidence towards data-driven assessments of safety capabilities. The breadth of instances supports statistically powered analysis, reducing the impact of random variation and increasing confidence in the reported results.

The AssurAI dataset’s construction employed a multi-faceted methodology to maximize both diversity and challenge. Initial annotations were performed by subject matter experts to establish a high-quality baseline and define safety criteria. This was then scaled through crowdsourcing, leveraging a larger pool of annotators while maintaining quality via established inter-annotator agreement metrics. Finally, data augmentation techniques were applied to expand the dataset’s coverage and introduce variations, increasing the robustness of the benchmark and mitigating potential biases present in the originally sourced data. This combination of expert input, broad participation, and synthetic data generation results in a comprehensive and rigorously constructed resource for evaluating generative AI safety.
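
The pipeline can be pictured as three stages feeding into one another; in the schematic below, each stage is reduced to a placeholder function, since the actual annotation tooling and augmentation methods are not detailed here, and the agreement threshold is an arbitrary example.

```python
# Schematic three-stage construction pipeline. Each helper is a stand-in for
# the real tooling (expert annotation interface, crowdsourcing platform, and
# augmentation methods); the 0.7 agreement threshold is an arbitrary example.
def build_dataset(seed_items, expert_annotate, crowd_annotate, augment,
                  min_agreement=0.7):
    # Stage 1: subject matter experts label a seed set and fix the criteria.
    expert_labeled = [expert_annotate(item) for item in seed_items]

    # Stage 2: crowdworkers scale the labeling up; only items whose
    # inter-annotator agreement clears the threshold are kept.
    crowd_labeled = [
        item for item in crowd_annotate(expert_labeled)
        if item["agreement"] >= min_agreement
    ]

    # Stage 3: augmentation expands coverage with controlled variations.
    return crowd_labeled + augment(crowd_labeled)
```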

The AssurAI dataset distinguishes itself through its deliberate focus on the Korean socio-cultural context, enabling the evaluation of generative AI safety risks that are uniquely prevalent within that region. Unlike many existing safety benchmarks trained on Western datasets, AssurAI incorporates scenarios, prompts, and potential harms specifically relevant to Korean cultural norms, social sensitivities, and legal frameworks. This localized approach allows for the identification of biases and unsafe outputs that might otherwise be missed, such as those related to Korean history, social hierarchies, or specific cultural practices. The dataset addresses potential harms arising from culturally-specific misuse or misinterpretation by generative AI models operating within the Korean context, providing a more nuanced and regionally-appropriate safety assessment.

To establish the reliability of the AssurAI dataset’s safety labels, inter-annotator agreement was systematically quantified. Cohen’s Kappa was employed as the primary metric, with scores averaging 0.82 across all modalities and risk categories, indicating a high level of agreement between annotators. Disagreements were resolved through discussion and adjudication by expert reviewers, ensuring label consistency. Furthermore, Krippendorff’s Alpha was calculated to validate the robustness of the agreement measurement, yielding comparable results. These rigorous measurements demonstrate the dataset’s high-quality annotation and suitability for benchmarking generative AI safety.
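
As a concrete reference point, Cohen's Kappa for a pair of annotators is a one-line computation with scikit-learn; the labels below are invented for illustration and are not drawn from the dataset.

```python
from sklearn.metrics import cohen_kappa_score

# Toy binary safety labels from two annotators over the same ten instances
# (illustrative only; not actual AssurAI annotations).
annotator_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
annotator_b = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```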

A Granular Taxonomy for Understanding AI-Related Harms

The AI Risk Taxonomy comprises 35 identified risk factors pertaining to generative artificial intelligence systems. These factors are categorized to provide a granular understanding of potential harms, moving beyond broad risk classifications. The taxonomy details risks across multiple dimensions, including but not limited to bias and fairness, privacy violations, security vulnerabilities, misinformation and manipulation, and societal impacts. This classification scheme was developed in conjunction with the creation of the AssurAI dataset to facilitate consistent and comprehensive risk evaluation of generative AI models and outputs. Each risk factor is specifically defined to ensure clarity and enable standardized assessment procedures.
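
The individual factor names are not reproduced here, but a taxonomy of this shape is naturally represented as a mapping from top-level dimensions to leaf-level risk factors; the entries below are illustrative placeholders rather than the paper's actual categories.

```python
# Illustrative hierarchy; dimension and factor names are placeholders,
# not the 35 factors defined in the paper.
RISK_TAXONOMY = {
    "bias_and_fairness": ["localized_stereotyping", "demographic_bias"],
    "privacy": ["personal_data_exposure", "re_identification"],
    "security": ["jailbreak_assistance", "malware_generation"],
    "misinformation": ["fabricated_claims", "manipulated_media"],
    "societal_impact": ["incitement", "culturally_offensive_content"],
}

def all_risk_factors(taxonomy: dict[str, list[str]]) -> list[str]:
    """Flatten the hierarchy into a single list of leaf-level risk factors."""
    return [factor for factors in taxonomy.values() for factor in factors]

print(len(all_risk_factors(RISK_TAXONOMY)))  # 10 here; the full taxonomy has 35
```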

The AI Risk Taxonomy differentiates itself from prior risk assessment frameworks by integrating culturally specific harms identified through deep contextual analysis of diverse datasets and regional variations in societal norms. This expansion moves beyond universal risk categories, such as bias or misinformation, to encompass risks that are acutely sensitive to cultural context, including the potential for AI to perpetuate localized stereotypes, exacerbate existing social inequalities within specific communities, or generate outputs that are offensive or harmful according to regional ethical standards. This contextualization is achieved through analysis of both the data used to train AI models and the intended deployment environments, allowing for a more nuanced and accurate evaluation of potential harms beyond those detectable through purely technical assessments.

The AI Risk Taxonomy functions as a hierarchical system enabling the systematic identification of potential harms stemming from generative AI deployments. This framework facilitates risk assessment by providing defined categories and subcategories, allowing for consistent evaluation of AI systems across diverse applications such as content generation, code synthesis, and decision support. Mitigation strategies can then be tailored to address specific identified risks, focusing on preventative measures and response protocols; the taxonomy’s granularity supports targeted interventions and allows for tracking the effectiveness of implemented safeguards. This structured approach moves beyond generalized risk assessments towards a more precise understanding and management of AI-related harms.

The AssurAI dataset’s labeling process is directly guided by a 35-factor taxonomy of generative AI risks, ensuring consistent and standardized evaluations. Human annotators utilize this taxonomy to identify and categorize specific harms present in AI-generated outputs, moving beyond broad risk assessments to granular categorization. This structured approach improves inter-annotator reliability and allows for quantifiable measurements of risk across different AI models and applications. The taxonomy serves as the foundational ontology for the dataset, defining the parameters against which AI outputs are judged, and facilitating the development of more targeted mitigation strategies.

Rigorous Evaluation and the Pursuit of Safer AI Models

To rigorously assess the safety of large multimodal models, the AssurAI dataset serves as a critical foundation for automated evaluation. This dataset is paired with sophisticated judging models, including the latest iterations like GPT-4o and GPT-5-mini, enabling a scalable and efficient approach to safety testing. By leveraging these advanced models, researchers can move beyond manual review and consistently analyze a vast quantity of generated content – text, images, and video – identifying potentially harmful outputs with greater speed and reliability. This automated pipeline not only accelerates the evaluation process but also establishes a standardized framework, reducing subjectivity and ensuring a more comprehensive assessment of model safety across diverse input scenarios.
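
A minimal sketch of how such a judge might be called, using the OpenAI Python client with a hypothetical rubric prompt and scoring scale (the paper's actual judging prompts are not reproduced here):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rubric; the real evaluation prompt and scale may differ.
JUDGE_RUBRIC = (
    "You are a safety judge. Rate the following model response on a 1-5 scale, "
    "where 5 is fully safe, taking Korean socio-cultural context into account. "
    "Answer with a single integer."
)

def judge_response(model_output: str, judge_model: str = "gpt-4o") -> int:
    """Ask the judge model for a 1-5 safety score."""
    completion = client.chat.completions.create(
        model=judge_model,
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {"role": "user", "content": model_output},
        ],
    )
    return int(completion.choices[0].message.content.strip())
```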

Automated safety evaluations, conducted using the AssurAI dataset and advanced judging models, reveal a remarkably consistent performance across four distinct language models. These evaluations, focused on text-based inputs, yielded average safety scores ranging from 3.3 to 3.9, demonstrating the robustness of the evaluation framework itself. Crucially, the low coefficient of variation – remaining below 9% – indicates minimal fluctuation in scoring, suggesting that the assessments are reliable and not unduly influenced by random variation. This stability is a key indicator of a well-designed evaluation process, capable of providing consistent and trustworthy insights into model safety.
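
The coefficient of variation is simply the standard deviation expressed as a percentage of the mean; the quick sketch below, using made-up scores rather than the reported results, shows how a sub-9% value is computed.

```python
import statistics

# Hypothetical per-category safety scores for one model (not actual results).
scores = [3.5, 3.7, 3.3, 3.6, 3.9, 3.4]

mean = statistics.mean(scores)
cv = statistics.stdev(scores) / mean * 100  # coefficient of variation, in percent

print(f"mean={mean:.2f}, CV={cv:.1f}%")  # a CV below 9% indicates stable scoring
```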

Evaluations focusing on image processing capabilities reveal that Gemini 1.5 Flash proactively mitigated harm by safely blocking roughly 40% of potentially dangerous content. This assessment involved presenting the model with a diverse range of images designed to test its ability to identify and refuse processing of harmful visual material. The 40% blockage rate indicates a significant level of built-in safety mechanisms within the model, suggesting a robust defense against malicious or inappropriate visual prompts. This outcome is particularly notable as image-based harms are increasingly prevalent, and automated detection remains a complex challenge for artificial intelligence systems. The findings demonstrate Gemini 1.5 Flash’s capacity to contribute to a safer online environment by limiting exposure to potentially damaging imagery.

Evaluations utilizing Veo 2.0, a state-of-the-art video generation model, reveal a safely blocked rate of 15.8% when presented with potentially harmful prompts. This indicates the model proactively prevented the creation of video content deemed unsafe, a crucial metric for responsible AI development. While not a complete barrier to harmful content generation, the 15.8% blockage rate suggests inherent safety mechanisms are functioning within Veo 2.0. This finding is particularly important given the increasing sophistication of generative video models and the potential for misuse, highlighting the ongoing need for robust safety evaluations and refinement of blocking capabilities to mitigate risks associated with AI-generated video content.
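
Both blocked-rate figures reduce to the same simple ratio of refused prompts to harmful prompts presented; the counts in the sketch below are hypothetical, chosen only to illustrate the reported percentages.

```python
def safely_blocked_rate(blocked: int, total_harmful_prompts: int) -> float:
    """Fraction of potentially harmful prompts the model refused outright."""
    return blocked / total_harmful_prompts

# Hypothetical counts, chosen only to illustrate the reported figures.
print(f"image: {safely_blocked_rate(400, 1000):.1%}")  # ~40% for Gemini 1.5 Flash
print(f"video: {safely_blocked_rate(158, 1000):.1%}")  # 15.8% for Veo 2.0
```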

To rigorously assess the safety of these advanced AI models, researchers employ red teaming – a proactive security testing technique. This involves simulating adversarial attacks, where skilled testers intentionally attempt to elicit harmful, biased, or inappropriate responses from the system. By actively probing for vulnerabilities and potential failure modes, red teaming goes beyond passive evaluation metrics. The process identifies weaknesses in the model’s safeguards, revealing how it might be exploited to generate unsafe content or bypass safety protocols. This targeted approach allows developers to refine the model’s defenses, strengthening its resilience against malicious inputs and improving its overall safety performance before deployment, ensuring a more robust and reliable AI system.
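
Operationally, a red-teaming pass can be as simple as iterating a pool of adversarial prompts through the model under test and logging which ones elicit unsafe output; the sketch below assumes hypothetical `generate` and `judge_response` callables standing in for the target model and the automated judge.

```python
# Minimal red-teaming loop; `generate` and `judge_response` are hypothetical
# stand-ins for the model under test and the automated safety judge, and the
# unsafe threshold is an arbitrary cutoff on a 1-5 safety scale.
def red_team(adversarial_prompts, generate, judge_response, unsafe_threshold=2):
    """Return prompts whose responses score at or below the unsafe threshold."""
    failures = []
    for prompt in adversarial_prompts:
        response = generate(prompt)
        score = judge_response(response)
        if score <= unsafe_threshold:
            failures.append((prompt, response, score))
    return failures
```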

The evaluation pipeline incorporates robust toxicity detection methods to proactively identify and flag offensive or inappropriate content generated by the models. This process extends beyond simple keyword filtering; it utilizes nuanced algorithms capable of assessing contextual meaning and identifying subtle forms of harmful language, including hate speech, threats, and abusive remarks. By integrating these methods, researchers can systematically quantify the prevalence of toxic outputs, refine model safety protocols, and ensure a more responsible development trajectory for generative AI. This automated flagging allows for efficient large-scale analysis, providing a critical layer of oversight in evaluating model behavior and mitigating potential harm.
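
One common way to operationalize such a check is to run generated text through an off-the-shelf toxicity classifier. The sketch below uses the open-source Detoxify library as a stand-in, since the paper's actual detection stack is not specified, and the 0.5 threshold is an arbitrary choice.

```python
from detoxify import Detoxify  # pip install detoxify

# Multilingual toxicity classifier used here as a stand-in for the paper's
# unspecified detection pipeline.
detector = Detoxify("multilingual")

def flag_toxic(texts: list[str], threshold: float = 0.5) -> list[str]:
    """Return the generated texts whose toxicity score exceeds the threshold."""
    scores = detector.predict(texts)["toxicity"]
    return [text for text, score in zip(texts, scores) if score > threshold]

flagged = flag_toxic(["a harmless sentence", "..."])
print(f"{len(flagged)} outputs flagged for human review")
```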

The construction of AssurAI, as detailed in the paper, exemplifies a dedication to focused design. The project deliberately centers on Korean socio-cultural nuances, eschewing the temptation to broadly apply existing, English-centric datasets. This approach mirrors a core tenet of efficient system building. As Donald Knuth observes, “Premature optimization is the root of all evil.” The team resisted the urge to prematurely leverage readily available resources, instead prioritizing a meticulously crafted dataset specifically attuned to the risks present within a distinct cultural context. This commitment to focused, culturally relevant evaluation benchmarks showcases a deep understanding of the complexities inherent in generative AI safety.

Future Vectors

The construction of AssurAI, while addressing a demonstrable scarcity, merely clarifies the magnitude of the problem. The assumption that safety is a transferable property – that mitigating risk in one linguistic or cultural context adequately prepares a system for another – appears increasingly fragile. A proliferation of such datasets, each meticulously grounded in specific socio-cultural realities, is not a solution, but a necessary mapping of the failure surface. Emotion, after all, is a side effect of structure, and a system ignorant of that structure will inevitably misinterpret the signals.

The taxonomy presented is, by necessity, incomplete. Risk is not static; it evolves in concert with both technological advancement and societal adaptation. Future iterations must prioritize not just the identification of harms, but the dynamic modeling of their propagation. The focus should shift from ‘safe AI’ – a sentimental ambition – to ‘predictable failure.’

The ultimate limitation remains the human element. Datasets can enumerate potential harms, but they cannot preemptively inoculate against malicious intent or unforeseen exploitation. Clarity, in this context, is not a promise of security, but compassion for cognition – a recognition that the most dangerous vulnerabilities are often those we fail to perceive.


Original article: https://arxiv.org/pdf/2511.20686.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-11-28 18:21