Author: Denis Avetisyan
New research reveals how strategically designed assessments, built around interconnected problems, can effectively evaluate genuine understanding and resist being solved by generative AI.

A theoretically grounded framework demonstrates that assessments emphasizing complex problem-solving and cognitive load are more resilient to AI completion and provide a more accurate measure of student skills.
Traditional modular assessments in computing education are increasingly vulnerable to circumvention by generative AI, creating a mismatch between evaluation and authentic skill demonstration. This paper, ‘Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework’, establishes that assessments built around interconnected, multi-step problems, in which outputs build upon prior stages, are demonstrably more resistant to AI completion and better reflect students’ integrative thinking. Empirical validation across four university courses reveals substantial AI-driven score inflation on standard homework, contrasted with maintained performance on these interconnected projects. Can this framework offer a viable pathway to restore academic integrity while fostering the very skills needed in an AI-augmented world?
The Erosion of Traditional Assessment: A System Under Stress
The integration of generative artificial intelligence is fundamentally reshaping both professional fields and the methods used to evaluate competency within them. These tools, capable of producing human-quality text, code, and other outputs, are no longer confined to research labs; they are becoming commonplace in workplaces and, increasingly, accessible to students. This proliferation presents a significant challenge to traditional educational assessment, which often relies on easily replicable tasks now within the capabilities of AI. The speed of this change demands a critical re-evaluation of how genuine understanding and skill are measured, as current methods may inadvertently assess an AI’s capabilities rather than a student’s. Consequently, educators and institutions are compelled to explore innovative assessment strategies that prioritize higher-order thinking, application of knowledge, and demonstrably original work, skills less susceptible to automated completion.
The convenience of traditional modular assessments – quizzes, short essays, and problem sets completed outside of a controlled environment – is now significantly challenged by the rise of sophisticated generative AI tools. These tools can readily produce responses that, while appearing original, may not reflect a student’s actual comprehension of the material. This susceptibility isn’t simply a matter of academic dishonesty; it fundamentally questions the validity of these assessments as accurate gauges of learning. Because AI can effectively ‘complete’ the tasks, the demonstrated performance no longer reliably indicates a student’s knowledge, skills, or ability, creating a disconnect between grades and genuine understanding. This poses a substantial problem for educators striving to fairly and accurately evaluate student progress and identify areas needing further support.
The advent of sophisticated generative AI has introduced a measurable phenomenon termed the ‘AI Inflation Effect’ within educational assessment. Studies reveal a significant discrepancy between student performance on readily completable, modular assignments and their performance under more controlled, proctored exam conditions. This gap, reaching as high as 30 percentage points in some instances, suggests that a considerable portion of the success observed on modular work is attributable to AI assistance, rather than genuine student mastery. This inflation doesn’t necessarily indicate a decline in overall student ability, but rather highlights the susceptibility of traditional assessment formats to circumvention by these powerful tools, prompting a re-evaluation of how authentic understanding is reliably measured.
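The effect itself is simply the gap between mean scores under the two conditions. The sketch below illustrates that calculation with invented numbers; the scores are hypothetical and not drawn from the study’s data.

```python
# Hypothetical illustration of the "AI Inflation Effect": the gap between
# average scores on unsupervised modular work and proctored exams.
# All scores below are invented for demonstration purposes.
modular_scores = [92, 88, 95, 90, 85]    # homework completed without supervision
proctored_scores = [62, 58, 70, 65, 55]  # the same students under exam conditions

mean_modular = sum(modular_scores) / len(modular_scores)        # 90.0
mean_proctored = sum(proctored_scores) / len(proctored_scores)  # 62.0

print(f"Inflation effect: {mean_modular - mean_proctored:.1f} percentage points")  # 28.0
```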
The observed discrepancy between scores on easily completed, modular assignments and those from proctored examinations underscores a critical challenge to evaluating authentic student knowledge. This “AI Inflation Effect” isn’t merely a matter of grade inflation; it represents a fundamental breakdown in the ability of traditional assessments to gauge genuine understanding and skill acquisition. Consequently, educators are compelled to explore and implement assessment strategies that move beyond easily automated tasks and focus instead on higher-order thinking, critical analysis, and the application of knowledge to novel situations. These revised approaches may include project-based learning, oral examinations, in-class writing assignments, and assessments designed to specifically target the limitations of current AI technologies, ensuring that evaluation accurately reflects a student’s true capabilities and preparedness.

Designing for Resilience: The Architecture of True Assessment
Increasing cognitive load and complexity in assessments is a primary strategy for mitigating the impact of artificial intelligence on evaluating genuine student understanding. This approach centers on designing tasks that demand more than simple recall or application of isolated facts; instead, assessments should require students to synthesize information, engage in multi-step reasoning, and apply knowledge in novel situations. By exceeding the current capabilities of most AI models to perform complex cognitive tasks, educators can more effectively differentiate between AI-generated responses and authentic student work. The effectiveness of this strategy is supported by empirical data demonstrating a 30% increase in score variability – measured by standard deviation (21.93) – in interconnected assessments compared to traditional, open-ended projects (standard deviation of 16.83), indicating a greater capacity to discern varying levels of student proficiency.
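The 30% figure follows directly from the two standard deviations quoted above; a quick arithmetic check:

```python
# Relative increase in score variability, using the standard deviations
# reported in the text for interconnected vs. open-ended projects.
sd_interconnected = 21.93
sd_open_ended = 16.83

relative_increase = (sd_interconnected - sd_open_ended) / sd_open_ended
print(f"{relative_increase:.1%}")  # ~30.3%, consistent with the cited 30% figure
```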
Interconnected assessments, characterized by tasks demanding multi-step reasoning and the integration of knowledge from multiple domains, demonstrably improve assessment resilience compared to isolated, modular problems. These assessments require students to synthesize information and apply it across various problem-solving stages, increasing cognitive demand. This approach contrasts with modular assessments, where each component can be addressed independently, potentially allowing AI to solve sections without demonstrating genuine understanding. Data indicates that interconnected projects exhibit a 30% increase in score variability – a standard deviation of 21.93 compared to 16.83 for open-ended projects – suggesting a greater capacity to differentiate student abilities and identify genuine mastery of complex concepts.
The design of AI-resilient assessments benefits from principles established by Cognitive Load Theory, which posits that learning is optimized when cognitive demands are appropriately managed. This is mathematically formalized in Theorem 1, demonstrating that interconnected assessment problems – those requiring multiple integrated reasoning steps – exhibit greater resilience to automated completion by current AI models. Specifically, the theorem provides a quantitative basis for understanding why AI struggles with tasks demanding complex cognitive synthesis, as opposed to isolated knowledge recall. The theorem’s validation relies on the increased computational complexity required to accurately model and solve interconnected problems, exceeding the capabilities of algorithms optimized for simpler, modular tasks.
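A minimal way to sketch the intuition, though not the theorem as the paper states it: suppose an AI model completes each of $n$ interdependent stages with probability $p_i < 1$, and every stage consumes the output of the previous one. The probability of producing a correct end-to-end solution is then $\prod_{i=1}^{n} p_i$, which shrinks multiplicatively as stages are added, whereas a modular assessment credits each stage independently, so partial AI success still converts into marks.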
Element Interactivity serves as a quantifiable metric for the cognitive demand imposed by interconnected assessment tasks and directly correlates with their resilience to automated completion. Projects designed with high Element Interactivity exhibit the same 30% increase in score variability over open-ended projects noted above (a standard deviation of 21.93 versus 16.83). This heightened variability suggests a greater capacity for these assessments to differentiate between varying levels of student understanding and skill, indicating a more robust and nuanced evaluation of student abilities.
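The text does not give a formula for Element Interactivity; one simple, hypothetical way to operationalize it is to count the prerequisite links among an assessment’s stages, as in the sketch below (both task graphs are invented for illustration).

```python
# Hypothetical proxy for element interactivity: the number of prerequisite
# links a student must keep in mind across an assessment's stages.
# Both task graphs below are invented for illustration only.
modular_assessment = {
    "q1": [], "q2": [], "q3": [], "q4": [],  # independent questions
}
interconnected_assessment = {
    "load_data": [],
    "clean_data": ["load_data"],
    "build_model": ["clean_data"],
    "evaluate": ["build_model", "clean_data"],
    "report": ["evaluate", "build_model"],
}

def interactivity(task_graph):
    """Count inter-stage dependencies (edges in the dependency graph)."""
    return sum(len(prereqs) for prereqs in task_graph.values())

print(interactivity(modular_assessment))         # 0
print(interactivity(interconnected_assessment))  # 6
```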
The increased score variability observed for interconnected assessments indicates a superior capacity to differentiate between student levels of understanding. This heightened variability suggests that interconnected tasks are more sensitive to nuanced differences in cognitive skills and knowledge application. Traditional, modular assessments often result in compressed scoring distributions, potentially masking genuine variations in student ability. The expanded range of scores observed with interconnected tasks provides a more granular and informative profile of student performance, allowing educators to more accurately identify both high-achieving students and those requiring additional support.

The Power of Structure: Constraining the Algorithm
Semi-open-ended projects offer a more dependable assessment of student competency by balancing creative freedom with clearly defined structural constraints and objective success metrics. Unlike fully open-ended assignments, which allow for a wider range of approaches and potential AI-generated solutions, semi-open-ended projects limit the solution space while still requiring students to apply their knowledge and skills. This focused approach enables a more precise evaluation of specific competencies, minimizing ambiguity in grading and increasing the validity of the assessment. The deterministic nature of the success criteria allows for consistent and reliable measurement of student performance, reducing the impact of subjective interpretation and enhancing the comparability of results across students and assessments.
The design of semi-open-ended projects directly addresses documented limitations in Large Language Models (LLMs). Current LLMs, while proficient in pattern recognition and text generation, exhibit weaknesses in tasks requiring genuine complex reasoning, novel problem-solving, and the application of nuanced judgment. As detailed in “LLM Limitations,” these models often struggle with tasks demanding multi-step inference, particularly when faced with ambiguity or the need to synthesize information from diverse sources. Consequently, assessment strategies relying heavily on these skills are susceptible to artificially inflated scores resulting from LLM-generated responses that appear correct but lack substantive understanding. Semi-open-ended projects, by providing a structured framework and deterministic success criteria, mitigate this risk by focusing assessment on demonstrable execution within defined parameters, rather than solely on the generation of potentially superficial content.
Theorem 2 provides a mathematical framework demonstrating the increased reliability of semi-open-ended projects as a measure of student competency in the presence of generative AI. The theorem establishes that the probability of accurately assessing a student’s skills is significantly higher for semi-open-ended projects ($P_s$) than for fully open-ended assignments ($P_o$) when AI assistance is factored in. Specifically, $P_s = 1 - k \cdot p$, where $k$ represents the complexity of the task and $p$ is the probability of AI successfully completing the task without genuine student understanding. The theorem posits that the defined structure within semi-open-ended projects limits the scope for AI to generate complete solutions without student input, thereby reducing the error rate in assessment and increasing the validity of the results.
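Taken at face value, the quoted relationship can be explored numerically. The sketch below applies the same form to the open-ended case with a larger $p$ purely for comparison; the parameter values are assumptions chosen for illustration, not estimates from the paper.

```python
# Illustrative sketch of the relationship quoted in the text:
# P_s = 1 - k * p, with k a task-complexity factor and p the probability
# that an AI completes the task without genuine student understanding.
# Parameter values are assumptions for demonstration only.
def assessment_validity(k: float, p: float) -> float:
    """Probability that the assessment reflects the student's own skill."""
    return 1.0 - k * p

# Semi-open-ended: structure constrains the AI, so p is assumed small.
print(f"P_s = {assessment_validity(k=0.8, p=0.15):.2f}")  # 0.88
# Fully open-ended: the AI can generate a complete solution, so p is assumed large.
print(f"P_o = {assessment_validity(k=0.8, p=0.60):.2f}")  # 0.52
```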
Data analysis demonstrates a strong positive correlation (0.954) between the use of interconnected projects and the accurate measurement of student skills. This contrasts with modular assignments, which exhibit only a moderate correlation (0.726) to overall skill assessment. The significant difference in correlation coefficients suggests that interconnected projects provide a more reliable and valid method for evaluating student competency, as the completion of one task directly informs and is dependent upon the successful completion of preceding and subsequent tasks, creating a more holistic assessment of applied knowledge and problem-solving abilities.
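Correlations of this kind are straightforward to compute once a reference measure of skill is available (for instance, proctored exam scores). The sketch below uses invented data purely to show the mechanics; the resulting values are not the paper’s 0.954 and 0.726.

```python
# Hypothetical sketch: correlating project scores with a reference skill
# measure (e.g., proctored exam scores). All data are invented.
import numpy as np

proctored = np.array([55, 62, 70, 78, 85, 91])       # reference skill measure
interconnected = np.array([50, 60, 68, 80, 83, 92])  # tracks skill closely
modular = np.array([88, 90, 85, 92, 89, 95])         # compressed by AI assistance

print(np.corrcoef(proctored, interconnected)[0, 1])  # high correlation
print(np.corrcoef(proctored, modular)[0, 1])         # noticeably weaker
```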
Open-ended projects, while intended to foster creativity, present a significant vulnerability to completion via Large Language Models (LLMs) due to their lack of inherent constraints on solution approaches. This reliance on generative AI compromises the ability to accurately gauge a student’s genuine competency. Conversely, projects designed with a defined structure (semi-open-ended projects) limit the scope for AI-driven completion, as the AI is constrained by the pre-defined framework and success criteria. This reduction in AI influence directly improves the validity of assessment outcomes by providing a more reliable signal of a student’s actual skills and understanding, as opposed to the capabilities of an external AI tool.
Toward Authentic Evaluation: A System Designed to Challenge
The core tenets of interconnectedness, structured complexity, and limited freedom, initially explored within the framework of Semi-Open-Ended Projects, represent a broader paradigm for effective learning and evaluation. These principles suggest that true understanding isn’t fostered through isolated skills or rote memorization, but through tasks that mirror the intricate challenges of the real world. By designing experiences where elements are intentionally linked, problems aren’t neatly defined, and solutions aren’t predetermined, assessment moves beyond simply testing knowledge to evaluating a student’s ability to synthesize information, adapt to constraints, and navigate ambiguity. This approach isn’t limited to formal evaluations; it influences instructional design, curriculum development, and even the creation of learning environments, ultimately preparing individuals for complex problem-solving in any field.
The pursuit of authentic assessment represents a fundamental shift in educational evaluation, aiming to bridge the gap between academic learning and professional practice. This approach prioritizes tasks that closely resemble the challenges encountered in real-world careers, demanding students apply knowledge and skills in interconnected ways rather than through isolated recall. By structuring assessments around complex, multi-faceted problems – mirroring how professionals tackle ambiguity and integrate diverse information – educators can move beyond measuring memorization to evaluating genuine competency. This necessitates a move from contrived exercises to scenarios that require critical thinking, collaboration, and creative problem-solving, ultimately providing a more valid and meaningful measure of a student’s preparedness for future endeavors.
Assessment gains substantial power when it moves beyond simply testing for memorization and instead focuses on demonstrable understanding and practical application of knowledge. Valid assessments aren’t about reciting facts, but about skillfully employing concepts to solve problems mirroring those encountered in real-world scenarios. This approach necessitates tasks that demand critical thinking, creativity, and the ability to synthesize information – qualities that truly indicate competency. Meaningful assessment, therefore, becomes a powerful indicator of a student’s readiness to navigate complex challenges, offering a more accurate and insightful measure of their overall learning than traditional methods reliant on rote recall.
The evolving landscape of knowledge and skill demands a perpetual refinement of assessment strategies; static evaluations quickly become misaligned with contemporary competencies. Effective measurement now necessitates embracing emerging technologies – from adaptive learning platforms and data analytics to simulations and virtual reality – not simply as tools for delivery, but as integral components of the evaluative process itself. This isn’t merely about incorporating new gadgets, but about fundamentally redesigning assessments to prioritize demonstrable skills (problem-solving, critical thinking, creativity, and collaboration) over rote memorization. Consequently, the focus shifts towards creating dynamic, performance-based tasks that mirror the complexities of real-world challenges, ensuring that evaluations accurately gauge a student’s capacity to apply knowledge and thrive in a rapidly changing professional environment.
The pursuit of assessment design, as outlined in this work, mirrors a fascinating form of intellectual reverse engineering. The researchers don’t simply accept current assessment methods; they dissect them, identifying vulnerabilities to increasingly sophisticated AI. This deliberate ‘breaking’ of the system – exposing the ease with which isolated problems can be solved by generative AI – isn’t destructive, but diagnostic. It echoes G.H. Hardy’s sentiment: “A mathematician, like a painter or a poet, is a maker of patterns.” The researchers, much like Hardy’s mathematician, are crafting a new pattern of assessment, one built on interconnectedness, to reveal genuine understanding beyond mere pattern recognition. The increased cognitive load demanded by these multi-step problems isn’t a bug, but a feature, forcing a deeper engagement with the material and a more robust demonstration of skill.
What’s Next?
The demonstrated efficacy of interconnected assessment tasks, while promising, merely shifts the locus of the challenge. Generative AI doesn’t simply fail at isolated problems; it excels at pattern recognition and, given sufficient data, will inevitably learn to navigate complex, multi-step problems as well. The true test isn’t preventing completion, but discerning understanding from sophisticated mimicry. Future work must focus on methods to probe the ‘why’ behind an answer, not just the ‘what’ – demanding explicit justification and the transparent articulation of reasoning processes.
This research implicitly acknowledges a fundamental truth: assessment is an adversarial game. It is not about finding a foolproof system (such a thing is inherently illusory) but about continually raising the bar. The emphasis should move from constructing ‘AI-resistant’ tasks to designing assessments that reveal the limits of the AI’s comprehension, highlighting the qualitative difference between algorithmic processing and genuine cognitive flexibility.
Furthermore, the exploration of cognitive load, while present, requires deeper investigation. The interconnectedness designed here isn’t merely about difficulty; it’s about mirroring the inherent messiness of real-world problem-solving. Future research could explore how varying degrees of ‘productive failure’, allowing students to grapple with ambiguity and refine their approaches, impact both AI performance and the development of robust, transferable skills. The goal isn’t to beat the machine, but to cultivate a form of intelligence that remains uniquely human.
Original article: https://arxiv.org/pdf/2512.10758.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/