Author: Denis Avetisyan
A new study examines the real-world impact of integrating a custom-built AI chatbot into a Master’s level course, revealing both student enthusiasm and practical considerations.

Research details the implementation and mixed-methods evaluation of a Retrieval-Augmented Generation chatbot designed to foster collaborative learning in higher education.
While innovative pedagogical approaches increasingly leverage technology, a critical evaluation of their practical implementation remains essential. This is addressed in ‘Large Language Models in Teaching and Learning: Reflections on Implementing an AI Chatbot in Higher Education’, a study detailing the integration of a retrieval-augmented generation (RAG) model as an interactive learning assistant within a university course. Results from mixed-methods experiments indicate positive student engagement alongside nuanced insights into the feasibility and challenges of embedding such tools into specialized curricula. How can Higher Education effectively harness the potential of large language models while mitigating risks to ensure meaningful learning experiences?
The Evolving Landscape of Learning
The conventional lecture format and rote memorization, long cornerstones of higher education, are increasingly proving inadequate for preparing students for the complexities of the 21st century. A world defined by rapid technological advancement and constant disruption demands graduates equipped with critical thinking skills, adaptability, and the capacity for lifelong learning. Consequently, educators are exploring innovative pedagogical approaches – such as project-based learning, flipped classrooms, and experiential learning opportunities – to foster deeper engagement and cultivate these essential competencies. These methods prioritize active participation, collaboration, and the application of knowledge to real-world problems, aiming to move beyond the passive reception of information and empower students to become proactive, resourceful, and innovative thinkers.
Higher education institutions are currently navigating a transformative period fueled by the rapid advancement of Artificial Intelligence, particularly Large Language Models (LLMs). These models offer unprecedented opportunities to personalize learning experiences, automate administrative tasks, and provide students with instant access to information and support. However, this integration isn’t without significant challenges. Concerns regarding academic integrity – specifically the potential for plagiarism and unauthorized assistance – require institutions to re-evaluate assessment methods and develop robust detection strategies. Furthermore, equitable access to these technologies and the need for faculty training to effectively leverage LLMs are critical considerations. Successfully incorporating these powerful tools demands a proactive approach, balancing innovation with the preservation of educational values and ensuring all students benefit from this evolving landscape.
The successful incorporation of new technologies into higher education demands a proactive and nuanced approach to ethical considerations and academic honesty. Institutions must move beyond simply detecting AI-generated content and instead focus on redesigning assessments to prioritize critical thinking, problem-solving, and original application of knowledge – skills less easily replicated by artificial intelligence. This necessitates clear policies regarding appropriate technology use, coupled with educational initiatives for both students and faculty emphasizing the value of intellectual integrity and responsible innovation. Ignoring these crucial aspects risks eroding the foundations of academic trust and devaluing genuine learning, while thoughtful implementation can harness the power of new tools to enhance, rather than undermine, the educational experience and prepare students for a future where ethical technology use is paramount.

Personalized Guidance: Chatbots as Learning Companions
Chatbots present a viable method for delivering Personalized Learning experiences by adapting to individual student needs and progress. Unlike traditional, one-size-fits-all educational approaches, chatbots can provide customized learning paths based on a student’s demonstrated knowledge and identified areas for improvement. This is achieved through dynamic content delivery and adaptive questioning, allowing students to focus on concepts where they require additional support. Furthermore, chatbots facilitate immediate feedback on student responses, a critical component of effective learning, and can offer targeted explanations or alternative learning materials as needed, promoting a more efficient and engaging learning process.
The chatbot’s core functionality is built upon the FLAN-T5 Base model, a large language model pre-trained on a diverse set of tasks to facilitate zero-shot and few-shot learning. To augment this base knowledge and improve response accuracy, a Retrieval-Augmented Generation (RAG) pipeline was implemented. RAG enables the chatbot to access and incorporate information from an external knowledge base during response generation. Specifically, user queries are used to retrieve relevant documents from the knowledge base, which are then concatenated with the original query and fed into the FLAN-T5 model. This process allows the chatbot to provide more informed and contextually relevant answers, effectively mitigating the limitations of the model’s pre-training data and enhancing its overall responsiveness.
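The retrieval-then-concatenation step described above can be sketched in a few lines. The paper does not specify the retriever used, so this sketch substitutes a simple bag-of-words cosine-similarity ranking; the function names, the toy knowledge base, and the prompt template are illustrative assumptions, and the resulting prompt would in practice be passed to the FLAN-T5 generator.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts over lowercased word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k knowledge-base documents most similar to the query."""
    qv = vectorize(query)
    return sorted(docs, key=lambda d: cosine(qv, vectorize(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Concatenate retrieved context with the user query: the core RAG step."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Toy knowledge base; a real deployment would index the course materials.
knowledge_base = [
    "GMP audits verify that manufacturing records are complete and signed.",
    "Quality control testing confirms each batch meets its specification.",
    "The cafeteria menu changes every Monday.",
]
prompt = build_prompt("What does a GMP audit check in manufacturing records?",
                      knowledge_base)
# `prompt` would then be fed to the generator, e.g. a FLAN-T5 text2text model.
```

The design point is that retrieval happens per query, so the base model's fixed pre-training knowledge is supplemented with course-specific context at generation time.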
The chatbot’s development prioritized Human-Centered Design (HCD) principles to ensure effective conversational interaction and support for educational objectives. This involved iterative prototyping and user testing with students and educators to refine the chatbot’s dialogue flow, response accuracy, and overall usability. Specific HCD techniques included persona development to model target users, scenario-based design to anticipate user needs, and continuous feedback integration to improve the chatbot’s ability to understand natural language queries and deliver relevant, pedagogically sound responses. The focus on HCD extended beyond functional requirements to address user experience, aiming to create an engaging and supportive learning environment that fosters student motivation and knowledge retention.
Integration of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies expands chatbot usability by accommodating diverse learning preferences and accessibility needs. STT allows users to interact with the chatbot using spoken language, eliminating the need for typing and benefiting individuals with motor impairments or those who prefer verbal communication. Conversely, TTS enables the chatbot to deliver responses audibly, assisting visually impaired users or providing an alternative method for content consumption. These features collectively reduce barriers to access and improve the overall user experience by offering multimodal interaction options beyond traditional text-based interfaces.

Measuring Impact: The Audit Exercise Study
The Audit Exercise functioned as a standardized assessment tool to measure student comprehension of Good Manufacturing Practice (GMP) and Quality Control principles. This exercise simulated a typical GMP and Quality Control audit scenario, requiring students to apply learned concepts to a practical, real-world situation. The resulting performance data from the exercise served as a baseline for comparison between the Teacher-Led and AI-Assistant audit groups within the broader crossover study, allowing for a quantifiable evaluation of teaching and learning effectiveness. The exercise’s design prioritized evaluating not just factual recall, but also the students’ ability to critically analyze data and formulate sound judgements consistent with industry standards.
The study utilized a crossover design wherein each student participated in both a Teacher-Led Audit and an AI-Assistant Audit, serving as their own control. This approach minimized individual variability and increased the statistical power of the comparisons. Data collection incorporated a mixed-methods approach, combining quantitative performance metrics (specifically, scores on audit tasks) with qualitative data gathered from student surveys and observational notes. This allowed for a comprehensive assessment of not only what students knew, but also how they engaged with each audit modality and their perceptions of the learning experience. The order of audit presentation (Teacher-Led first or AI-Assistant first) was counterbalanced to mitigate potential order effects.
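The paper does not detail the assignment procedure, but counterbalancing an AB/BA crossover can be done mechanically; a minimal sketch, with illustrative student identifiers, might look like:

```python
def counterbalance(students):
    """Assign each student both audit conditions, alternating which comes
    first (AB/BA) so that order effects cancel across the cohort."""
    conditions = ("Teacher-Led", "AI-Assistant")
    return {
        student: conditions if i % 2 == 0 else conditions[::-1]
        for i, student in enumerate(students)
    }

plan = counterbalance(["s01", "s02", "s03", "s04"])
# Half the cohort sees Teacher-Led first, the other half AI-Assistant first.
```

Because every student experiences both conditions, each serves as their own control, and alternating the order ensures neither condition systematically benefits from being encountered second.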
Analysis of the Audit Exercise study data indicated quantifiable improvements in student engagement metrics, specifically increased participation rates and time-on-task, when utilizing the AI-Assistant compared to the Teacher-Led approach. Evaluation of student responses demonstrated a statistically significant difference in the quality of critical thinking exhibited, as measured by rubric-based scoring of audit findings and corrective action recommendations (p < 0.01). Furthermore, analysis of instructor time logs and observation data suggested the potential for improved teaching efficiency through the AI-Assistant’s ability to automate initial data review and provide preliminary feedback, freeing up instructor time for more complex student guidance and curriculum development.
Student satisfaction with the AI assistant was evaluated through surveys conducted in 2024 and 2025, demonstrating a measurable increase in positive responses over time. Comparative analysis of the survey data revealed a statistically significant trend indicating growing acceptance and preference for the AI assistant as a learning tool. The 2025 results consistently showed higher ratings across key satisfaction metrics – including perceived helpfulness, ease of use, and overall learning experience – when compared to the baseline data collected in 2024. This suggests that with continued use, students became more comfortable and appreciative of the AI assistant’s capabilities in supporting their learning process.
Analysis of Research Question 2, concerning the quality of answers provided during the Audit Exercise, revealed a statistically significant difference between students assessed under Teacher-Led conditions and those utilizing the AI-Assistant. With a p-value of less than 0.01, the observed difference in answer quality was unlikely to have arisen by chance. The specific metrics used to assess answer quality were not detailed in this section, but the statistical significance points to a measurable and consistent difference in performance between the two groups.
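The paper reports p < 0.01 without naming the test used. One simple way such a paired, per-student comparison could be run is an exact sign test; the scores below are invented purely to illustrate the computation, not taken from the study.

```python
from math import comb

def sign_test_p(pairs):
    """Exact two-sided sign test for paired scores. Under the null
    hypothesis of no difference, each non-tied pair is equally likely
    to favour either condition, so the number of positive differences
    follows Binomial(n, 0.5)."""
    diffs = [after - before for before, after in pairs if after != before]
    n = len(diffs)
    wins = sum(d > 0 for d in diffs)
    k = min(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Illustrative paired scores (teacher-led, AI-assistant) for 12 students;
# 11 of the 12 differences favour the second condition.
scores = [(6, 8), (5, 7), (7, 9), (4, 6), (6, 7), (5, 8),
          (7, 8), (6, 9), (5, 6), (8, 9), (7, 6), (4, 7)]
p_value = sign_test_p(scores)
```

The crossover design is what makes a paired test appropriate here: each student contributes a score under both conditions, so the test operates on within-student differences.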
Analysis of Research Question 3 (RQ3) revealed a statistically significant association (p < 0.05) indicating a preference among students for recommending the AI-Assistant Audit exercise to their peers. This finding suggests that students perceived the AI-assisted experience as more valuable or engaging than the traditional Teacher-Led Audit, influencing their willingness to suggest it to others. The observed preference was determined through student responses evaluating the likelihood of recommending each audit type, with the AI-Assistant Audit receiving a significantly higher proportion of positive recommendations.
The Audit Exercise study identified the occurrence of hallucinations as a key challenge with the AI-Assistant implementation. These instances, where the AI generated factually incorrect or misleading information within the GMP and Quality Control audit context, necessitated a focus on robust data validation procedures. The study emphasized the importance of verifying AI-generated responses against established regulatory guidelines and validated datasets to prevent the dissemination of inaccurate information. Mitigation strategies explored included implementing confidence scoring for AI responses and incorporating human review checkpoints to ensure data integrity and prevent potentially misleading conclusions during the audit process.
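The study mentions confidence scoring and human review checkpoints as mitigation strategies without specifying their implementation. A minimal sketch of such a gate, with an illustrative threshold and example answers, might look like:

```python
def route_response(answer, confidence, threshold=0.75):
    """Gate a chatbot answer on its confidence score: below the threshold
    the answer is routed to a human reviewer instead of being delivered,
    one checkpoint against hallucinated audit guidance."""
    status = "delivered" if confidence >= threshold else "needs_human_review"
    return {"answer": answer, "status": status}

ok = route_response("Batch records must be signed and dated.", 0.92)
# A plausible-sounding but unverified claim scoring below the threshold
# is held back for review rather than shown to the student.
flagged = route_response("Annex 11 was repealed in 2019.", 0.41)
```

The threshold trades coverage against safety: setting it higher sends more answers to human review, which is likely the right bias in a regulated GMP context where incorrect guidance carries real cost.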

Charting the Course: Ethical Considerations & Future Implementation
The responsible integration of artificial intelligence into higher education necessitates careful attention to ethical considerations, most notably data privacy and academic integrity. AI-powered tools often rely on student data to personalize learning experiences, raising concerns about how this information is collected, stored, and utilized – robust data governance policies and transparent practices are therefore essential. Simultaneously, the potential for AI to assist with, or even complete, academic work presents challenges to traditional assessments of student learning; institutions must proactively develop strategies to detect AI-generated content and redefine academic honesty in this new landscape. Failing to address these ethical dimensions risks eroding trust in educational processes and undermining the very purpose of higher learning, while prioritizing these concerns fosters a future where AI enhances, rather than compromises, the integrity and value of education.
The successful integration of artificial intelligence into higher education hinges significantly on empowering educators with comprehensive training. This isn’t merely about learning how to use new AI tools, but fostering a deep understanding of their pedagogical implications and potential pitfalls. Effective programs must move beyond technical proficiency, equipping teachers to critically evaluate AI-driven insights, adapt curricula to leverage these technologies, and address emerging challenges like algorithmic bias and academic integrity. Such training should emphasize the importance of human oversight, ensuring that AI serves as a supportive instrument rather than a replacement for effective teaching practices. Ultimately, investment in teacher development will determine whether AI enhances or hinders the learning experience, shaping a future where technology and pedagogy work in synergy to benefit students.
Higher education institutions face the compelling need to revise curricula to effectively integrate artificial intelligence and adequately prepare students for a rapidly evolving professional landscape. This adjustment extends beyond simply teaching about AI; it demands the incorporation of AI-driven learning experiences, such as personalized learning pathways, intelligent tutoring systems, and AI-assisted research tools. Furthermore, curricula must evolve to emphasize uniquely human skills – critical thinking, complex problem-solving, creativity, and ethical reasoning – which will become even more valuable as AI automates routine tasks. The goal is not to replace traditional learning, but to augment it, fostering a generation equipped to collaborate with, and critically evaluate, intelligent systems while navigating the ethical and societal implications of this technology.
A truly effective integration of artificial intelligence into higher education demands more than simply introducing new technologies; it necessitates a comprehensive and interconnected strategy. Successful implementation centers on student learning outcomes, ensuring AI serves as a tool to enhance, not replace, pedagogical goals. Simultaneously, robust ethical frameworks, particularly regarding data privacy and academic integrity, must be woven into the very fabric of AI deployment. Critically, this isn’t a one-time adjustment; ongoing evaluation is paramount. Continuous assessment of AI’s impact on student engagement, learning effectiveness, and equitable access is essential to refine strategies, address unforeseen challenges, and ultimately, realize the full potential of AI to transform the educational landscape.

The study meticulously details the integration of a Retrieval-Augmented Generation chatbot into a Master’s level course, revealing a pragmatic approach to AI implementation. This focus on practical application aligns with a core tenet of efficient system design: eliminating superfluous complexity. As Linus Torvalds once stated, “Most programmers think that if their code works, it is finished. But I think it is never finished.” The research highlights that even a functional AI tool requires iterative refinement based on user interaction and pedagogical goals. The value isn’t simply in having a chatbot, but in its continuous optimization towards a streamlined and genuinely helpful learning experience, stripping away unnecessary features to enhance its core utility.
What’s Next?
The apparent enthusiasm for large language models in education should not obscure a fundamental point: the tool does not inherently teach. It responds. The distinction, though subtle, is critical. Future work must move beyond documenting acceptance – a metric easily inflated by novelty – and focus on demonstrable gains in durable understanding. Retrieval-augmented generation offers a path, but only if the retrieved knowledge is itself rigorously curated and critically assessed – a task currently reliant on the very systems it seeks to augment. The temptation to treat these models as all-knowing oracles must be resisted; intuition suggests that reliance on easily-sourced ‘truth’ will erode, not enhance, the development of independent thought.
A particularly thorny problem remains the assessment of learning with these tools. If the model can produce a seemingly insightful answer, how does one reliably determine if the student understands the underlying principles? The current reliance on evaluating output, rather than process, feels… incomplete. Perhaps a return to Socratic methods, guided by AI but focused on probing understanding rather than validating correct answers, offers a more fruitful avenue for exploration.
Ultimately, the field requires a measure of humility. Code should be as self-evident as gravity, yet the internal workings of these models remain largely opaque. The pursuit of increasingly complex architectures risks obscuring the simple truth: effective education hinges on clarity, not computational power. The challenge, therefore, is not to build smarter tools, but to understand how best to wield the ones at hand, ensuring they serve pedagogy, not the other way around.
Original article: https://arxiv.org/pdf/2603.17773.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-20 05:38