Critical Digital Applications of Generative AI in L2 Classrooms: Listening and Speaking

Published on May 29, 2026

Introduction

As automatic speech recognition (ASR) and generative artificial intelligence (AI) tools move into everyday ESL/EFL speaking and listening instruction, their feedback is often received and operationalized as objective and corrective. Yet emerging evidence shows that AI judgments are not consistently aligned with human assessments of intelligibility, and they can reproduce sociolinguistic biases related to accent and language background (Kang & Hirschi, 2025). Put differently, when classrooms treat AI outputs as authoritative, they risk reifying accent hierarchies and reinforcing deficit views of L2 speakers, even though learners’ communicative goals are intelligibility and interactional success rather than formal convergence with a single norm (Goh & Aryadoust, 2025). These tensions highlight the need for pedagogical approaches that help learners and teachers critically engage with how AI listens to and evaluates speech, rather than accepting its feedback as neutral or universally valid.

 

This article argues that AI in listening and speaking must be approached through critical, sociolinguistically grounded AI literacy. We conceptualize AI as a non-anthropomorphic listener, defined as a system with patterned expectations shaped by training data and annotation practices that privilege dominant norms (Raza et al., 2024). At the same time, AI can be understood as a limited interlocutor. Although it does not engage fully in adaptive, socially situated interaction, it produces feedback that may affect how speakers interpret and modify their language use. In this sense, AI functions both as a listener that evaluates speech and as an interlocutor that influences communication. Framing AI in this dual role shifts the pedagogical question from “What does AI say is correct?” to “How and for whom does AI listen, judge, or misjudge?” In doing so, we foreground intelligibility, listener responsibility, and fairness as core constructs for instruction. To bridge research and practice, this article outlines principles that treat AI as one among many listeners and proposes adaptable classroom practices for using AI in listening and speaking instruction.

 

AI as a Listener

Conceptualizing AI as a listener foregrounds how automated systems evaluate human speech through built-in patterned expectations. This framing situates AI within established constructs of listening and speaking as socially situated, co-constructed activities in which success is defined by intelligibility, comprehensibility, and interactional competence (Goh & Aryadoust, 2025). From this view, listening is a process of interpreting meaning in context by drawing on expectations and negotiating understanding when breakdowns occur. However, AI systems “listen” differently. Their judgments are shaped by the training data and annotation practices used to build them, which encode particular linguistic norms and privilege certain accent profiles as default targets (Raza et al., 2024). As a result, AI-driven feedback often operationalizes correctness through proxies such as segmental match or lexical alignment. These proxies correlate only imperfectly with communicative success. AI systems, for example, may penalize speech that is intelligible to human listeners while favoring speech that more closely conforms to the dominant speech norms on which they are trained (Kang & Hirschi, 2025).

 

Viewing AI as a listener makes these mismatches visible. AI output is therefore treated as one listener’s situated judgment rather than as an objective diagnostic signal. This perspective emphasizes that AI judgments mirror the priorities of a specific type of listener, one that lacks access to contextual reasoning, interactional repair, and accommodation of listener feedback. Importantly, this does not render AI feedback useless. Rather, it clarifies AI’s status as one listening perspective among many, one that may be constrained by system-specific factors rather than guided by communicative purpose. Pedagogically, this lens re-centers the alignment of constructs in listening and speaking instruction. Treating AI as a listener with patterned expectations prepares the ground for examining when its output aligns with human intelligibility, when it diverges, and how such divergences can be used productively in instruction.

 

Why This Matters: Bias, Intelligibility, and Learner Consequences

Failing to recognize the pitfalls of AI feedback for listening and speaking has consequences that extend beyond accuracy. At the core of this issue is a potential misalignment between intelligibility, which is understood as the ability to be understood by humans in context, and the forms of correctness that are prioritized by AI systems. When AI feedback operationalizes correctness through segmental accuracy or lexical alignment, it risks redefining interactional competence around conformity to linguistic forms (Kang & Hirschi, 2025). The feedback AI systems provide is not neutral: because these systems are trained on data and annotation schemes that privilege dominant language norms, their feedback reflects systemic bias embedded in design choices and training distributions. In classroom contexts, this may result in intelligible speech being flagged as deficient simply because it diverges from “standard” accents or language varieties.

 

For learners, consistent exposure to such AI evaluations may shape beliefs about language, ability, and legitimacy. Research on speech-recognition-mediated practice shows that students often internalize AI feedback as objective and may interpret repeated flags as evidence of personal deficiency rather than as reflections of system limitations (Jeon et al., 2025). Over time, this effect can narrow learners’ conceptions of English from a flexible communicative resource to a fixed set of linguistic forms to be mastered, increasing self-monitoring and discouraging risk-taking in spoken interaction.

 

These effects are especially consequential in multilingual classrooms. When AI feedback consistently penalizes nondominant accents, it implicitly reinforces hierarchies that privilege certain ways of speaking while casting others as deficient. The result is a reinforcing cycle: AI judgments shape learners’ beliefs; those beliefs, in turn, shape participation and willingness to communicate; and reduced participation limits opportunities for meaningful listening and speaking practice. In this way, uncritical use of AI risks amplifying existing inequities rather than alleviating them. The following sections translate these concerns into instructional practices that use AI critically to support intelligibility, promote equity, and foster learner agency.

 

AI as a Pedagogical Interlocutor

In earlier sections, we conceptualized AI as a listener, that is, a system that receives and evaluates spoken input based on patterned expectations shaped by training data and design choices. This framing emphasizes how AI interprets speech, often in ways that diverge from human standards of interactional competence. However, when such interpretations are introduced into instructional contexts, AI assumes a second role, that of pedagogical interlocutor: it participates indirectly in learners’ communicative development through the feedback it provides. If AI judgments are not consistently aligned with human communicative success, then their role in listening and speaking instruction must be reconsidered, and AI is more productively framed as a pedagogical interlocutor than as an authoritative evaluator. This perspective highlights how AI-mediated feedback shapes how learners reflect on their speech and how teachers frame listening and speaking goals. The interlocutor role thus builds on the listener role: AI evaluates speech as a listener but functions as an interlocutor when its evaluations are taken up, interpreted, and negotiated within an instructional activity.

 

This reframing shifts the instructional focus in two important ways. First, it restores construct alignment. Listening and speaking instruction can remain anchored in intelligibility, comprehensibility, and interactional competence, with AI feedback interpreted relative to these goals. Second, it renders AI judgments subject to interrogation. Instead of asking only what the system flagged, teachers and learners can examine why a particular feature mattered to the system, whether it affected human understanding, and in which communicative contexts that judgment would carry weight. AI becomes a resource for making listening processes visible, surfacing mismatches between formal accuracy and communicative success, and prompting reflection on listeners’ responsibility in real-world interaction. Thus, AI becomes embedded in listening and speaking activities to support communicative development.

 

Classroom Practices for AI in Listening and Speaking Instruction

Reframing AI as a pedagogical interlocutor becomes meaningful only when it is embedded in core listening and speaking activities. The practices below are designed to develop learners’ ability to understand spoken language, produce intelligible speech, and manage interactional breakdowns in real-world communication. These practices can be implemented flexibly and independently of one another, depending on the instructional goals, proficiency levels, and classroom contexts. Across activities, AI is used as a contrastive listener whose responses help learners and teachers examine intelligibility, listener uptake, and repair processes during instruction.

 

Practice 1: AI-Human Listening Comparison (Listening for Meaning)

Students listen to a short spoken response (e.g., a peer’s task response or a recorded summary) and complete a meaning-focused task, such as identifying the main idea or answering comprehension questions. The same audio is then processed by an AI speech recognition tool to generate a transcript or a confidence score. Students compare what they understood with what the AI “understood.” Instructors can use the guiding questions below to structure whole-class or small-group discussions that direct learners’ attention to differences between human and AI listening processes.

 

Guiding questions:

  • Did human listeners recover meaning that the AI did not?

  • What contextual or pragmatic cues supported understanding?

  • Did accent variation interfere with comprehension?

 

This activity strengthens listening comprehension while reinforcing that intelligibility is listener-dependent. AI functions as a contrastive listener, highlighting how human listeners use context and expectations to make sense.
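
For instructors who want a rough number to set beside the class discussion, the comparison between what human listeners heard and what the AI transcribed could be summarized with a word error rate (WER), a common ASR metric. The following is a minimal Python sketch, not part of the activity as described: it assumes the teacher has a human-checked transcript of the recording and a plain-text transcript exported from whatever AI tool is in use. The function names and example sentences are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    reference:  transcript agreed on by human listeners (assumed available)
    hypothesis: transcript produced by the AI tool (assumed exported as text)
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from AI output
                            curr[j - 1] + 1,      # extra word in AI output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one word is misrecognized by the AI tool.
human_transcript = "I think the city should build more bike lanes"
ai_transcript = "I think the city should build more bike lens"
print(f"WER: {word_error_rate(human_transcript, ai_transcript):.2f}")  # WER: 0.11
```

A nonzero WER on a recording that classmates understood without difficulty is a useful discussion prompt in itself. The score measures lexical match with a reference transcript, not intelligibility, so it should be read alongside the human comprehension results rather than in place of them.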

 

Practice 2: Listening Breakdown and Repair Through Interactive Listening

Students engage in a brief spoken interaction (e.g., an information-gap or opinion exchange). An AI tool processes a single speaker’s contribution and produces a transcript that contains one or more misrecognitions. Students identify where the breakdown occurred and decide whether they, as human listeners, would have required clarification at that point. Students then practice repair strategies such as confirmation checks, repetition, or rephrasing to resolve the misunderstanding.

 

Learners develop interactive listening skills by recognizing breakdowns and practicing repair, reinforcing listening as a collaborative, meaning-driven process rather than passive decoding.

 

Practice 3: Speaking for Different Listeners 

Students deliver the same short spoken message to three listeners: a peer, the teacher, and an AI listener. After each delivery, listeners indicate whether the message was immediately intelligible or required clarification. Students reflect on which aspects of their speech helped them be understood across audiences.

 

Instructors can use the guiding questions below to structure whole-class or small-group discussions to help learners attend to how listener expectations, roles, and evaluative criteria shape listening success.

 

Guiding questions:

  • Which listener understood you the most easily, and why?

  • Did you modify any aspect of your speech, such as pacing, clarity, or word choice, across listeners?

  • Were there features that mattered to the AI listener but not to the human listeners?

 

This activity develops intelligible speaking by foregrounding audience effects. Learners see that speaking success depends on listener needs and that AI represents one type of listener rather than an ideal communicative standard.

 

Practice 4: Intelligibility-Oriented Speaking Revisions

Speaking focus: strategic modification for clarity

 

Students receive AI feedback that flags aspects of their pronunciation. Instead of correcting individual sounds, they re-record their speech using intelligibility-oriented strategies such as slower pacing, clearer chunking, emphasis of key words, or improved information organization. Human listeners then evaluate whether intelligibility improved. Students learn to prioritize meaning and listener uptake over accent conformity, developing strategic speaking competence without erasing linguistic identity. Figure 1 illustrates this process as a cycle: students record a spoken response; use AI feedback (e.g., flagged pronunciation features or transcripts) to revise their speech with these strategies; re-record their responses; and have human listeners, such as peers or teachers, evaluate intelligibility. The last stage in the cycle is learner reflection on listener uptake.

 

Figure 1 

Intelligibility-Oriented Speaking Revision Cycle 

 

Final Thoughts

As AI tools become increasingly common in ESL and EFL listening and speaking instruction, reframing AI as both a listener and a pedagogical interlocutor offers a practical alternative to treating automated feedback as authoritative. As a listener, AI provides a point of comparison for teachers and learners to recognize the affordances and limitations of automated feedback in relation to intelligibility and interactional competence. As a pedagogical interlocutor, AI participates in instructional interactions by generating feedback that learners can interpret, question, and use to adjust their speech. For listening and speaking instruction, this means using AI as a resource for comparison and reflection. By examining AI feedback alongside human listener responses, learners can develop a clearer understanding of intelligibility in real-world communication. Thus, AI can support listening and speaking instruction that is focused on communicative competence.


Harriet Dentaa is a PhD student in the Applied Linguistics program at Northern Arizona University. Her research focuses on linguistic and reverse-linguistic stereotyping, speech assessment, multilingualism, and teacher training, with a particular interest in how perceptions of accent and identity influence communication and educational practice.

 

 


Dr. Okim Kang is a professor of Applied Linguistics at Northern Arizona University, where she also serves as the director of the Applied Linguistics Speech Lab and the Global Communication Center. Her research interests include L2 speech perception/production, linguistic stereotyping/attitudes, L2 pronunciation/intelligibility, L2 listening/speaking assessment and testing, automated scoring/speech recognition, and the intersection of AI and issues related to speech.

 


References

Goh, C. C. M., & Aryadoust, V. (2025). Developing and assessing second language listening and speaking: Does AI make it better? Annual Review of Applied Linguistics, 45, 179–199. https://doi.org/10.1017/s0267190525100111

Jeon, J., Wei, L., Tai, K. W. H., & Lee, S. (2025). Generative AI and its dilemmas: Exploring AI from a translanguaging perspective. Applied Linguistics, 46(4), 709–717. https://doi.org/10.1093/applin/amaf049

Kang, O., & Hirschi, K. (2025). Bias and stereotyping: Human and artificial intelligence (AI). Annual Review of Applied Linguistics, 45, 69–84. https://doi.org/10.1017/s026719052500008x

Raza, S., Garg, M., Reji, D. J., Bashir, S. R., & Ding, C. (2024). Nbias: A natural language processing framework for BIAS identification in text. Expert Systems with Applications, 237, Article 121542. https://doi.org/10.1016/j.eswa.2023.121542