
Teaching Pronunciation in a Short-Term EFL Tutoring Context Through Technology-Mediated Practice
Ahmad Zubaidi Amrullah, Iowa State University, Ames, Iowa, USA
Introduction
Pronunciation ability plays a crucial role in oral communication because it directly affects intelligibility, the ability of language learners both to understand and to be understood, and comprehensibility, or the amount of work required to understand the speech (Levis, 2018). Reflecting on my 15 years of experience as an English as a foreign language (EFL) speaking instructor, I find that pronunciation has received inadequate pedagogical attention, particularly in formal classroom teaching. My prior pedagogical practice was primarily based on incidental pronunciation instruction, which relied heavily on modeling, repetition, and corrective feedback without specific instructional planning. This article reflects on my experience as a novice pronunciation teacher involved in a four-week tutoring project with 50-minute sessions. Through this reflection, I discuss what worked, what did not, and the pedagogical lessons I learned from the project.
Last semester, in Fall 2025, I enrolled in a graduate course focused on the research and teaching of second language (L2) pronunciation. The course not only deepened my theoretical grasp of pronunciation instruction but also gave me the opportunity to put classroom concepts into practice through a tutoring project. The project was carried out in a private (one-on-one) setting, based on the pedagogical assumption that the tutor and tutee can engage fully without "hiding" behind the dynamics of a larger class (Kochem, 2021). In this setting, the tutee's every mistake, response, and development becomes directly visible, providing ample room for continuous observation, correction, and reflection.
As part of the project procedure, the tutoring was preceded by a diagnostic assessment of the prospective tutee. The diagnostic assessment model was adapted from Chapter 11 of Teaching Pronunciation with Confidence: A Resource for ESL/EFL Teachers and Learners (Guskaroska et al., 2024). The test comprised seven steps that assess students’ overall intelligibility, consonants, word stress, intonation, vowels, linking, rhythm, prominence, -ed ending errors, and -s ending errors. The tutee in this program was a male EFL student from a Vietnamese L1 background who was pursuing an undergraduate degree. Based on the most frequent pronunciation errors I found in the diagnostic test, I focused on the three areas he struggled with most: final consonants, the vowel /eɪ/, and rhythm.
Implementing communicative pronunciation teaching through technology-mediated practice
Throughout the tutoring sessions, my initial focus was on instilling a sense of psychological safety and comfort, particularly when making mistakes. I deliberately positioned myself as a learning partner rather than a linguistic authority or teacher, highlighting our shared background as non-native English speakers from EFL countries who had received little formal pronunciation training. We also established realistic expectations for this brief program so that the tutee understood what he could learn in that short period. These strategies sought to reduce awkwardness and convey the idea that long-term change requires active engagement and autonomous learning. Given this time constraint, the main objective of this tutoring project was not solely to improve pronunciation accuracy in the short term, but also to increase the tutee's phonological awareness and strengthen his motivation and autonomy, so that the improvement process could continue beyond the program.
To organize my sessions, I adapted Celce-Murcia et al.’s (2010) Communicative Framework, which originally includes five main stages: description and analysis, listening discrimination, controlled practice, guided practice, and communicative practice. For this short tutoring program, I replaced the description and analysis stage with an awareness-raising activity that I learned from the course, in order to achieve my main objective: increasing the tutee's phonological awareness. This approach places listening and modeling at the core of pronunciation learning, in line with the view that production difficulties often stem from limitations in phonological perception. In other words, if the learner cannot hear the target feature, it will be difficult for him to produce it.
In the awareness-raising and listening discrimination stages, the learning focus was on ear training using minimal pairs that targeted initial and final consonants and the vowels /eɪ/ vs /aɪ/. During these stages, we practiced several ear-training tasks to improve sound sensitivity, including same or different, circle the /aɪ/ word, underline /eɪ/ words, and word one or word two. The minimal pair materials were created by adapting existing resources, including Teaching Pronunciation with Confidence (Guskaroska et al., 2024) and EnglishClub.com, both of which provide audio recordings for modeling. In addition, I introduced the tutee to various online tools for quick word checking, including Google Translate, Cambridge Dictionary, and YouGlish. To aid articulatory comprehension, I used Tools for Clear Speech (TfCS), which provides simulations of mouth movements that allowed the tutee to visualize the formation of specific sounds.
The next step of the lesson design was to develop context-based materials, such as short reading texts and dialogues, for controlled, guided, and communicative practice. The biggest problem at this point was the scarcity of ready-made materials targeting the specific pronunciation areas we were focusing on (i.e., final consonants, vowels /eɪ/ vs /aɪ/, and rhythm). To address this, I used Artificial Intelligence (AI)-powered tools like ChatGPT to generate realistic texts, such as dialogues and stories, that incorporate target words from minimal-pair lists so that the tutee could practice pronunciation in a meaningful context. For instance, I created a dialogue featuring the vowels /eɪ/ and /aɪ/ using ChatGPT, based on lists of minimal pairs such as bay-buy, lay-lie, and race-rice. This approach allowed the tutee to practice pronunciation not only as a mechanical exercise, but also as part of purposeful communication. The audio component was also an important part of this procedure because it modeled accurate pronunciation. To develop these models, I used ElevenLabs, an AI-based platform that generates text-to-speech audio with human-like sound quality and a wide range of character and accent variations. I inserted the text of the dialogue I had generated with ChatGPT into the ElevenLabs text-to-speech dashboard and selected male and female voice characters with American accents. Then, I generated the audio and shared it with the tutee so that he could use it for self-study.
During the production stage, the tutee engaged in a variety of oral activities, including practicing dialogues, answering short sentence-based yes/no questions, reading short stories, producing sentences, and retelling stories using keywords and pictures. As a complement, we practiced shadowing using Parroto, which provides English-language videos for shadowing and dictation activities, complete with automatic speech recognition (ASR) features to provide feedback on speakers’ pronunciation. This compilation of activities was created to bridge perceptual and oral production practice in a meaningful context while also promoting autonomous learning outside of tutoring sessions through technology-mediated practice.
Learning gains and persistent challenges
Across the sessions, some activities contributed to noticeable progress, especially in final consonants, which were the main difficulty for the tutee. The tutee gradually improved at pronouncing final consonants correctly. Initially, I expected the tutee to struggle with final sibilant sounds due to his L1 background, but I was mistaken; he was able to pronounce them with dedicated effort and awareness. This supported my conviction that regular repetition and correction can progressively help him notice his weakness and address it.
However, during the open dialogue exercise in the final session, the tutee reverted to his prior tendency of omitting final consonants. When the primary focus shifted to conveying meaning in an open dialogue, he monitored pronunciation accuracy less closely, and final sounds were unintentionally weakened or omitted. This experience demonstrates that success in controlled practice does not always translate into spontaneous speech, highlighting the limitations of a correction-based approach in a short-term tutoring program, particularly when the intensity of practice and language exposure remains low. This gap between performance in controlled contexts and natural language use underscores that pronunciation change requires intensive and sustained exposure and practice (Thomson et al., 2023). To address this issue, I urged the tutee to speak slowly and to be more mindful of the target sounds during free speech. This finding also highlights the need for more communicative practice in tutoring of this kind, so that learners can practice and build greater awareness during uncontrolled speech.
When the tutee mentioned that he used Google Translate to practice pronunciation, it sparked an intriguing discussion. The tutee demonstrated with a word that I felt he pronounced incorrectly, yet the system still recognized it as correct. To illustrate the difference, I played examples of this word on YouGlish so the tutee could compare his pronunciation with a native-speaker model. I explained that ASR technology uses pattern matching and prediction, so an incorrect pronunciation can still be recognized as the intended word. This experience illustrates the limitations of relying solely on ASR for feedback and underscores the importance of improving listening skills alongside technology-based exercises.
Conclusion and implications
My observations from this brief tutoring project show that pronunciation improvement was most visible during controlled and guided activities, in which the tutee could consciously monitor target sounds in response to feedback. However, in more informal and spontaneous conversations, the tutee tended to revert to his former pronunciation habits. This disparity between controlled-context performance and natural language use demonstrates that pronunciation improvement requires extensive exposure and consistent practice. Given the program's short duration, the contribution of short-term tutoring to long-term change is modest, highlighting the importance of cultivating motivation and independent learning beyond the program.
Finally, this experience emphasizes the need to develop a pronunciation-lesson design that not only focuses on accuracy in classroom sessions but also encourages students to practice independently outside of class. In this setting, the tutor's role needs to shift from correction to facilitation, offering learners techniques, resources, and metacognitive knowledge of areas for improvement. From a professional development perspective, this experience emphasizes the importance of reflection for teachers, particularly those who are new to pronunciation teaching. Moreover, implementing a communicative framework showed me that pronunciation teaching should move beyond isolated, mechanical drills and toward integrating sounds into real communication. In this way, pronunciation teaching practices can become more sustainable and better aligned with learners' needs.
References
Celce-Murcia, M., Brinton, D. M., & Goodwin, J. M. (2010). Teaching pronunciation: A course book and reference guide. Cambridge University Press.
Guskaroska, A., Zawadzki, Z., Levis, J., Challis, K., & Prikazchikov, M. (2024). Teaching pronunciation with confidence: A resource for ESL/EFL teachers and learners. Iowa State University Press. https://doi.org/10.31274/isudp.2024.161
Kochem, T. (2021). Exploring the connection between teacher training and teacher cognitions related to L2 pronunciation instruction. TESOL Quarterly, 56(4), 1136–1162. https://doi.org/10.1002/tesq.3095
Levis, J. M. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge University Press. https://doi.org/10.1017/9781108241564
Thomson, R. I., Derwing, T. M., & Munro, M. J. (2023). How long can naturalistic L2 pronunciation learning continue in adults? A 10-year study. Language Awareness, 33(2), 201–223. https://doi.org/10.1080/09658416.2023.2227559
Ahmad Zubaidi Amrullah is an MA student in TESL/Applied Linguistics at Iowa State University. He has been an English teacher in Indonesia for several years, where he helped students use English to pursue competitions and international scholarship opportunities. He is interested in issues such as computer/mobile-assisted language learning, teacher professional development, and gamified instruction.
