
Using Artificial Intelligence to Improve Pronunciation
Sinem Sonsaat-Hegelheimer, Iowa State University, Ames, Iowa, USA
A Conversation with Dr. Sinem Sonsaat-Hegelheimer
Artificial intelligence (AI) is increasingly being used in education, offering new opportunities for teachers to enhance their instruction. One area where AI shows particular promise is in pronunciation practice, where educators can implement innovative strategies to support learners’ speaking skills. Despite its potential, many teachers are still exploring how to use AI effectively and which approaches lead to meaningful improvements.
To learn more about practical strategies for AI-supported pronunciation teaching, we spoke with Dr. Sinem Sonsaat-Hegelheimer, an expert in pronunciation instruction and technology. Dr. Sonsaat-Hegelheimer shared insights from her research, comparing different AI tools and outlining actionable strategies that teachers can apply in their classrooms. Below is our conversation, highlighting key principles, examples, and practical recommendations.
Dr. Agata Guskaroska: Based on your work, what are the key elements for designing effective pronunciation teaching?
Dr. Sinem Sonsaat-Hegelheimer: Thank you for the question. There are a few core principles that I’ve found to be essential through my research and practice.
The first, and most important, is to know your learners. I’ve worked primarily with graduate students from diverse fields whose needs were highly specific. They weren’t just learning general English; they wanted to improve their pronunciation of highly technical, field-specific words. They mentioned having difficulty being understood when discussing their research, so I had to get to know their individual areas and motivations. Understanding your specific learners, including their language backgrounds, what they need to communicate, and the difficulties they’ve experienced, is the number one thing.
Second, you need to have a clear goal. My study’s primary goal was to improve my students’ comprehensibility. But alongside that, I had another important goal: I wanted them to practice speaking freely, without the anxiety of somebody listening to them and judging them. Technology can create a private space for that kind of practice. These clear goals for both the linguistic outcome and the student experience help shape everything that follows.
Third, it is important to teach learners best practices for using AI. Teachers are vital for guidance and monitoring the process, but you don’t want learners to depend on you forever. The goal is to empower them to use these tools responsibly and efficiently on their own, long after they’ve finished your class. What was fascinating was that this became a collaborative process. I would show them how to use the AI, but then they would show me new ways they used it for their work. A student from computer science might discover a new application and share it with me, so I learned a great deal from them as well. It’s about getting them to ask, “How can I use this properly and efficiently for my specific needs?”
Finally, you must learn to write good prompts. If you’re using a tool like ChatGPT or Gemini, your prompt’s quality directly impacts the output’s quality. When you provide detailed, well-structured prompts, you not only get better results from the AI but also provide a good model for your students to follow. Graduate students, in particular, pay close attention to the details in the prompts you provide and use them as a guide. Just putting in a few keywords will get you something, but a good prompt will get you the specific thing you need.
Dr. Guskaroska: You mentioned “comprehensibility.” Could you explain that term for teachers who might not be familiar with it?
Dr. Sonsaat-Hegelheimer: Of course. To explain comprehensibility, it helps first to mention intelligibility. Intelligibility is about the listener’s actual, objective understanding: how much they can decode the speaker’s speech at a word or utterance level. Comprehensibility, conversely, is about the effort it takes for the listener to understand what is being said. As teachers, we aim to decrease that listener’s effort as much as possible. It’s important to remember that comprehensibility isn’t only about pronunciation; it’s also affected by vocabulary choices and grammatical accuracy. However, pronunciation is a major component. We don’t need learners to sound like Americans or British speakers. We want them to be understandable while keeping their accent, which is a beautiful part of their identity.
Dr. Guskaroska: What kinds of AI-powered activities did you use in your research?
Dr. Sonsaat-Hegelheimer: I used two main types of activities. First were controlled tasks, where I asked learners to read specific words, sentences, or paragraphs. Controlled tasks allow you to monitor the content and see how the AI interprets their pronunciation of known text. For example, I had them read minimal pairs like “light” and “right” to work on specific sounds. The second type was free speech tasks, where I would give them a prompt, similar to what you’d find on a TOEFL or IELTS exam, and let them speak freely. The tasks included everyday topics as well as role-plays to simulate interaction with another person. This mix was strategic. I started with free speech tasks in the first week, but then introduced controlled tasks with Gemini specifically to increase the participants’ awareness, as it helped them focus more closely on the transcript and notice specific pronunciation issues. I facilitated these activities using two chatbots: Gemini, a general-purpose tool, and Pronounce, a chatbot specifically designed for pronunciation and speaking improvement.
Dr. Guskaroska: How do those two tools, Gemini and Pronounce, compare for these tasks?
Dr. Sonsaat-Hegelheimer: They have very different strengths and are suited for different purposes. Gemini is excellent for improving accuracy and intelligibility, especially with controlled tasks. Its best feature is the real-time transcript it generates as you speak. The feature acts as a powerful form of implicit feedback; students can immediately see if there are discrepancies between what they intended to say and what the computer understood. This gives them an idea of how a human perceives their speech. One student, an Indian speaker, noticed the transcript kept interpreting a word he said as “Delhi”. He realized it was because he is from India and uses the word “Delhi” frequently in his normal speech, so the AI was biased toward that interpretation. He gained an incredible level of awareness just by observing the transcript.
This also highlights a fascinating aspect of the tool: with enough interaction, it can be trained. In one hilarious instance, a participant was reading the sentence “The rain poured heavily” but pronounced “poured” in a way that Gemini initially transcribed as “bored”. After several exchanges in which the AI was prompted not to auto-correct, it finally explained: “You actually said bored, but I corrected it to poured based on context”. The fact that it could learn the user’s intent and then explain its correction process shows just how adaptable the AI can be.
However, Gemini’s major weakness in spoken conversation is that it can “forget” the initial prompt after a few exchanges and go off-topic. This means the user must be skilled at writing detailed prompts and sometimes must re-prompt mid-conversation to bring it back on track.
Pronounce, in contrast, is better for developing fluency. It is designed to keep the user talking; it always ends its turn by asking a question to prompt a response. Unlike Gemini, it doesn’t show a real-time transcript, which prevents speakers from getting distracted and allows them to speak more freely. Instead, it generates a detailed feedback report covering not just pronunciation but also vocabulary use, grammatical accuracy, discourse, and engagement.
The challenge with Pronounce is that the automated feedback isn’t always 100% accurate. For this reason, it is vital for teachers to keep an eye on the feedback the program gives and to explicitly teach students to take it with a grain of salt. I know the team developing Pronounce is constantly working to improve the accuracy of its feedback, and the platform actively welcomes user feedback, which is great. What we see here aligns with a finding from research: feedback doesn’t need to be perfect; it often just needs to be “good enough” to be effective. A great classroom activity could be to have students work in groups to critically evaluate the AI’s feedback, discussing whether they agree with it and why. This builds their critical awareness and reinforces that they have ultimate ownership over their learning. So, if I had access to both, I would use Pronounce for free speech practice to build fluency and Gemini for controlled tasks to build accuracy and intelligibility.
Dr. Guskaroska: Your point about accent being part of one’s identity is important. Can you expand on that?
Dr. Sonsaat-Hegelheimer: Absolutely. The goal is intelligibility, not accent reduction, unless a student personally chooses that path. An accent is not a deficit; it’s evidence that you are multilingual, which is fantastic. Your brain is processing at least two languages, which is an incredibly complex skill. My research with a Brazilian Portuguese speaker gave me a wonderful example. Through her practice with Gemini, she learned that she was dropping the final “-ed” from past tense verbs, a common feature for Portuguese speakers, and she worked to correct it. She was proud of her progress, but then her American boyfriend told me, “Where did those things go? I liked them!” Those little quirks were part of her unique speech and her identity, and he missed them. The point is for learners to become aware of these features and choose what they want to work on. It’s their choice. Their accent is part of who they are.
Dr. Guskaroska: Finally, what is your main takeaway for teachers considering using these technologies?
Dr. Sonsaat-Hegelheimer: My main takeaway is to be ready to provide guidance and remember that technology is not your replacement: it’s a tool. My research showed that students really, really need a teacher’s guidance. Even with a perfect technology tool, they still want the assurance and connection that come from a teacher, a human.
In my research, many participants chose to come to my office to do their practice with the chatbots. They liked knowing I was physically there and accessible if something went wrong or they had questions. Technology can make our jobs easier by taking care of repetitive practice so that we can focus on higher-order skills with our students, but it will not replace us.
Also, remember that using AI for speaking is completely different from using it for writing. A written essay might involve a single exchange, but a spoken conversation can be 50 or 60 turns back and forth. It’s dynamic and unpredictable, and we shouldn’t expect the AI to behave better than humans do. If I can forget the question in the middle of a conversation, why wouldn’t Gemini forget the prompt? So don’t expect it to behave perfectly. Start small, pilot an activity, and don’t be discouraged if it isn’t perfect the first time. The potential is enormous, but it requires a thoughtful and hands-on approach from the educator.
Resources
Supplementary prompts (from Dr. Sonsaat-Hegelheimer’s paper in the Journal of Second Language Pronunciation): https://doi.org/10.1075/jslp.24053.son
Sinem Sonsaat-Hegelheimer is an Assistant Professor who works on second language pronunciation teaching and learning, materials development and evaluation, and computer-assisted language learning. She has co-authored chapters in several books and published her research in journals such as TESOL Quarterly, CATESOL Journal, and Speech Communication. She chaired the Speech, Pronunciation and Listening Interest Section (SPLIS) of TESOL International Association from 2020 to 2021. She has been the editorial assistant of the Journal of Second Language Pronunciation since 2015, and she is the co-editor of “Second language pronunciation: Bridging the gap between research and teaching” (Wiley, 2022) with John Levis and Tracey Derwing.
