Teaching Pronunciation: Intonation

Next up, intonation! What is intonation exactly, and why does it matter? Intonation is made up of the pitches that rise and fall when we speak. When we speak, intonation acts like punctuation. Although we don’t think about it too often, our intonation actually communicates a lot about our intentions and emotions. Misplaced intonation can not only make the speaker’s English sound “off” or “accented,” but it can also give off the wrong impression or cause miscommunication.


American English relies a lot on falling intonation, which is when we drop or lower our voice at the end of a phrase. We tend to use it at the end of a thought for short assertions and questions with interrogative words. For example: It’s hot today. What are you wearing? In both examples, your voice naturally drops to indicate that you have completed the thought. Sometimes, in more complex sentences, we fall or drop more than once to indicate the separation of phrases or ideas. This works in much the same way as a comma or a semicolon.

We also rely on rising intonation when we are asking a yes or no question. For example: Is it hot? In this example, your voice rises when you say the word “hot,” indicating to the listener that you expect a response. Sometimes ESL texts will mistakenly teach that all questions need rising intonation. This isn’t true. Think about how you say these two questions: Is it hot? Why is it so hot? In the first example, the intonation rises at the end. In the second example, “why” is stressed, and the intonation drops at the end.

For some sentences, we mix up the intonation. If we have an introductory phrase or clause, sometimes we rise at the end of the first part and fall when the sentence is completed. For example: If I go outside, I’ll get hot. We naturally rise a bit when we say “outside,” and we fall when we complete the thought with the word “hot.” We also go up and down when we are asking about two or more things. For example: Is it hot (rise) or cold (fall)?

If we are saying all the correct words, then why does intonation even matter that much? Surely the listener can figure out what we mean, right? Well, sometimes but not always. Plus, listeners can subconsciously judge the speaker by these little cues, even if they don’t intend to. Strong, falling intonation at the end of each phrase (or “lexical chunk”) makes the speaker sound more confident. Misplaced rising intonation makes the speaker sound confused or insincere.

A common mistake for Mandarin speakers is to increase their volume to stress meaning rather than use their intonation. In their native language, a change in tone indicates a totally different word. So, they often give equal stress to each word and up the volume to give certain words more value. This can come across as aggressive or angry, which is unfortunate when the speaker does not have that intention.


Fortunately though, with VIPKID, you are working with young students who still have a lot of linguistic flexibility. With the very young students, you might notice that they naturally copy your intonation. Your best strategy with the young ones is to pay attention to your own intonation. Make sure your speech stays as natural as possible, even when you slow the pace down. For example: Can you circle (rise)? Yes! (fall) I can circle! (fall) For the really young students, you can also practice repeating “uh-oh!” and “oh no!” with an intonation shift. This can actually be a pretty fun game. Drag a character or image off the screen and say, “oh no!” Sometimes we combine this with practicing “goodbye.” Either the teacher or the student will say “bye!” and lean over so they aren’t in view. Student or teacher then says, “Oh no! Where’s Student/Teacher?” It’s silly and exaggerated, and I’ve found that the young ones tend to copy my intonation exactly when we do it.

With the older, more advanced students, visual cues help. I like to draw little arrows to indicate the ups and downs. When you have longer reading passages, drawing arrows to coincide with the punctuation helps highlight how intonation acts as audible punctuation for the listener.


One great thing about focusing on intonation is that it naturally lends itself to fixing another common problem for Mandarin speakers: the dropped word endings. You’ve probably noticed that many of your VIPKID students struggle with their final S, T, L, D, and B sounds. It doesn’t come easily for them, so many students drop the sounds as they speak. However, focusing on intonation requires pauses as we rise and fall, which often helps the student slow down to finish the word correctly. Once the student gets in the habit of moving their pitch up and down, it is easier to add stress to place value on words rather than shooting the words out one by one. Intonation goes hand in hand with word stress, and when we stress a word, we are more likely to hit that final consonant as well.

Happy teaching!


Teaching English Pronunciation with VIPKID


One of the biggest reasons parents push to have their very young children learn English is that they want exposure to the language during the child’s Critical Period. Studies don’t agree on exactly when this period ends, and it might be slightly different for each child. However, there is a general consensus that, especially when it comes to pronunciation and language fluency, the earlier the child begins to learn the language, the better. Some more conservative studies suggest that if a child does not begin to learn a language by the age of 5, the child will not be able to speak the language like a native speaker. When it comes to grammar and syntax, studies show that the timeline is much more flexible. Older children are able to learn grammar, and second language acquisition for children in the 7-14 year age range can be very successful. But when we are looking at pronouncing the language like a native speaker, it seems that early exposure is critical. And, if we consider that the goal of language is effective communication, pronunciation is critical when it comes to getting our message across to the listener. In fact, oftentimes HOW we say something influences comprehension just as much as WHAT we say. A single mispronounced sound or misplaced stress can cause the listener to completely misunderstand the intended meaning.


If you are a VIPKID teacher, you probably know the struggle of teaching and correcting pronunciation. Unfortunately, while most teachers are generally familiar with the grammar rules of English and how to teach them, many of us have not spent a lot of time studying the phonetics of American English. In fact, many elements of pronunciation are just so ingrained in us from an early age that we do not even recognize them or have the ability to pinpoint what they are. We can, however, recognize when something is off or missing…when something doesn’t sound quite right. The challenge, then, is identifying the issue, explaining the problem, and helping the student correct it. Not an easy task! While the VIPKID curriculum does a pretty good job teaching phonics, it does not really provide as much for teaching phonetics (aside from the PreVIPKID curriculum). The workshops and materials for teachers do provide instruction on synthetic phonics, which is helpful for reading and for the pronunciation of certain sounds to some extent. If you haven’t looked over the information on synthetic phonics, I recommend you take some time to do that because it is a good place to start.

When we are talking about pronunciation, it is important to remember that we are actually examining two things: segmentals and suprasegmentals. Segmentals are the individual sounds; suprasegmentals apply to different segments that come together. Basically, you can think of suprasegmentals as all the “other stuff” that affects pronunciation: intonation, word stress, syllable stress, prosody/rhythm, etc. Many people think of pronunciation as simply pronouncing all of the sounds correctly, but that is only one small part of the way we speak. In fact (and I find this super interesting!), for many American listeners, fixing the suprasegmentals of a non-native English speaker’s speech can actually have a bigger impact on perceived “accent” than fixing segmentals. The bad news is…suprasegmentals are less tangible for most people and are generally much harder to adjust once they are “set” by our native language. Obviously, there is no point in trying to explain to BaoBao the difference between syllable stress and stress patterns in descriptive phrases vs. set phrases. Many native English-speaking adults will look at you funny if you try to explain it! The good news is, most of the VIPKID students are still young enough to copy and acquire these elements without having to understand them, which is why the listening phase and the parroting-everything-back phase are actually really useful as long as the teacher is speaking slowly yet naturally.


I do, however, think it is helpful for the teacher to have some knowledge of suprasegmentals, especially those elements that are specifically harder for native Chinese speakers. This allows the teacher to be prepared for common mistakes. It can also help us remember to continue to speak “naturally” even when we slow down our normal speech rate for beginner students. Keeping the importance of suprasegmentals in mind will also ensure that we take the parts of the VIPKID curriculum that really help with suprasegmentals seriously: songs and poems. Yes, I am sure we have all gotten to that dreaded Five Little Monkeys Jumping on the Bed slide with 30 seconds left on the clock and thought…NOOOOOO! I personally dread all the songs actually, because I have a horrible singing voice. It is truly dreadful.


This part of the curriculum serves an important purpose though. Even if the student has no clue what we are saying, when they mimic your singing, they are incorporating the suprasegmental elements that they need. It is actually beneficial that they don’t understand the meaning, because this allows them to focus completely on mimicking your prosody, stress, and intonation…all the things that are almost impossible to teach. You might even notice that, if the student is more advanced, they try to read the words to the song rather than just mimic you, and they end up getting the rhythm wrong. Poetry acts in the same way because, even though you are not singing, the intonation and stress are exaggerated. And again, the WAY you speak is the focus of the activity rather than the meaning of the content.

I’ll be doing a short series of posts on three suprasegmentals: intonation, stress, and prosody. I plan to give a short overview of what they are, what to keep in mind when working with native Mandarin speakers, and (hopefully!) a few useful tips for helping your students with this element of pronunciation in the context of the VIPKID classroom. After that, I will write a short series on helping students listen to and reproduce the more difficult individual sounds. Although teaching pronunciation can be tricky and sometimes straight up confusing, it is an essential part of learning a language. The more you know about what your mouth (and everything in it!) is doing, the easier it will be to teach correct pronunciation to your VIPKID students.


Affective Filter Hypothesis and the PreVIP Class

If you are reading about language acquisition, you have probably frequently encountered the words “input” and “output.” Everyone wants to figure out: what is the best amount of input vs. output? How can we best get the input in the student and the output out of the student? How do we determine quality vs. quantity? Linguists and educators (all of them a lot smarter than I am!) have written a ton on the subject, but what I want to focus on today is the affective filter hypothesis. It basically deals with how non-linguistic factors, such as anxiety or stress, impede input. These negative feelings seem to act like a filter or screen, preventing the language from getting from the outside (the teacher’s mouth) into the mind of the student.

If you’ve never experienced learning a second language, you have still probably experienced something similar to this. Maybe you or a loved one has been really sick, and you are listening to the doctor explain the possible treatment. The situation is causing you to feel scared and anxious, and even though you are trying to listen to the doctor, you are not getting all the information into your brain. Sometimes we don’t even realize that we didn’t properly get the information until after the situation is over and our stress levels have dropped. So, even though we heard all the words, the stress filtered the comprehensible input.

Now, some suggest that the affective filter hypothesis doesn’t apply to children because they lack the affective filter that causes problems for adult language learners. I disagree. Children, from a very young age, experience stress and anxiety, especially in unfamiliar learning situations…and especially when they feel pressure or “on the spot.” I do think that very small babies acquire language in a very stress-free environment. If you watch how people interact with infants, it is hilarious. We smile, coo, and act goofy…right up in their little faces. We repeat the same words, mama, mama, mama…over and over again from the time they are first born. And when they respond, we go nuts. Baby says “mama” for the first time, and the room lights up, everyone repeating “Mama!” “Mama!” Yay! Cheers! When, a few months later, baby waves bye bye to a random stranger in Target, the whole aisle goes nuts. Everyone smiles and waves. Baby hears friendly, excited voices repeating “bye bye!” We tend to respond to babies like their words are downright miraculous. And you know what? They are! This tiny new human is building a foundation for a lifetime of communicating with the world the thoughts and feelings that will grow in their hearts and minds.

As we grow, we get more and more self-aware and self-conscious. And unfortunately, the world around us also gets more and more critical. Probably no one is cheering on your every utterance in the Target aisle anymore. Bummer.

So, by the time a kid is 3-4 years old, while they might not be as self-conscious about making a mistake as an adult, they still experience stress and anxiety. When you have a PreVIP student, they are experiencing a new learning format for the first time. They are looking at a new face on a screen and listening to a new voice making sounds they don’t understand. It’s understandable that a lot of kids are going to feel at least some level of stress in that situation.

The reason why I wanted to focus on the affective filter and the very young student is because we generally think of anxiety in relation to output rather than input. When I think of stress and language, I tend to think of “freezing up” or stage fright…being scared and unable to respond. However, the affective filter hypothesis claims that negative emotions influence input. So, we have to keep in mind how stress is affecting the language coming in, not just the language coming out. This is really important for the younger students in the “listening phase” of language learning because input is our main goal.

Long story short, we have to find ways of lowering that filter, or in other words, we need to try to mitigate those negative feelings. And we have to do it in a different environment than most of us are used to because we are on a tiny screen rather than in person.

The most important thing you can do is smile. Right off the bat, you absolutely have to smile. It’s universal. We all want to see a smiling face. And you must keep smiling, no matter what. On a practical note, I find that my face reverts to a more neutral expression when I’m drawing on the screen or typing. So, I have to make a conscious effort to keep smiling. Second practical note, it helps to wear lipstick. I’m not a big make-up person, and I hate wearing lipstick. However, I can’t deny that my smile shows up and projects across camera better when I have some color.

You should also double check your setup. You don’t want to be too far from or too close to your screen. You also need to make sure you are at eye level. It is important that you don’t appear to be looking down at the student because that can be intimidating. You can prop your laptop up on books if you need to.

Another thing you can do is use familiar props. I like to use small Minnie and Mickey figures. I can hold them up next to my face, and I can bring them closer and then farther away from the screen. I’m not a familiar face, but Mickey almost always is! I also like small rubber duckies or dinosaurs. I know VIPKID recommends using a printout of the Mike and Meg characters, but they aren’t familiar at first so I don’t use those right away. I also like to start with sounds instead of words. The curriculum does this as well. For example, instead of holding up the dinosaur and saying “Hi dino,” I just hold up the dinosaur and say “roar!” Roar! Yay! Roar! Let the kid roar back a few times, then say “hi dino…roar!” The “roar” is familiar and fun before saying the English word “hi.” If I think the “roar” is going to be too scary, I do the same thing with a little duck. However, it seems like even the shyest kids like to roar.

One thing that is necessary yet stress-inducing is correction. We sometimes have to correct pronunciation…it is inevitable. You can do it in a fun, easy way though. Using the recast technique is a friendly way to correct without being intimidating or making the student feel uncomfortable. Basically, when the student says something incorrectly, just say it back correctly. Give them a chance to say it right, and then move on. Don’t frown or say “that was wrong.” Keep your positive momentum and be really enthusiastic with your praise when the student says it correctly.

The last thing that helps me is simply acknowledging that the stress is there and that it is hindering the input. Keeping that in mind reminds me to be more compassionate toward the little student on the screen. It helps to remember that making the student comfortable is a constant, important part of the job, not just an occasional side hurdle to deal with.

Good luck with the little ones! I know that the beginner lessons can be difficult. Doing our best to reduce the non-linguistic factors that hinder language input can be a challenge; however, it really is remarkable how language is acquired and developed, and it can be really fun to watch a student go from nervous and scared to happy and excited as they get more comfortable with you and more confident in their language skills.