Flap! Flap! (Or maybe, tap?!)

Everybody flaps! Well, at least, native English speakers from North America do it. The funny thing about it is, many English speakers don’t realize that they are doing it. If someone asks us what sound the letter T makes, we probably respond by making the “tuh” sound. It’s the sound we make when we sound out the letter T in individual words. However, in actual speech, it isn’t so simple.


In a previous post, I talked about the “fast D” sound. However, since it can be a source of confusion, I wanted to look at it in a little more detail. I think the easiest way to explain and teach this sound in an ESL context is to describe it as a D. The actual name for this is alveolar flapping. The T or D changes to this sound in certain circumstances, mainly when the T or D comes between two vowel sounds with the second vowel being unstressed. This is true when the sound occurs within a word, and it’s also true when the sound occurs in connected words.

For example:
Butter is pronounced “budder.”

Get up! is pronounced “ged-up!”

For this reason, the words “feudal” and “futile” often sound the same, even though the spelling is quite different. If you have been speaking American English your whole life, chances are you don’t even notice. To make it a bit more confusing, oftentimes if a native speaker slows down and pays extra attention to an individual word, they revert to pronouncing a clear T or D.

But I’m an ESL teacher! I speak clearly and enunciate every sound! Well, no. You don’t actually. And if you did, you would sound very weird. Like. A. Robot.


In fact, the alveolar flap is a very natural sound. It’s not lazy or informal American English at all. It’s just how we speak.

If you are a VIPKID teacher and you don’t have an ESL background, you might have gotten confused when this topic popped up in the pronunciation courses. One reason might be confusion over the D sound. Most ESL textbooks (and the VIPKID curriculum) just use the letter D to describe the sound. However, that’s not accurate. It’s not the harsher, initial D sound that we produce when the D comes at the beginning of the word. It is a softer, faster D, the same sound we make when the D comes in the middle of a word. So, if you are making a harsh, strong “DUH” sound, then it is going to sound funny and incorrect.


The second reason for confusion is our letter/sound association. We spent all our lives thinking that T makes a “tuh” sound. It can be tricky to re-evaluate that association. The best way is to close your eyes and say the word without looking at it. Don’t think of the way it is spelled. Think only about the shape and movement of your mouth.

Say these words out loud at a normal speech rate: butter; wedding; party

If you pay careful attention, you can feel that your tongue is making the same movement in all three words. So much about how we speak, we do without thinking. It’s simply ingrained in us from a young age. However, if you are going to teach pronunciation, you have to start thinking about and feeling what’s going on in your mouth and throat. You also have to start listening for sounds, not thinking about letters or spelling.

Another reason for confusion is that the sound does not occur between just any two vowels. It occurs when the second vowel sound is unstressed and reduced. That’s why you don’t hear it in a word like “retail,” for example. Also, we are talking about vowel sounds here, not letters. That’s why you do hear the sound in a word like “party.” Finally, it does not occur in foot-initial positions, such as in the word “Mediterranean.”
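If it helps to see the rule written out as steps, here is a minimal sketch in Python. It is only a toy model: the sound spellings, the reduced/unreduced labels, and the function names are all made up for this illustration, and it ignores the foot-initial exception and connected words entirely.

```python
# A toy model of the "fast D" rule. The sound spellings and the
# reduced/unreduced labels are a made-up notation for this sketch,
# not a real phonetic transcription.

def vowel_like(sound):
    """True for a vowel sound, or for R acting like one (as in "party")."""
    name, reduced = sound
    return reduced is not None or name == "r"

def is_fast_d(sounds, i):
    """Guess whether the T or D at position i becomes a fast D."""
    name, _ = sounds[i]
    if name not in ("t", "d"):
        return False
    if i == 0 or i == len(sounds) - 1:
        return False  # no sound before it, or no sound after it
    _, next_reduced = sounds[i + 1]
    # Needs a vowel-like sound before it and a reduced vowel after it.
    return vowel_like(sounds[i - 1]) and next_reduced is True

# Each sound is (spelling, reduced?). Consonants use None for the label.
butter = [("b", None), ("u", False), ("t", None), ("er", True)]
party = [("p", None), ("a", False), ("r", None), ("t", None), ("ee", True)]
retail = [("r", None), ("ee", False), ("t", None), ("ai", False), ("l", None)]

print(is_fast_d(butter, 2))  # True
print(is_fast_d(party, 3))   # True
print(is_fast_d(retail, 2))  # False
```

The point is simply that the check looks at sounds and stress, never at spelling.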


Well, this is all very confusing! How do I teach this?!

Like I mentioned, most textbooks just use the letter D, and I think that really is the easiest approach. Certainly attempting to explain the alveolar flap via an online classroom to a beginner English speaker on the other side of the planet is not going to go well. If you want to make a distinction for a more advanced student, I think referring to it as a “medial D” or “fast D” is sufficient. The most effective way to teach it is simply to say the word correctly as you would in normal speech so the student can listen and reproduce the sound. You might have to pay a little extra attention to this when you are slowing down your speech rate for a beginner student.

In the context of VIPKID, parents are paying for an American teacher for a reason: they want their child exposed to the American accent. If language is all about communication, then we have to teach the language as it is spoken and understood. For those of you who don’t have a background in teaching ESL, or even better…in linguistics or speech pathology, teaching these weird pronunciation elements might seem a little daunting. The good news is, you do know them. You just might not know that you know them! But if you take some time and really pay attention to your mouth and the sounds you make, you will hopefully find that teaching pronunciation is actually really fun and rewarding!


Teaching Pronunciation: Intonation

Next up, intonation! What is intonation exactly, and why does it matter? Intonation is made up of the pitches that rise and fall when we speak. When we speak, intonation acts like punctuation. Although we don’t think about it too often, our intonation actually communicates a lot about our intentions and emotions. Misplaced intonation can not only make the speaker’s English sound “off” or “accented,” but it can also give off the wrong impression or cause miscommunication.


American English relies a lot on falling intonation, which is when we drop or lower our voice at the end of a phrase. We tend to use it at the end of a thought for short assertions and questions with interrogative words. For example: It’s hot today. What are you wearing? In both examples, your voice naturally drops to indicate that you have completed the thought. Sometimes, in more complex sentences, we fall or drop more than once to indicate the separation of phrases or ideas. This acts in much the same way as a comma or a semicolon.

We also rely on rising intonation when we are asking a yes or no question. For example: Is it hot? In this example, your voice rises when you say the word “hot,” indicating to the listener that you expect a response. Sometimes ESL texts will mistakenly teach that all questions need rising intonation. This isn’t true. Think about how you say these two questions: Is it hot? Why is it so hot? In the first example, the intonation rises at the end. In the second example, “why” is stressed, and the intonation drops at the end.

For some sentences, we mix up the intonation. If we have an introductory phrase or clause, sometimes we rise at the end of the first part and fall when the sentence is completed. For example: If I go outside, I’ll get hot. We naturally rise a bit when we say “outside,” and we fall when we complete the thought with the word “hot.” We also go up and down when we are asking about two or more things. For example: Is it hot (rise) or cold (fall)?
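For anyone who likes to see patterns laid out as rules, here is a rough sketch in Python of the cases covered so far. The word list, the function name, and the shortcuts (a comma standing in for an introductory clause, the word “or” standing in for a choice question) are my own simplifying assumptions, so treat it as a memory aid rather than a full description of English intonation.

```python
# Rough guesses only: the word list and the checks below are
# simplifying assumptions for illustration.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "which", "how"}

def guess_intonation(sentence):
    """Guess the intonation pattern for a short, simple sentence."""
    text = sentence.strip().lower()
    words = text.rstrip("?.!").split()
    if not words:
        return "unknown"
    # Treat a contraction like "what's" as its first part, "what".
    first = words[0].replace("’", "'").split("'")[0]
    if text.endswith("?"):
        if first in QUESTION_WORDS:
            return "fall"             # Why is it so hot?
        if "or" in words:
            return "rise, then fall"  # Is it hot or cold?
        return "rise"                 # Is it hot?
    if "," in text:
        return "rise, then fall"      # If I go outside, I'll get hot.
    return "fall"                     # It's hot today.

for example in ["It's hot today.", "Is it hot?", "Why is it so hot?",
                "If I go outside, I'll get hot.", "Is it hot or cold?"]:
    print(example, "->", guess_intonation(example))
```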

If we are saying all the correct words, then why does intonation even matter that much? Surely the listener can figure out what we mean, right? Well, sometimes but not always. Plus, listeners can subconsciously judge the speaker by these little cues, even if they don’t intend to. Strong, falling intonation at the end of each phrase (or “lexical chunk”) makes the speaker sound more confident. Misplaced rising intonation makes the speaker sound confused or insincere.

A common mistake for Mandarin speakers is to increase their volume to stress meaning rather than use their intonation. In their native language, a change in tone indicates a totally different word. So, they often give equal stress to each word and up the volume to give certain words more value. This can come across as aggressive or angry, which is unfortunate when the speaker does not have that intention.


Fortunately though, with VIPKID, you are working with young students who still have a lot of linguistic flexibility. With the very young students, you might notice that they naturally copy your intonation. Your best strategy with the young ones is to pay attention to your own intonation. Make sure your speech stays as natural as possible, even when you slow the pace down. For example: Can you circle (rise)? Yes! (fall) I can circle! (fall) For the really young students, you can also practice repeating “uh-oh!” and “oh no!” with an intonation shift. This can actually be a pretty fun game. Drag a character or image off the screen and say, “oh no!” Sometimes we combine this with practicing “goodbye.” Either the teacher or the student will say “bye!” and lean over so they aren’t in view. Student or teacher then says, “Oh no! Where’s Student/Teacher?” It’s silly and exaggerated, and I’ve found that the young ones tend to copy my intonation exactly when we do it.

With the older, more advanced students, visual cues help. I like to draw little arrows to indicate the ups and downs. When you have longer reading passages, drawing arrows to coincide with the punctuation helps highlight how intonation acts as audible punctuation for the listener.


One great thing about focusing on intonation is that it naturally lends itself to fixing another common problem for Mandarin speakers: the dropped word endings. You’ve probably noticed that many of your VIPKID students struggle with their final S, T, L, D, and B sounds. It doesn’t come easily for them, so many students drop the sounds as they speak. However, focusing on intonation requires pauses as we rise and fall, which often helps the student slow down to finish the word correctly. Once the student gets in the habit of moving their pitch up and down, it is easier to add stress to place value on words rather than shooting the words out one by one. Intonation goes hand in hand with word stress, and when we stress a word, we are more likely to hit that final consonant as well.

Happy teaching!


Teaching Pronunciation: Word Connections!

Teaching American English pronunciation is always a challenge. So much about the way we speak has been ingrained in us basically since birth. Teaching the phonetic sounds is a little more straightforward than teaching suprasegmentals, but since the phonetic sounds don’t exist in isolation, you need to be able to address all of the elements of pronunciation and understand how they work together to form our speech. Like I said in my previous post, when you are working with young kids (such as with VIPKID), you oftentimes won’t be able to explicitly teach these elements. However, having a solid understanding will give you a better foundation for identifying the student’s pronunciation issues and addressing them within the context of the VIPKID curriculum. Today, I’m going to look at three things: word liaisons/connections, the “fast D” sound (a bit of a detour, but relevant!), and intonation.


When we speak English, our speech doesn’t sound “choppy,” but rather it sounds fluid or rhythmic because we naturally connect some words together when they are part of a thought group. This also allows us to speak faster than if we had to stop and pause after each word. There are some general rules we follow when it comes to connecting words.


1.) We connect words when a word ends in a consonant sound, and the next word begins with a vowel or semivowel (also called glides; W, Y, and R) sound. For example: “Pick up the phone!” The words “pick” and “up” are connected. It sounds like “pi+kup.”

2.) We connect words when a word ends in a consonant, and the next word begins with the same consonant or with a consonant that is produced from a similar position. For example: “I’ve been there!” The V and B sounds are both made at the lips, so it’s easy to blend them. It sounds like “I’vbin.”

3.) When a word ends in a vowel and the next begins with a vowel, they are connected with a semivowel, either a slight Y or W. For example: “Who is it?” It sounds like, “Who(w)izit?”

4.) This one is a weird one to teach. When a T, D, S, or Z sound is followed by a word that starts with Y, they are connected when the speech rate is fast (even moderately so).

T + Y = CH (“What’s your name?” sounds like “whatcher name?”)
D + Y = J (“Did you go?” sounds like “didju go?”)
S/Z + Y = SH/ZH (“Yes, you did” sounds like “yeshu did”)

5.) Pronouns are generally non-stressed and connected to the previous word. For example: “I knew her” sounds like “I newer.”
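Rule 4 is really just a small lookup table, so here is a minimal sketch of it in Python. The table and the function name are made up for illustration, and sounds are written as plain letters the same way they are in the list above.

```python
# Rule 4 as a lookup table. Sounds are written as letters, following
# the list above; the names here are made up for this sketch.
Y_BLENDS = {"T": "CH", "D": "J", "S": "SH", "Z": "ZH"}

def blend_with_y(final_sound, next_sound):
    """Return the blended sound, or None when rule 4 does not apply."""
    if next_sound.upper() != "Y":
        return None
    return Y_BLENDS.get(final_sound.upper())

print(blend_with_y("D", "Y"))  # J   ("did you" -> "didju")
print(blend_with_y("S", "Y"))  # SH  ("yes, you" -> "yeshu")
print(blend_with_y("D", "W"))  # None (the next word does not start with Y)
```

Remember that the inputs are sounds, not letters, and that the blend only shows up once the speech rate picks up.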

For students learning English, this can be tricky both for mimicking American English speech and for understanding the native English speaker. Word liaisons aren’t addressed in the early VIPKID levels, but they are essential for speaking and listening to English. So, what can we do? First, don’t completely avoid them. We generally need to slow our speech down for the lower levels, and this results in removing natural word liaisons. However, you can slow your speech rate without removing the connections completely. You can also introduce a phrase slowly, and then repeat it faster as the lesson progresses. This works great in the PreVIP levels because they are so repetitive anyway. So, the first time you ask the student for their name, you ask: What. Is. Your. Name? Then, next time, you say it faster and connect the first two words. Keep that up until you are asking the student for their name at a normal, natural rate…just like you would use when you meet a fellow English speaker for the first time.

Another opportunity to emphasize word liaisons is the song/poem slide in the lower levels. Like I mentioned in a previous post, these slides are helpful because you can focus on mimicking rather than learning the content. This slide is a good example:


The lines end with the phrase, “Shout it out.” This is a good chance to try to get the student to blend the words. They can generally repeat after you, “Shou+did+out,” with no major breaks. This example is also great because it shows how, when you connect the words, it actually changes the pronunciation of the individual sounds. In American English, when the T is between two vowel sounds (either in the same word or because of a word connection), the T is pronounced like a fast D sound. It’s also sometimes called a tapped T. We don’t change the T into a fast D when it’s in a stressed syllable (like attack).

For example, we pronounce “better” as “bedder,” “total” as “todal,” “little” as “liddul,” and “party” as “pardee.” You’ll note that in the words party and little, the T isn’t in-between two vowels. But, it is between two vowel sounds. We need to make a small vowel sound before we pronounce the L, and the R acts as a semivowel. If you say those sounds out loud and pay attention to what your tongue is doing, you can feel how you are making those sounds without your tongue touching anything…making it act as a vowel sound.

When working with students who are young enough to learn by listening rather than reading, you have a great opportunity to introduce these elements. For some words, if they learn the word before they learn to read it, it’s easier to get the correct pronunciation. In the VIPKID curriculum, the word “little” is a sight word early on. I like to use rewards to introduce this word to young students, preferably before they see it as a sight word. I have little ducks, and I hold them up when they do a good job. “Great! (Thumbs up!) You get a little duck!” The students tend to repeat “little duck!” It works nicely because the D sound is repeated and reinforced in the second word.


Native Mandarin speakers generally pronounce each word separately, and the characteristics of the language can make it hard to get comfortable connecting the words in English. Mandarin syllables tend to start with consonants and end with vowels or nasal consonants (like n). However, learning word connections can actually alleviate one of the hardest things for Chinese speakers: those final T, L, B, and D consonants. A common mistake for Chinese speakers learning English is to drop those final consonant sounds and replace them with a W. If you can connect the final consonant with the next word, it’s easier to produce the consonant correctly. Common mistake: “Call her” is incorrectly pronounced as “Caw her” because the final L is a hard sound. If you link it, it’s much easier to say, “Ca+ller” with a good L sound. So, if you can get the students comfortable mimicking your word connections from early on, you will be setting the stage for solid pronunciation habits as they progress toward fluency.

That’s it (or should I say…that+sit!) for word liaisons. Part II, intonation, coming soon.


Teaching English Pronunciation with VIPKID


One of the biggest reasons why parents push to have their very young children learn English is because they want exposure to the language during the child’s Critical Period. Studies don’t agree on exactly when this period ends, and it might be slightly different for each child. However, there is a general consensus that, especially when it comes to pronunciation and language fluency, the earlier the child begins to learn the language, the better. Some more conservative studies suggest that if a child does not begin to learn a language by the age of 5, the child will not be able to speak the language like a native speaker. When it comes to grammar and syntax, studies show that the timeline is much more flexible. Older children are able to learn grammar, and second language acquisition for children in the 7-14 year age range can be very successful. But when we are looking at pronouncing the language like a native speaker, it seems that early exposure is critical. And, if we consider that the goal of language is effective communication, pronunciation is critical when it comes to getting our message across to the listener. In fact, oftentimes HOW we say something influences comprehension just as much as WHAT we say. A single mispronounced sound or misplaced stress can cause the listener to completely misunderstand the intended meaning.


If you are a VIPKID teacher, you probably know the struggle of teaching and correcting pronunciation. Unfortunately, while most teachers are generally familiar with the grammar rules of English and how to teach them, many of us have not spent a lot of time studying the phonetics of American English. In fact, many elements of pronunciation are just so ingrained in us from an early age that we do not even recognize them or have the ability to pinpoint what they are. We can, however, recognize when something is off or missing…when something doesn’t sound quite right. The challenge, then, is identifying the issue, explaining the problem, and helping the student correct it. Not an easy task! While the VIPKID curriculum does a pretty good job teaching phonics, it does not really provide as much for teaching phonetics (aside from the PreVIPKID curriculum). The workshops and materials for teachers do provide instruction on synthetic phonics, which is helpful for reading and for the pronunciation of certain sounds to some extent. If you haven’t looked over the information on synthetic phonics, I recommend you take some time to do that because it is a good place to start.

When we are talking about pronunciation, it is important to remember that we are actually examining two things: segmentals and suprasegmentals. Segmentals are the individual sounds; suprasegmentals are the features that stretch across several segments at once. Basically, you can think of suprasegmentals as all the “other stuff” that affects pronunciation: intonation, word stress, syllable stress, prosody/rhythm, etc. Many people think of pronunciation as simply pronouncing all of the sounds correctly, but that is only one small part of the way we speak. In fact (and I find this super interesting!), for many American listeners, fixing the suprasegmentals of a non-native English speaker’s speech can actually have a bigger impact on perceived “accent” than fixing segmentals. The bad news is…suprasegmentals are less tangible for most people and are generally much harder to adjust once they are “set” by our native language. Obviously, there is no hope in trying to explain to BaoBao the difference between syllable stress and stress patterns in descriptive phrases vs. set phrases. Many native English speaking adults will look at you funny if you try to explain it! The good news is, most of the VIPKID students are still young enough to copy and acquire these elements without having to understand them, which is why the listening phase and the parroting-everything-back phase are actually really useful as long as the teacher is speaking slowly yet naturally.


I do, however, think it is helpful for the teacher to have some knowledge of suprasegmentals, especially those elements that are specifically harder for native Chinese speakers. This allows the teacher to be prepared for common mistakes. It can also help us remember to continue to speak “naturally” even when we slow down our normal speech rate for beginner students. Keeping the importance of suprasegmentals in mind will also ensure that we take the parts of the VIPKID curriculum that really help with suprasegmentals seriously: songs and poems. Yes, I am sure we have all gotten to that dreaded Five Little Monkeys Jumping on the Bed slide with 30 seconds left on the clock and thought…NOOOOOO! I personally dread all the songs actually, because I have a horrible singing voice. It is truly dreadful.


This part of the curriculum serves an important purpose though. Even if the student has no clue what we are saying, when they mimic your singing, they are incorporating the suprasegmental elements that they need. It is actually beneficial that they don’t understand the meaning, because this allows them to focus completely on mimicking your prosody, stress, and intonation…all the things that are almost impossible to teach. You might even notice that, if the student is more advanced, they try to read the words to the song rather than just mimic you, and they end up getting the rhythm wrong. Poetry acts in the same way because, even though you are not singing, the intonation and stress are exaggerated. And again, the WAY you speak is the focus of the activity rather than the meaning of the content.

I’ll be doing a short series of posts on three suprasegmentals: intonation, stress, and prosody. I plan to give a short overview of what they are, what to keep in mind when working with native Mandarin speakers, and (hopefully!) a few useful tips for helping your students with this element of pronunciation in the context of the VIPKID classroom. After that, I will write a short series on helping students listen to and reproduce the more difficult individual sounds. Although teaching pronunciation can be tricky and sometimes straight up confusing, it is an essential part of learning a language. The more you know about what your mouth (and everything in it!) is doing, the easier it will be to teach correct pronunciation to your VIPKID students.