Flap! Flap! (Or maybe, tap?!)

Everybody flaps! Well, at least, native English speakers from North America do it. The funny thing about it is, many English speakers don’t realize that they are doing it. If someone asks us what sound the letter T makes, we probably respond by making the “tuh” sound. It’s the sound we make when we sound out the letter T in individual words. However, in actual speech, it isn’t so simple.


In a previous post, I talked about the “fast D” sound. However, since it can be a source of confusion, I wanted to look at it in a little more detail. I think the easiest way to explain and teach this sound in an ESL context is to describe it as a D. The technical name for the sound is an alveolar flap (or tap), and the process is called alveolar flapping. A T or D changes to this sound in certain circumstances, mainly when it comes between two vowel sounds and the second vowel is unstressed. This is true when the sound occurs within a word, and it’s also true when the sound occurs across connected words.

For example:
Butter is pronounced “budder.”

Get up! is pronounced “ged-up!”

For this reason, the words “feudal” and “futile” often sound the same, even though they are spelled quite differently. If you have been speaking American English your whole life, chances are you don’t even notice. To make it a bit more confusing, when native speakers slow down and pay extra attention to an individual word, they often revert to pronouncing a clear T or D.

But I’m an ESL teacher! I speak clearly and enunciate every sound! Well, no. You don’t actually. And if you did, you would sound very weird. Like. A. Robot.


In fact, the alveolar flap is a very natural sound. It’s not lazy or informal American English at all. It’s just how we speak.

If you are a VIPKID teacher and you don’t have an ESL background, you might have gotten confused when this topic popped up in the pronunciation courses. One reason might be confusion over the D sound. Most ESL textbooks (and the VIPKID curriculum) just use the letter D to describe the sound. However, that’s not quite accurate. It’s not the harsher D sound we produce when a D comes at the beginning of a word. It is a softer, faster D, the same sound we make when a D comes in the middle of a word. So, if you make a harsh, strong “DUH” sound, it is going to sound funny and incorrect.


The second reason for confusion is our letter/sound association. We have spent our whole lives thinking that T makes a “tuh” sound, so it can be tricky to re-evaluate that association. The best way is to close your eyes and say the word without looking at it. Don’t think about the way it is spelled. Think only about the shape and movement of your mouth.

Say these words out loud at a normal speech rate: butter; wedding; party

If you pay careful attention, you can feel that your tongue makes the same movement in all three words. So much of how we speak happens without thinking; it’s simply ingrained in us from a young age. However, if you are going to teach pronunciation, you have to start thinking about and feeling what’s going on in your mouth and throat. You also have to start listening for sounds rather than thinking about letters or spelling.

Another reason for confusion is that the sound does not occur between every pair of vowels. It occurs when the second vowel sound is unstressed and reduced. That’s why you don’t hear it in a word like “retail,” for example. Also, we are talking about vowel sounds here, not letters. That’s why you do hear the sound in a word like “party,” where the “ar” before the T is heard as a single vowel sound. Finally, it does not occur in foot-initial positions, such as the T in the word “Mediterranean.”


Well, this is all very confusing! How do I teach this?!

As I mentioned, most textbooks just use the letter D, and I think that really is the easiest approach. Attempting to explain the alveolar flap through an online classroom to a beginner English speaker on the other side of the planet is certainly not going to go well. If you want to make a distinction for a more advanced student, I think referring to it as a “medial D” or “fast D” is sufficient. The most effective way to teach it is simply to say the word correctly, as you would in normal speech, so the student can listen and reproduce the sound. You might have to pay a little extra attention to this when you slow down your speech for a beginner student.

In the context of VIPKID, parents are paying for an American teacher for a reason: they want their child exposed to the American accent. If language is all about communication, then we have to teach the language as it is actually spoken and understood. For those of you who don’t have a background in teaching ESL (or, even better, in linguistics or speech pathology), teaching these weird pronunciation elements might seem a little daunting. The good news is that you do know them. You just might not know that you know them! If you take some time and really pay attention to your mouth and the sounds you make, you will hopefully find that teaching pronunciation is actually really fun and rewarding!


Teaching Pronunciation: Intonation

Next up, intonation! What is intonation exactly, and why does it matter? Intonation is the rise and fall of pitch in our voice as we speak, and it acts like spoken punctuation. Although we don’t think about it often, our intonation communicates a lot about our intentions and emotions. Misplaced intonation can not only make the speaker’s English sound “off” or “accented,” but it can also give the wrong impression or cause miscommunication.


American English relies a lot on falling intonation, which is when we drop or lower our voice at the end of a phrase. We tend to use it at the end of a thought, both for short statements and for questions that begin with question words (who, what, where, when, why, how). For example: It’s hot today. What are you wearing? In both examples, your voice naturally drops to show that you have completed the thought. Sometimes, in more complex sentences, we fall more than once to separate phrases or ideas. This works much like a comma or a semicolon.

We also rely on rising intonation when we are asking a yes or no question. For example: Is it hot? In this example, your voice rises when you say the word “hot,” indicating to the listener that you expect a response. Sometimes ESL texts will mistakenly teach that all questions need rising intonation. This isn’t true. Think about how you say these two questions: Is it hot? Why is it so hot? In the first example, the intonation rises at the end. In the second example, “why” is stressed, and the intonation drops at the end.

For some sentences, we mix up the intonation. If we have an introductory phrase or clause, sometimes we rise at the end of the first part and fall when the sentence is completed. For example: If I go outside, I’ll get hot. We naturally rise a bit when we say “outside,” and we fall when we complete the thought with the word “hot.” We also go up and down when we are asking about two or more things. For example: Is it hot (rise) or cold (fall)?

If we are saying all the correct words, then why does intonation even matter that much? Surely the listener can figure out what we mean, right? Well, sometimes, but not always. Plus, listeners can subconsciously judge the speaker by these little cues, even if they don’t intend to. Strong, falling intonation at the end of each phrase (or “lexical chunk”) makes the speaker sound more confident. Misplaced rising intonation can make the speaker sound confused or insincere.

A common mistake for Mandarin speakers is to increase their volume to stress meaning rather than using intonation. In Mandarin, a change in tone indicates a totally different word. So, they often give equal stress to each word and raise the volume to give certain words more weight. This can come across as aggressive or angry, which is unfortunate when the speaker doesn’t intend it.


Fortunately, with VIPKID, you are working with young students who still have a lot of linguistic flexibility. With the very young students, you might notice that they naturally copy your intonation. Your best strategy with the young ones is to pay attention to your own intonation. Make sure your speech stays as natural as possible, even when you slow the pace down. For example: Can you circle (rise)? Yes! (fall) I can circle! (fall)

For the really young students, you can also practice repeating “uh-oh!” and “oh no!” with an intonation shift. This can actually be a pretty fun game. Drag a character or image off the screen and say, “Oh no!” Sometimes we combine this with practicing “goodbye.” Either the teacher or the student says “Bye!” and leans over out of view. The other person then says, “Oh no! Where’s [Student/Teacher]?” It’s silly and exaggerated, and I’ve found that the young ones tend to copy my intonation exactly when we do it.

With the older, more advanced students, visual cues help. I like to draw little arrows to indicate the ups and downs. When you have longer reading passages, drawing arrows to coincide with the punctuation helps highlight how intonation acts as audible punctuation for the listener.


One great thing about focusing on intonation is that it naturally helps fix another common problem for Mandarin speakers: dropped word endings. You’ve probably noticed that many of your VIPKID students struggle with final S, T, L, D, and B sounds. These sounds don’t come easily to them, so many students drop them as they speak. However, focusing on intonation requires pauses as we rise and fall, which often helps the student slow down and finish the word correctly. Once the student gets in the habit of moving their pitch up and down, it is easier to stress important words rather than shooting the words out one by one. Intonation goes hand in hand with word stress, and when we stress a word, we are more likely to hit that final consonant as well.

Happy teaching!


Affective Filter Hypothesis and the PreVIP Class

If you are reading about language acquisition, you have probably encountered the words “input” and “output” many times. Everyone wants to figure out the best balance of input and output. How can we best get input into the student and output out of the student? How do we weigh quality against quantity? Linguists and educators (all of them a lot smarter than I am!) have written a ton on the subject, but what I want to focus on today is the affective filter hypothesis, proposed by the linguist Stephen Krashen. It basically deals with how non-linguistic factors, such as anxiety or stress, block input. These negative feelings seem to act like a filter or screen, preventing the language from getting from the outside (the teacher’s mouth) into the mind of the student.

Even if you’ve never learned a second language, you have probably experienced something similar. Maybe you or a loved one has been really sick, and you are listening to the doctor explain the possible treatment. The situation makes you feel scared and anxious, and even though you are trying to listen to the doctor, not all of the information is getting into your brain. Sometimes we don’t even realize that we missed information until after the situation is over and our stress levels have dropped. So, even though we heard all the words, the stress filtered out the comprehensible input.

Now, some suggest that the affective filter hypothesis doesn’t apply to children because they lack the affective filter that causes problems for adult language learners. I disagree. Children, from a very young age, experience stress and anxiety, especially in unfamiliar learning situations…and especially when they feel pressured or “on the spot.” I do think that very small babies acquire language in a very stress-free environment. If you watch how people interact with infants, it is hilarious. We smile, coo, and act goofy…right up in their little faces. We repeat the same words, mama, mama, mama…over and over again from the time they are born. And when they respond, we go nuts. Baby says “mama” for the first time, and the room lights up, with everyone repeating “Mama!” “Mama!” Yay! Cheers! When, a few months later, baby waves bye-bye to a random stranger in Target, the whole aisle goes nuts. Everyone smiles and waves. Baby hears friendly, excited voices repeating “bye-bye!” We tend to respond to babies as if their words are downright miraculous. And you know what? They are! This tiny new human is building a foundation for a lifetime of sharing with the world the thoughts and feelings that will grow in their hearts and minds.

As we grow, we become more and more self-aware and self-conscious. And unfortunately, the world around us also gets more and more critical. Probably no one is cheering on your every utterance in the Target aisle anymore. Bummer.

So, by the time kids are 3 or 4 years old, they might not be as self-conscious about making mistakes as adults, but they still experience stress and anxiety. When you have a PreVIP student, they are experiencing a new learning format for the first time. They are looking at a new face on a screen and listening to a new voice making sounds they don’t understand. It’s understandable that a lot of kids are going to feel at least some level of stress in that situation.

The reason I wanted to focus on the affective filter and the very young student is that we generally think of anxiety in relation to output rather than input. When I think of stress and language, I tend to think of “freezing up” or stage fright…being scared and unable to respond. However, the affective filter hypothesis claims that negative emotions influence input. So, we have to keep in mind how stress is affecting the language coming in, not just the language coming out. This is especially important for younger students in the “listening phase” of language learning, because input is our main goal.

Long story short, we have to find ways to lower that filter; in other words, we need to try to ease those negative feelings. And we have to do it in a different environment than most of us are used to, because we are on a tiny screen rather than in person.

The most important thing you can do is smile. Right off the bat, you absolutely have to smile. It’s universal. We all want to see a smiling face. And you must keep smiling, no matter what. On a practical note, I find that my face reverts to a more neutral expression when I’m drawing on the screen or typing, so I have to make a conscious effort to keep smiling. On a second practical note, it helps to wear lipstick. I’m not a big makeup person, and I hate wearing lipstick. However, I can’t deny that my smile shows up better on camera when I have some color.

You should also double-check your setup. You don’t want to be too far from or too close to your screen. You also need to make sure you are at eye level with the camera. It is important that you don’t appear to be looking down at the student, because that can be intimidating. You can prop your laptop up on books if you need to.

Another thing you can do is use familiar props. I like to use small Minnie and Mickey figures. I can hold them up next to my face and move them closer to and farther from the screen. I’m not a familiar face, but Mickey almost always is! I also like small rubber duckies or dinosaurs. I know VIPKID recommends using a printout of the Mike and Meg characters, but they aren’t familiar at first, so I don’t use them right away. I also like to start with sounds instead of words, as the curriculum does. For example, instead of holding up the dinosaur and saying “Hi, dino,” I just hold up the dinosaur and say “Roar!” Roar! Yay! Roar! Let the kid roar back a few times, then say “Hi, dino…roar!” The “roar” becomes familiar and fun before the English word “hi” is introduced. If I think the “roar” is going to be too scary, I do the same thing with a little duck. However, even the shyest kids seem to like to roar.

One thing that is necessary yet stress-inducing is correction. We sometimes have to correct pronunciation…it is inevitable. You can do it in a fun, easy way, though. The recast technique is a friendly way to correct without intimidating the student or making them uncomfortable. Basically, when the student says something incorrectly, just say it back correctly. Give them a chance to say it right, and then move on. Don’t frown or say “That was wrong.” Keep your positive momentum, and be really enthusiastic with your praise when the student says it correctly.

The last thing that helps me is simply acknowledging that the stress is there and that it is hindering the input. Keeping that in mind reminds me to be more compassionate toward the little student on the screen. It helps to remember that making the student comfortable is a constant, important part of the job, not just an occasional side hurdle to deal with.

Good luck with the little ones! I know that the beginner lessons can be difficult. Doing our best to reduce the non-linguistic factors that hinder language input can be a challenge. However, it really is remarkable how language is acquired and developed, and it can be really fun to watch a student go from nervous and scared to happy and excited as they get more comfortable with you and more confident in their language skills.