Figure 1: Siri cost Apple roughly $300 million to create and involved speech recording (phonetics), speech processing (phonetics), and speech annotation (phonetics).
Trying to educate other academics about phonetics is a rather more difficult task, however. Academics are a proud group composed of people who make a living being authorities on arcane topics. Tell them that you study a topic that they believe they know about (like language, ahem) and they will be highly motivated to voice their opinion, even though they may know as much about it as the average non-academic. Frankly, academics are terrible at admitting ignorance. I'll admit that I struggle with this too when it comes to areas that I think I know about. In response to this, I have created a short guide to phonetics as a way to tell other academics two things: (1) phonetics is an active area of research and (2) there is a lot we do not know about speech.
I. Starting from Tabula Rasa
Let's start with what phonetics is and is not. Phonetics is the study of how humans produce speech sounds (articulatory phonetics), what the acoustic properties of speech are (acoustic phonetics), and even how air and breathing are controlled in producing speech (speech aerodynamics). It has nothing to do with phonics, which is the connection between speech sounds and letters in an alphabet. In fact, it has little to do with reading whatsoever. After all, there are no letters in spoken language - just sounds (and in the case of sign languages, just gestures). Phoneticians typically describe speech sounds along three dimensions:
- Voicing - whether your vocal folds (colloquially called your "vocal cords") are vibrating when you make the speech sound.
- Place of articulation - where you place your tongue or lips to make the speech sound.
- Manner of articulation - either how tight a seal you make between your articulators in producing the speech sound, or which cavity the air flows through (your mouth or your nose being the two possibilities).
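To make these three dimensions concrete, here is a minimal sketch in Python (my own illustration, not a standard resource) that records a handful of consonants as voicing/place/manner triples using conventional phonetic labels:

```python
# A tiny, illustrative subset of consonants described along the three
# dimensions above: voicing, place of articulation, and manner of articulation.
CONSONANTS = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced",    "bilabial", "stop"),
    "m": ("voiced",    "bilabial", "nasal"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced",    "alveolar", "fricative"),
}

def describe(symbol: str) -> str:
    """Spell out the three-way description of a consonant symbol."""
    voicing, place, manner = CONSONANTS[symbol]
    return f"[{symbol}] is a {voicing} {place} {manner}"

print(describe("m"))  # [m] is a voiced bilabial nasal
print(describe("s"))  # [s] is a voiceless alveolar fricative
```

Change any one of the three dimensions and you get a different speech sound; that is the sense in which they define the sound.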
Why define speech this way? First, it is scientifically accurate and testable. After all, the same sound should be produced in a similar way by different speakers. We can measure exactly how sounds are produced by imaging the tongue as it moves, or by recording a person and inspecting the acoustics of specific sounds. The figure below shows just one method phoneticians can use to examine how speech is produced.

Figure 2: An ultrasound image of the surface of the tongue, from Mielke, Olson, Baker, and Archangeli (2011). Phoneticians can use ultrasound technology to view tongue motion over time.
Second, this way of looking at speech is also useful for understanding grammatical patterns. When we learn a language, we rely on regularities (grammar) to form coherent words and sentences. For a linguist (and a phonetician), grammar is not something learned in a book and explicitly taught to speakers. Rather, it is tacit knowledge that we, as humans, acquire by listening to other humans producing language in our environment.
To illustrate this, I'll give you a quick example. In English, you are probably familiar with the plural suffix "-s." You may not have thought about it this way, but this plural can be pronounced three ways. Consider the following words:

[z] plural | [s] plural | [ɨz] plural
drum - drum[z] | mop - mop[s] | bus - bus[ɨz]
rib - rib[z] | pot - pot[s] | fuzz - fuzz[ɨz]
hand - hand[z] | bath - bath[s] | wish - wish[ɨz]
lie - lie[z] | tack - tack[s] | church - church[ɨz]
In the first column, the plural is pronounced like the "z" sound in English. In the IPA this is transcribed as [z]. In the second column, the plural is pronounced like the "s" sound in English - [s] in the IPA. In the third column, the plural is pronounced with a short vowel sound and the "z" sound again, transcribed as [ɨz] in the IPA.
Why does the plural change its pronunciation? The words in the first column all end with a speech sound that is voiced, meaning that the vocal folds are vibrating. The words in the second column all end with a speech sound that is voiceless, meaning that the vocal folds are not vibrating. If you don't believe me, touch your neck while pronouncing the "m" sound (voiced) and you will feel your vocal folds vibrating. Now, try this while pronouncing the "th" sound in the word "bath." You will not feel anything because your vocal folds are not vibrating. In the third column, all the words end with sounds that are similar to the [s] and [z] sounds in place and manner of articulation. So, we normally add a vowel to break up these sounds. (Otherwise, we would have to pronounce things like wishs and churchs, without a vowel to break up the consonants.) What this means is that these changes are predictable; it is a pattern that must be learned. English-speaking children start to learn it between ages 3 and 4 (Berko, 1958).
Why does this rule happen though? To answer this question, we would need to delve further into how speech articulations are produced and coordinated with each other. Importantly though, the choice of letters is not relevant to knowing how to pronounce the plural in English; what matters is the characteristics of the sounds themselves. Rules like these (phonological rules) exist throughout the world's languages, whether the language has an alphabet or not - and only about 10% of the world's languages even have a writing system (Harrison, 2007). Unless you are learning a second language in a classroom, speakers and listeners of a language learn such rules without much explicit instruction. The field of phonology focuses on how rules like these work across the different languages of the world. The basis for these grammatical rules is the phonetics of the language.
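To drive home the point that the rule keys on sounds rather than spelling, here is a minimal sketch in Python. The segment sets are deliberately simplified, and the function is my own illustration, not anything standard:

```python
# A minimal sketch of the English plural rule, keyed on the final *sound* of the
# stem (given here as an IPA symbol), not on its spelling. The feature sets are
# simplified and illustrative, not a full inventory of English segments.
SIBILANT_LIKE = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}   # sounds similar to [s]/[z]
VOICELESS = {"p", "t", "k", "f", "θ"}              # vocal folds not vibrating

def plural_allomorph(final_sound: str) -> str:
    """Return the plural ending predicted by the stem's final sound."""
    if final_sound in SIBILANT_LIKE:
        return "ɨz"     # bus -> bus[ɨz], church -> church[ɨz]
    if final_sound in VOICELESS:
        return "s"      # mop -> mop[s], bath -> bath[s]
    return "z"          # voiced consonants and vowels: drum -> drum[z], lie -> lie[z]

for word, final in [("drum", "m"), ("bath", "θ"), ("wish", "ʃ")]:
    print(word, "->", word + "[" + plural_allomorph(final) + "]")
```

Real speakers, of course, do not consult lookup tables; the point is only that the conditioning factor is phonetic (voicing and similarity to [s]/[z]), not orthographic.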
II. Open areas of research in phonetics
The examples above illustrate the utility of phonetics for well-studied problems. Yet, there are several broad areas of research that phoneticians occupy themselves with. I will focus on just a few here to give you an idea of how this field is both scientifically interesting and practically useful.
a. Acoustic phonetics and perception
When we are listening to speech, a lot is going on. Our ears and our brain (and even our eyes) have to decode a lot of information quickly and accurately. How do we know what to pay attention to in the speech signal? How can we tell whether a speaker has said the word 'bed' or 'bet' to us? Speech perception concerns itself both with which characteristics of the sounds a listener must pay special attention to and how they pay attention to these sounds.
This topic is hard enough when you think about all the different types of sounds that one could examine. It is even harder when you consider how multilingual speakers do it (switching between languages) or the fact that we perceive speech pretty well even in noisy environments. Right now, we know a bit about how humans perceive speech sounds in laboratory settings, but much less about how they do so in more natural environments. Moreover, most of the world is multilingual, but most of our research on speech perception has focused on people who speak just one language (often English).
There is also a fun fact relevant to acoustics and perception - there are no pauses around most words in speech! Yet, we are able to pull out and identify individual words without much difficulty. To do this, we must rely on phonetic cues to tell us when words begin and end. An example of this is given in Figure 3. Between these five words there are no pauses, but we are aware of when one word ends and another begins.
How are humans able to do all of this so seamlessly though? And how do they learn it? Acoustic phonetics examines questions in each of these areas and is itself a broad sub-field. Phoneticians must be able to examine and manipulate the acoustic signal to do this research.
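As a rough illustration of what "examining the acoustic signal" can look like in practice, here is a minimal Python sketch that computes a spectrogram from a recording. The filename is hypothetical, the recording is assumed to be mono, and much phonetic work is actually done in dedicated tools such as Praat:

```python
# A minimal sketch: read a (hypothetical) mono recording and plot its spectrogram,
# i.e. how much energy each frequency carries at each moment in time.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("speech.wav")   # sampling rate (Hz) and waveform
samples = samples.astype(np.float64)

freqs, times, power = spectrogram(samples, fs=rate, nperseg=512, noverlap=384)

plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12), shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram of the recording")
plt.show()
```

A display like this is what phoneticians read when they look for cues to word boundaries, voicing, or the difference between 'bed' and 'bet'.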
Is this research useful though? Consider that when humans lose hearing or suffer from conditions that impact their language abilities, they sometimes lose the ability to perceive certain speech sounds. Phoneticians can investigate the specific acoustic properties of speech that are most affected. Moreover, as I mentioned above, the speech signal has no pauses. Knowing what acoustic characteristics humans use to pick apart words (parse words) can help to create software that recognizes speech. These are just a few of the many practical uses of research in acoustic phonetics and speech perception.
b. Speech articulation and production
When we articulate different speech sounds, there is a lot going on inside of our mouths (and, in the case of sign languages, many different manual and facial gestures to coordinate). When we speak slowly, we produce 6-10 different sounds per second. When we speak quickly, we can easily produce twice this number. Each consonant involves adjusting your manner of articulation, place of articulation, and voicing. Each vowel involves adjusting jaw height, tongue height, tongue retraction, and other features as well. The fact that we can do this means that we must be able to carefully and quickly coordinate different articulators with each other.
To conceptualize this, imagine playing a piano sonata that requires a long sequence of different notes to be played over a short time window. The fastest piano player can produce something like 20 notes per second (see this video if you want to see what this sounds like). Yet, producing 20 sounds per second, while fast, is not that exceptional for the human vocal tract. How do speakers coordinate their speech articulators with each other?
Phoneticians who look at speech articulation and production investigate both how articulators move and what this looks like in the acoustic signal. Your articulators are the various parts of your tongue, your lips, your jaw, and the various valves in the posterior cavity. The way in which these articulators move and how they are coordinated with one another is both important for understanding how speech works from a scientific perspective and extremely useful for clinical purposes. One of the reasons this matters is that articulator movements overlap quite a bit.
Since we are familiar with writing, we like to think that sounds are produced in a clear sequence, one after the other, like beads on a string. After all, our writing and even phonetic transcription reflect this. Yet, it is not the truth. Your articulators overlap all the time: moving your lips for the "m" sound in a word like "Amy" overlaps with moving your lips in a different way for the vowels.
To provide an example, in Figure 4, a Korean speaker is producing the (made-up) word /tɕapa/. The lower panels show when the tongue and lips are moving while pronouncing this word. If you look at the spectrogram (the large white, black, and grey speckled figure in the middle), you can observe what looks like a gap right in the middle of the image. This is the "p" sound. Now, if you look at the lowest panel, we can see the lower lip moving upward to make this sound. This movement for the "p" begins much earlier than what we hear as the [p] sound, during the vowel itself. Where is the "p" then? Isn't it after the /a/ vowel (sounds like "ah")? Not exactly. Parts of it overlap with the preceding and following vowels, but parts of those vowels also overlap with the "p." In the panel labelled TLy, we are observing how high the tongue is raised. It stays lowered throughout this word because it needs to stay lowered for the vowel /a/. So, the "a" is also overlapping with the "p" here.
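For readers curious what the kinematic panels in a display like Figure 4 are built from, here is a minimal Python sketch that plots two articulator trajectories against time. The trajectories below are synthetic stand-ins I generated for illustration; real data would come from an articulograph or an ultrasound export:

```python
# Synthetic (illustrative) articulator trajectories over a 400 ms utterance:
# the lower lip rises briefly for a "p"-like closure while the tongue body
# stays low, as it would for a surrounding /a/ vowel.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0.0, 0.4, 400)                              # time in seconds
lower_lip_y = 2.0 * np.exp(-((t - 0.25) ** 2) / 0.002)      # brief lip raising
tongue_body_y = -1.0 + 0.1 * np.sin(2 * np.pi * t / 0.4)    # tongue stays low

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, lower_lip_y)
ax1.set_ylabel("Lower lip height")
ax2.plot(t, tongue_body_y)
ax2.set_ylabel("Tongue body height")
ax2.set_xlabel("Time (s)")

# Shade a rough, purely illustrative span for the acoustic closure of the "p":
# the lip movement that produces it clearly starts earlier, during the vowel.
for ax in (ax1, ax2):
    ax.axvspan(0.23, 0.28, alpha=0.2)
plt.show()
```

Even in this toy version, the movement and the acoustic "segment" do not line up neatly, which is exactly the overlap the figure is meant to show.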
Overlap in speech is the norm and sometimes speakers move their articulators in ways that are unexpected. You might struggle to coordinate your articulators in a particular way when you are learning new sounds in a first language (as a child) or new sounds in a second language (as a child or adult). You also might have difficulty producing sequences of sounds due to a range of physical or cognitive disorders. By looking at speech articulation, phoneticians are able to examine what is typical in speech and also what is atypical.
One fun way to examine what speakers can do is to have them speak really quickly or give them tongue twisters. As mentioned earlier, speech can be really fast. The Korean speaker above produced 5 speech sounds in just 400 milliseconds (about 12.5 sounds per second), and she was speaking carefully. When speakers speed up, phoneticians can determine both where difficulties arise and how different movements must be adjusted relative to one another.
References:
Berko, J. (1958). The child's learning of English morphology. Word, 14(2-3), 150-177.
Harrison, K. D. (2007). When languages die. Oxford University Press.
Mielke, J., Olson, K., Baker, A., & Archangeli, D. (2011). Articulation of the Kagayanen interdental approximant: An ultrasound study. Journal of Phonetics, 39, 403-412.