Friday, June 26, 2020

What's universal in phonetics?

As a fieldworker, I'm often struck by how many linguistic patterns I've observed that just "shouldn't" occur. Linguistics often propels itself as a field by asserting theories that are both too strong and too myopic. The thinking goes that one should assume universality first and then adjust accordingly afterwards (or unfortunately, ignore exceptions and continue on).

In phonetics, there has been a long history around the notion of universalism. Jakobson, Fant, and Halle (1961) assumed that one needed only distinctive features to characterize cross-linguistic differences. Once you got features down, you could just assume that all speakers had the same sort of mapping from features to articulation. This idea persisted into the 1970s (at least among phonologists), but began to break down in the 1980s and 1990s with Pat Keating's work on voicing (1984), Doug Whalen's discussion of coarticulation (1990), and Kingston & Diehl's discussion of "automatic" and "controlled" phonetics (1994). The emerging consensus from this earlier work and the resulting evolution of laboratory phonology was that phonetic patterns are closely controlled by speakers and many patterns are language-specific.1

Ladd's (2014) book provides a nice overview of many of these ideas - in particular the view that "Phonologists want their descriptions to account for the phonetic detail of utterances. Yet most are reluctant to consider the use of formalisms involving continuous mathematics and quantitative variables, and without such formalisms, it is doubtful that any theory can deal adequately with all aspects of the linguistic use of sound." (p.51)

If we fast-forward to the present day, the landscape of phonetics and phonology is quite different than what it used to be. I think most laboratory phonologists (and most phonologists nowadays are laboratory phonologists) would agree that representations reflect distributions of productions in some way and that the statistical and articulatory details can vary in a gradient way across languages.

With this in mind, what is left of phonetic universals? There are certainly several universals regarding phonological inventories that could be discussed (see Gordon's recent 2016 book on the topic). But what of phonetic patterns that are best captured quantitatively? What are the universals and near universals? I thought I would start to collect a list of these here as a way to organize my thoughts and to challenge/question my assumptions. I invite anyone to propose additional things here too.

1. Dorsal stops (almost always) have longer VOT (voice onset time) than coronal or labial stops
On the basis of looking at 18 different languages, Cho and Ladefoged (1999) first noted that, once one controls for laryngeal category (voiced, voiceless, voiceless aspirated), dorsal stops tend to have a longer VOT than coronal or labial stops. A more recent analysis of this question is found in Chodroff et al. (2019), where the authors looked at over 100 different languages. Of the languages that they sampled, 95% displayed the dorsal > coronal pattern. This finding probably relates to a mechanical constraint on movement of the tongue dorsum. Since the dorsum has greater mass, the release portion tends to take longer (Stevens 2000). All else being equal, larger articulators usually move more slowly than smaller ones - a general principle of physiology and movement. This slower release delays the venting of the supralaryngeal cavity, and thus delays the aerodynamic conditions (a sufficient pressure drop across the glottis) needed for voicing to begin - hence the longer VOT.

Chodroff et al.'s sampling revealed another near universal - VOT is strongly correlated across stops within a particular language. That is, if a language tends to have very short-lag VOT values for one stop consonant, it has very short-lag VOT values for all the others too. This finding is interesting since it suggests that speakers and languages produce identical laryngeal gestures regardless of the supralaryngeal constriction. There is some physiological evidence for this universal (Munhall & Löfqvist 1992).
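
Just to make those two generalizations concrete, here is a minimal sketch in Python of how one might check them against a table of mean VOT values. The language names and numbers below are invented for illustration; they are not Chodroff et al.'s data.

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical mean VOT values (ms) for voiceless stops in three languages.
# These numbers are invented; they are not from Chodroff et al. (2019).
vot_means = {
    "Language A": {"p": 10, "t": 14, "k": 25},
    "Language B": {"p": 55, "t": 65, "k": 80},
    "Language C": {"p": 20, "t": 28, "k": 40},
}

# 1. Place-of-articulation ordering: the dorsal /k/ should have the longest VOT.
for lang, vots in vot_means.items():
    ordering_holds = vots["k"] > vots["t"] and vots["k"] > vots["p"]
    print(f"{lang}: dorsal longest? {ordering_holds}")

# 2. Within-language covariation: languages with long VOT for one stop tend to
#    have long VOT for the others, i.e. a strong cross-language correlation
#    between mean /p/ VOT and mean /k/ VOT.
p_vots = [v["p"] for v in vot_means.values()]
k_vots = [v["k"] for v in vot_means.values()]
print(f"r(/p/, /k/) across languages = {correlation(p_vots, k_vots):.2f}")
```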

2. All languages have utterance-final lengthening.

Though languages vary in the extent to which words are lengthened in phrase-final or utterance-final position, final lengthening seems to have been found in every language where it has been investigated (Fletcher 2010, White et al. 2020). Even languages which lack the phonological units used in intonation systems (boundary tones, pitch accents) seem to have utterance-final lengthening (DiCanio and Hatcher 2018; DiCanio et al. 2018, in press).

There is probably a biomechanical explanation for utterance-final lengthening based on articulatory slowing at the end of utterances. As speakers finish utterances, their articulators gradually move more slowly (Byrd & Saltzman 2003). The scope of this effect varies across languages and it is not yet clear whether certain syllable types are more affected than others, e.g. closed syllables or syllables with short vowels might undergo less final lengthening.
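
For what it's worth, the measurement itself is simple. Here is a minimal sketch of the kind of final/non-final duration ratio that gets compared across languages; the syllable durations are invented for illustration, not data from any of the studies above.

```python
# Invented segmented duration data: (syllable duration in ms, utterance-final?)
tokens = [
    (180, False), (175, False), (190, False), (260, True),
    (170, False), (185, False), (250, True),
]

final = [d for d, is_final in tokens if is_final]
nonfinal = [d for d, is_final in tokens if not is_final]

mean_final = sum(final) / len(final)
mean_nonfinal = sum(nonfinal) / len(nonfinal)

# A ratio > 1 indicates utterance-final lengthening; the size of the ratio is
# one way to compare the magnitude of the effect across languages.
print(f"final/non-final duration ratio = {mean_final / mean_nonfinal:.2f}")
```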

3. Languages optimize the distance between vowels in articulation/acoustics.

I'll leave it open for now whether this refers just to articulatory dispersion or to acoustic dispersion (there is debate around this, of course), but it seems like most languages try to optimize the height and backness of vowels. In languages with asymmetric vowel systems, e.g. /i, e, a, o/ or /i, e, ɛ, a, o, u/, the back vowels will have F1 values that often sit in between the values for the corresponding front vowels (Becker-Kristal 2010). Becker-Kristal looked at the acoustics of over 100 different languages and found this to be a general pattern. The opposite pattern (front vowels sitting in between the corresponding back vowels) is ostensibly true as well, but it is harder to observe since most languages have more front vowel contrasts than back vowel contrasts.
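
Here is a toy version of that check for an /i, e, a, o/ system: does the single non-low back vowel have an F1 that falls in between the F1 values of the two non-low front vowels? The formant values are invented for illustration, not Becker-Kristal's measurements.

```python
# Hypothetical mean F1 values (Hz) for an asymmetric /i, e, a, o/ inventory.
f1 = {"i": 290, "e": 430, "a": 750, "o": 370}

# The dispersion-style prediction described above: /o/'s F1 sits in between
# the F1 values of the corresponding front vowels /i/ and /e/.
lo, hi = sorted((f1["i"], f1["e"]))
print(f"F1 of /o/ between /i/ and /e/? {lo < f1['o'] < hi}")
```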

***Edited to include new things - thanks to Eleanor Chodroff, David Kamholz, Joseph Casillas, Rory Turnbull, Claire Bowern, Carlos Wagner and various others on Twitter whose identities/names are not clear.***

4. Intrinsic F0 of high vowels

There is some discussion of this effect, but it seems to be the case that, all else being equal, high vowels will have higher F0 than low vowels (Whalen & Levitt 1995). In all languages where it has been investigated, researchers have found positive evidence for this. Whalen & Levitt note that the explanation here has to do with enhanced subglottal pressure and greater cricothyroid (CT) activity in the production of high vowels relative to low vowels. Ostensibly, as the tongue is raised, it exerts a pull on the larynx via the geniohyoid and hyothyroid muscles. This raises the thyroid cartilage and thus exerts pull on the cricothyroid itself (raising F0). Greater subglottal pressure would then be needed to overcome the increased impedance due to greater vocal fold tension.

There is a tendency, however, to not observe the effect in low F0 contexts, in particular for low tones in tone languages. I've personally wondered about this in Mixtec and Triqui languages, though it is usually quite difficult to control for glottalization, tone, and vowel quality all at once in these languages in order to investigate this question. Why might the effect not be found for low tones? One possibility is that F0 control is essentially different in a low F0 context. According to Titze's body-cover model of vocal fold vibration (1994), the thyroarytenoid (TA) muscles are more responsible for vocal fold vibration when F0 is low. Perhaps tongue raising exerts less force on the TA than it does on the CT.
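
If I were to chase this in Mixtec or Triqui data, the basic comparison would look something like the sketch below: mean F0 for high versus low vowels, computed separately within each tonal category so that tone is held constant. All of the token values are invented for illustration; the invented numbers simply mirror the pattern described above (a clear intrinsic F0 difference under high tone, a reduced one under low tone).

```python
from collections import defaultdict
from statistics import mean

# Invented tokens: (vowel, tone category, F0 in Hz). Not Triqui or Mixtec data.
tokens = [
    ("i", "high", 232), ("a", "high", 224), ("i", "high", 236), ("a", "high", 228),
    ("i", "low", 151),  ("a", "low", 150),  ("i", "low", 153),  ("a", "low", 149),
]

by_cell = defaultdict(list)
for vowel, tone, f0 in tokens:
    height = "high V" if vowel == "i" else "low V"
    by_cell[(tone, height)].append(f0)

# Intrinsic F0 effect = mean F0 of high vowels minus mean F0 of low vowels,
# computed separately for each tone so that tone is controlled.
for tone in ("high", "low"):
    diff = mean(by_cell[(tone, "high V")]) - mean(by_cell[(tone, "low V")])
    print(f"{tone} tone: F0(high V) - F0(low V) = {diff:.1f} Hz")
```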

5. Voiced stops are shorter in duration than voiceless stops

Voicing is hard to maintain when there is any constriction in the supraglottal cavity. Assuming no venting through the velopharyngeal port, a supralaryngeal oral stop closure will cause a build-up in pressure above the glottis, which will inhibit the pressure differential across the glottis required for continued voicing - the aerodynamic voicing constraint (Ohala, 1983). Thus, voicing tends to die out relatively quickly during stop closure. Similarly, for voiced fricatives, the need to maintain a narrow constriction for frication and greater intra-oral air pressure relative to atmospheric air pressure is at odds with the simultaneous need to maintain greater subglottal pressure relative to intra-oral (supraglottal) air pressure for continued voicing. Thus, voiced fricatives will often devoice or de-fricativize (and be produced as approximants).

A consequence of the aerodynamic voicing constraint in stops is that the duration of stop voicing is limited and so, it turns out, voiced stops are shorter than voiceless ones. This has been observed since the early work of Lisker (1957) (cf. Lisker 1986 as well). It seems to be a phonetic universal. What about fricatives though? Are voiced fricatives typically shorter than voiceless ones? I think that the jury is still out on this one. While it is difficult to maintain simultaneous voicing and frication for voiced fricatives, the temporal constraints are not as clear as with stops. Yet, voiced fricatives are almost always shorter than voiceless fricatives as well.
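
The stop comparison itself is straightforward. Here is a minimal sketch with invented closure durations (not Lisker's measurements); the same comparison could be run over fricative durations to probe the open question above.

```python
from statistics import mean

# Invented closure durations (ms) for voiced vs. voiceless stop tokens.
closure_ms = {
    "voiced":    [62, 70, 58, 66, 64],    # e.g. /b, d, g/ tokens
    "voiceless": [95, 102, 88, 110, 99],  # e.g. /p, t, k/ tokens
}

for voicing, durations in closure_ms.items():
    print(f"{voicing}: mean closure = {mean(durations):.0f} ms")

# Under the aerodynamic voicing constraint, the voiced mean is expected to be
# reliably shorter than the voiceless mean.
```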

What's not a universal?

In thinking about ostensible phonetic universals, I am struck by many patterns that do not seem to be as universal as once believed. I am most familiar with those in the research that I have done.

6. Not a universal - word-initial strengthening

A common cross-linguistic pattern is that word-initial consonants will be produced with greater duration and/or with stronger articulations (more contact, faster velocity). Fougeron & Keating (1997) is a seminal paper observing this pattern with English speakers. It has been studied in various languages - most recently in work by Katz & Fricke (2018) and White et al. (2020). While Fougeron & Keating (1997) and subsequent work by Keating et al. (2003) do not assert that this pattern is universal, White et al. (2020) state the following (in their conclusions):

"We propose, however, that initial consonant lengthening may be likely to maintain a universal structural function because of the critical importance of word onsets for the entwined processes of speech segmentation and word recognition."

I should admit, I'm working on a paper which addresses this claim with some of my research on Yoloxóchitl Mixtec, an Otomanguean language in Mexico. The language is prefixal and has final stress. Word-initial consonants are always shorter than word-medial ones and (in the paper I'm working on now, at least) undergo more lenition. You don't have to take my word about this based on something not-yet-published though. The durational finding is replicated in both DiCanio et al. (2018) and DiCanio et al. (to appear). So, three different publications, all with different speakers, have found the effect. (I'll just mention here, because this is a blog and not a publication, that the same pattern seems to hold in Itunyoso Triqui - another Mixtecan language with final stress and prefixation. That's another paper for this summer.)

There's an interesting thing here though - most of the languages which have been studied in relation to initial strengthening are not prefixing languages. In prefixal languages, like Scottish Gaelic, parsing word-initial consonants does not help too much in word identification (Ussishkin et al. 2017). The authors state the following:

"Our results show that during the process of spoken word recognition, listeners stick closer to the surface form until other sounds lead to an interpretation that the surface form results from the morphophonological alternation of mutation." (Ussishkin et al. 2017, p.30)

While this research does not address word-initial strengthening, it suggests that there is just something different about prefixal languages in terms of word recognition. If the goal of word-initial strengthening is to enhance cues to word segmentation, then it stands to reason that word-initial strengthening might not occur in heavily prefixing languages. At the very least, the Mixtec data show that word-initial consonant lengthening is indeed not a universal.

7. Not a universal - native listeners of a tone language are better at pitch perception than native listeners of non-tonal languages

I know, I know, you want to believe that it's true. All tone language listeners must have superpowers when it comes to perceiving pitch, right? It turns out that the evidence is quite mixed here and that musical experience ends up playing a big role. There are papers that have found evidence that speaking a tone language confers some benefit in pitch discrimination when listeners have to discriminate both between tonal categories and within them (Burnham et al. 1996, Hallé et al. 2004, Peng et al. 2010). However, there are other papers showing no advantage (Stagray & Downs 1993, DiCanio 2012, So & Best 2010). At issue is usually the musical background of the listeners in question. In Stagray & Downs (1993), the authors chose only speakers of Mandarin who did not have musical experience, and in DiCanio (2012), none of the Triqui listeners had any musical experience. In So & Best (2010), the authors screened 300 Hong Kong Cantonese listeners and chose only those with (a) no knowledge of Mandarin and (b) no formal music training. Only 30/300 qualified! Many other studies finding an advantage for tone language listeners have not controlled for musical background.

So, how much does musical ability play a role in tonal discrimination? I can provide an example from some data from my 2012 paper (though this was not discussed in the paper itself). Triqui is heavily tonal, with nine lexical tones (/1, 2, 3, 4, 45, 13, 43, 32, 31/) and extensive tonal morphophonology (DiCanio 2016). One would imagine that, when presented with stimuli along a continuum between two tonal categories, e.g. the falling tones /32/ and /31/, Triqui listeners might be extra careful at perceiving slight differences. It turns out that they are better at perceiving between-category differences (steps 2-4, 3-5, 4-6, 5-7) than within-category differences (steps 1-3, 6-8).
[Figure: Discrimination accuracy of tonal continua for Triqui and French listeners. Data from DiCanio (2012). No Triqui listener had musical training, but a subset (13/20) of the French listeners did. Discrimination is better than predicted at the end of the continuum because listeners were comparing resynthesized speech to non-resynthesized natural speech.]
On the whole, French speakers were better at discriminating Triqui tonal pairs along the continuum than Triqui speakers were. This is quite surprising, but once we separate French speakers by their musical background, we find that the non-musicians of the bunch were worse at between-category tonal discrimination than the Triqui listeners (but better at within-category discrimination). Having some musical background (at least 2-3 years) provides a remarkable benefit to your pitch discrimination abilities. Speaking a tone language makes you good at telling apart two particular tones in your language at a categorical boundary between them, but it does not make you magically better at pitch discrimination, apparently.
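
For concreteness, here is a sketch of how between- versus within-category accuracy can be summarized from discrimination data along a continuum like this. The step pairs mirror the ones mentioned above, but the accuracy values are invented; they are not the results from DiCanio (2012).

```python
from statistics import mean

# Invented proportion-correct scores for each two-step pair along the continuum.
accuracy = {
    (1, 3): 0.58, (2, 4): 0.74, (3, 5): 0.81,
    (4, 6): 0.79, (5, 7): 0.72, (6, 8): 0.55,
}

between_pairs = [(2, 4), (3, 5), (4, 6), (5, 7)]  # straddle the /32/-/31/ boundary
within_pairs = [(1, 3), (6, 8)]                   # fall inside a single tonal category

between = mean(accuracy[p] for p in between_pairs)
within = mean(accuracy[p] for p in within_pairs)
print(f"between-category accuracy = {between:.2f}, within-category = {within:.2f}")
```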

-----
There are undoubtedly many other things that could be put here both for universals and no-longer universals. I'm of course very biased here as someone who works on prosody. (I tend to be more interested in the prosodic patterns.) This is intended to be a continually-developing list of things for both my personal memory and for others to contribute to (or argue with). So, any thoughts of things to add are most welcome.
_____________
1. There are many other sources here that I'm probably missing. I'd be happy to add any that people suggest.

References:
Becker-Kristal, R. (2010). Acoustic typology of vowel inventories and Dispersion Theory: Insights from a large cross-linguistic corpus. PhD thesis, UCLA.

Burnham, D., Francis, E., Webster, D., Luksaneeyanawin, S., Attapaiboon, C., Lacerda, F., and Keller, P. (1996). Perception of lexical tone across languages: evidence for a linguistic mode of processing. In Proceedings of the 4th International Conference on Spoken Language Processing, volume 4, pages 2514–2517.

Byrd, D. and Saltzman, E. (2003). The elastic phrase: modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31:149–180.

Cho, T. and Ladefoged, P. (1999). Variation and universals in VOT: evidence from 18 languages. Journal of Phonetics, 27:207–229.

Chodroff, E., Golden, A., and Wilson, C. (2019). Covariation of stop voice onset time across languages: Evidence for a universal constraint on phonetic realization. Journal of the Acoustical Society of America, Express Letters, 145(1):EL109–EL115.

DiCanio, C. T. (2012). Cross-linguistic perception of Itunyoso Trique tone. Journal of Phonetics, 40:672–688.

DiCanio, C. T. (2016). Abstract and concrete tonal classes in Itunyoso Trique person morphology. In Palancar, E. and Léonard, J.-L., editors, Tone and Inflection: New Facts and New Perspectives, volume 296 of Trends in Linguistics Studies and Monographs, chapter 10, pages 225–266. Mouton de Gruyter.

DiCanio, C., Benn, J., and Castillo García, R. (2018). The phonetics of information structure in Yoloxóchitl Mixtec. Journal of Phonetics, 68:50–68.

DiCanio, C., Benn, J., and Castillo García, R. (in press). Disentangling the effects of position and utterance-level declination on tone production. Language and Speech.

DiCanio, C. and Hatcher, R. (2018). On the non-universality of intonation: evidence from Triqui. Journal of the Acoustical Society of America, 144:1941.

DiCanio, C., Zhang, C., Whalen, D. H., and Castillo García, R. (2019). Phonetic structure in Yoloxóchitl Mixtec consonants. Journal of the International Phonetic Association, https://doi.org/10.1017/S0025100318000294.

Fletcher, J. (2010). The prosody of speech: Timing and rhythm. In The Handbook of Phonetic Sciences, pages 521–602. Wiley-Blackwell, 2nd edition.

Fougeron, C. and Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101(6):3728–3740.

Gordon, M. K. (2016). Phonological Typology. Oxford University Press.

Hallé, P. A., Chang, Y. C., and Best, C. T. (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32(3):395–421.

Jakobson, R., Fant, C. G. M., and Halle, M. (1961). Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. MIT Press.

Katz, J. and Fricke, M. (2018). Auditory disruption improves word segmentation: A functional basis for lenition phenomena. Glossa, 3(1):1–25.

Keating, P. (1984). Phonetic and phonological representation of stop consonant voicing. Language, 60:286–319.

Keating, P., Cho, T., Fougeron, C., and Hsu, C.-S. (2003). Domain-initial articulatory strengthening in four languages. In Local, J., Ogden, R., and Temple, R., editors, Phonetic interpretation: Papers in Laboratory Phonology VI, pages 145–163. Cambridge University Press, Cambridge, UK.

Kingston, J. and Diehl, R. L. (1994). Phonetic knowledge. Language, 70(3):419–454.

Ladd, D. R. (2014). Simultaneous Structure in Phonology. Oxford University Press.

Lisker, L. (1957). Closure duration and the intervocalic voiced-voiceless distinction in English. Language, 33:42–49.

Lisker, L. (1986). Voicing in English: a catalogue of acoustic features signaling /b/ versus /p/ in trochees. Language and Speech, 29(3):3–11.

Munhall, K. G. and Löfqvist, A. (1992). Gestural aggregation in speech: laryngeal gestures. Journal of Phonetics, 20:93–110.

Ohala, J. (1983). The origin of sound patterns in vocal tract constraints. In MacNeilage, P. F., editor, The production of speech, pages 189–216. Springer, New York.

Peng, G., Zheng, H.-Y., Gong, T., Yang, R.-X., Kong, J.-P., and Wang, W. S.-Y. (2010). The influence of language experience on categorical perception of pitch contours. Journal of Phonetics, 38:616–624.

So, C. K. and Best, C. T. (2010). Cross-language perception of non-native tonal contrasts: effects of native phonological and phonetic influences. Language and Speech, 53(2):273–293.

Stagray, J. and Downs, D. (1993). Differential sensitivity for frequency among speakers of a tone and nontone language. Journal of Chinese Linguistics, 21:143–163.

Stevens, K. N. (2000). Acoustic Phonetics. MIT Press, first edition.

Titze, I. R. (1994). Principles of Voice Production. Prentice-Hall, Englewood Cliffs, NJ.

Ussishkin, A., Warner, N., Clayton, I., Brenner, D., Carnie, A., Hammond, M., and Fisher, M. (2017). Lexical representation and processing of word-initial morphological alternations: Scottish Gaelic mutation. Journal of Laboratory Phonology, 8(1):1–34.

Whalen, D. H. (1990). Coarticulation is largely planned. Journal of Phonetics, 18(1):3–35.

Whalen, D. H. and Levitt, A. G. (1995). The universality of intrinsic F0 of vowels. Journal of Phonetics, 23:349–366.

White, L., Benavides-Varela, S., and Mády, K. (2020). Are initial-consonant lengthening and final-vowel lengthening both universal word segmentation cues? Journal of Phonetics, 81:1–14.