Friday, June 26, 2020

What's universal in phonetics?

As a fieldworker, I'm often struck by how many linguistic patterns I've observed that just "shouldn't" occur. Linguistics often propels itself as a field by asserting theories that are both too strong and too myopic. The thinking goes that one should assume universality first and then adjust accordingly afterwards (or unfortunately, ignore exceptions and continue on).

In phonetics, there has been a long history around the notion of universalism. Jakobson, Fant, and Halle (1961) assumed that one needed only distinctive features to characterize cross-linguistic differences. Once you got the features down, you could assume that all speakers had the same sort of mapping from features to articulation. This idea persisted into the 1970s (at least among phonologists), but began to break down in the 1980s and 1990s with Pat Keating's work on voicing (1984), Doug Whalen's discussion of coarticulation (1990), and Kingston & Diehl's discussion of "automatic" and "controlled" phonetics (1994). The emerging consensus from this earlier work and the resulting evolution of laboratory phonology was that phonetic patterns are closely controlled by speakers and that many patterns are language-specific.[1]

Ladd's (2014) book provides a nice overview of many of these ideas - in particular the view that "Phonologists want their descriptions to account for the phonetic detail of utterances. Yet most are reluctant to consider the use of formalisms involving continuous mathematics and quantitative variables, and without such formalisms, it is doubtful that any theory can deal adequately with all aspects of the linguistic use of sound." (p.51)

If we fast-forward to the present day, the landscape of phonetics and phonology is quite different than what it used to be. I think most laboratory phonologists (and most phonologists nowadays are laboratory phonologists) would agree that representations reflect distributions of productions in some way and that the statistical and articulatory details can vary in a gradient way across languages.

With this in mind, what is left of phonetic universals? There are certainly several universals regarding phonological inventories that could be discussed (see Gordon's recent 2016 book on the topic). But what of phonetic patterns that are best captured quantitatively? What are the universals and near universals? I thought I would start to collect a list of these here as a way to organize my thoughts and to challenge/question my assumptions. I invite anyone to propose additional things here too.

1. Dorsal stops (almost always) have longer VOT (voice onset time) than coronal or labial stops
On the basis of looking at 18 different languages, Cho and Ladefoged (1999) first noted that, once one adjusts for laryngeal category (voiced, voiceless, voiceless aspirated), dorsal stops tend to have a longer VOT than coronal or labial stops. A more recent analysis of this question is found in Chodroff et al. (2019), where the authors looked at over 100 different languages. Of the languages that they sampled, 95% displayed the dorsal > coronal pattern. This finding probably relates to a mechanical constraint on movement of the tongue dorsum. Since the dorsum has greater mass, the release portion tends to take longer (Stevens 2000). All else being equal, larger articulators usually move more slowly than smaller ones - a general principle of physiology and movement. This longer release delays the venting of the supralaryngeal cavity, which in turn delays the aerodynamic conditions needed for the onset of voicing.

Chodroff et al.'s sampling revealed another near universal - VOT is strongly correlated across the stop consonants within a particular language. That is, if a language tends to have very short-lag VOT values for one stop consonant, it has very short-lag VOT values for all the others too. This finding is interesting since it suggests that speakers and languages produce identical laryngeal gestures regardless of the supralaryngeal constriction. There is some physiological evidence for this universal (Munhall & Löfqvist 1992).

2. All languages have utterance-final lengthening.

Though languages vary in the extent to which words are lengthened in phrase-final or utterance-final position, final lengthening has been found in every language where it has been investigated (Fletcher 2010, White et al. 2020). Even languages which lack the phonological units used in intonation systems (boundary tones, pitch accents) seem to have utterance-final lengthening (DiCanio and Hatcher 2018; DiCanio et al. 2018, in press).

There is probably a biomechanical explanation for utterance-final lengthening based on articulatory slowing at the end of utterances. As speakers finish utterances, their articulators gradually move more slowly (Byrd & Saltzman 2003). The scope of this effect varies across languages and it is not yet clear whether certain syllable types are more affected than others, e.g. closed syllables or syllables with short vowels might undergo less final lengthening.

3. Languages optimize the distance between vowels in articulation/acoustics.

I'll leave it open for now whether this refers to articulatory dispersion or acoustic dispersion (there is debate around this, of course), but it seems like most languages optimize the spacing of vowels in height and backness. In languages with asymmetric vowel systems, e.g. /i, e, a, o/ or /i, e, ɛ, a, o, u/, the back vowels will have F1 values that often sit in between the values for the corresponding front vowels (Becker-Kristal 2010). Becker-Kristal looked at the acoustics of over 100 different languages and found this to be a general pattern. The opposite pattern presumably holds as well, but it is harder to test since most languages have more front vowel contrasts than back vowel contrasts.

***Edited to include new things - thanks to Eleanor Chodroff, David Kamholz, Joseph Casillas, Rory Turnbull, Claire Bowern, Carlos Wagner and various others on Twitter whose identities/names are not clear.***

4. Intrinsic F0 of high vowels

There is some discussion of this effect, but it seems to be the case that, all else being equal, high vowels have higher F0 than low vowels (Whalen & Levitt 1995). In all languages where it has been investigated, researchers have found positive evidence for this. Whalen & Levitt note that the explanation here has to do with enhanced subglottal pressure and greater cricothyroid (CT) activity in the production of high vowels relative to low vowels. Ostensibly, as the tongue is raised, it exerts a pull on the larynx via the geniohyoid and thyrohyoid muscles. This raises the thyroid cartilage and thus exerts pull on the cricothyroid itself (raising F0). Greater subglottal pressure would then be needed to overcome the impedance due to greater vocal fold tension.

There is a tendency, however, to not observe the effect in low F0 contexts, in particular for low tones in tone languages. I've personally wondered about this in Mixtec and Triqui languages, though it is usually quite difficult to control for glottalization, tone, and vowel quality all at once in these languages in order to investigate this question. Why might the effect not be found for low tones? One possibility is that F0 control is essentially different in a low F0 context. According to Titze's body-cover model of vocal fold vibration (1994), the thyroarytenoid (TA) muscles are more responsible for vocal fold vibration when F0 is low. Perhaps tongue raising exerts less force on the TA than it does on the CT.

5. Voiced stops are shorter in duration than voiceless stops

Voicing is hard to maintain when there is any constriction in the supraglottal cavity. Assuming no venting through the velopharyngeal port, a supralaryngeal oral stop closure causes a build-up in pressure above the glottis, which inhibits the pressure differential across the glottis required for continued voicing - the aerodynamic voicing constraint (Ohala, 1983). Thus, stops cease voicing relatively quickly during closure. Similarly, for voiced fricatives, the need to maintain a narrow constriction for frication, with greater intra-oral air pressure relative to atmospheric pressure, is at odds with the simultaneous need to maintain greater subglottal pressure relative to intra-oral (supraglottal) pressure for continued voicing. Thus, voiced fricatives will often devoice or de-fricativize (and be produced as approximants).
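To put the constraint in rough formulaic terms (my own shorthand here, not notation from Ohala): voicing continues only while the subglottal pressure exceeds the supraglottal pressure by some critical margin, i.e. P(sub) - P(oral) > ΔP(crit). During a stop closure, airflow through the glottis raises P(oral) toward P(sub), so the inequality eventually fails and voicing dies out unless the speaker actively expands the oral cavity (e.g. by lowering the larynx or advancing the tongue root).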

A consequence of the aerodynamic voicing constraint in stops is that the duration of stop voicing is limited and so, it turns out, voiced stops are shorter than voiceless ones. This has been observed since the early work of Lisker (1957) (cf. Lisker 1986 as well). It seems to be a phonetic universal. What about fricatives though? Are voiced fricatives typically shorter than voiceless ones? I think that the jury is still out on this one. While it is difficult to maintain simultaneous voicing and frication for voiced fricatives, the temporal constraints are not as clear as with stops. Yet, voiced fricatives are almost always shorter than voiceless fricatives as well.

What's not a universal?

In thinking about ostensible phonetic universals, I am struck by many patterns that do not seem to be as universal as once believed. I am most familiar with those in the research that I have done.

6. Not a universal - word-initial strengthening

A common cross-linguistic pattern is that word-initial consonants will be produced with greater duration and/or with stronger articulations (more contact, faster velocity). Fougeron & Keating (1997) is a seminal paper observing this pattern with English speakers. It has been studied in various languages - most recently in work by Katz & Fricke (2018) and White et al. (2020). While Fougeron & Keating (1997) and subsequent work by Keating et al. (2003) do not assert that this pattern is universal, White et al. (2020) state the following (in their conclusions):

"We propose, however, that initial consonant lengthening may be likely to maintain a universal structural function because of the critical importance of word onsets for the entwined processes of speech segmentation and word recognition."

I should admit, I'm working on a paper which addresses this claim with some of my research on Yoloxóchitl Mixtec, an Otomanguean language of Mexico. The language is prefixal and has final stress. Word-initial consonants are always shorter than word-medial ones and (in the paper I'm working on now, at least) undergo more lenition. You don't have to take my word about this based on something not-yet-published, though. The durational finding is replicated in both DiCanio et al. (2018) and DiCanio et al. (to appear). So, three different papers, all with different speakers, have found the effect. (I'll just mention here, because this is a blog and not a publication, that the same pattern seems to hold in Itunyoso Triqui - another Mixtecan language with final stress and prefixation. That's another paper for this summer.)

There's an interesting thing here though - most of the languages which have been studied in relation to initial strengthening are not prefixing languages. In prefixal languages, like Scottish Gaelic, parsing word-initial consonants does not help too much in word identification (Ussishkin et al. 2017). The authors state the following:

"Our results show that during the process of spoken word recognition, listeners stick closer to the surface form until other sounds lead to an interpretation that the surface form results from the morphophonological alternation of mutation." (Ussishkin et al. 2017, p.30)

While this research does not address word-initial strengthening, it suggests that there is just something different about prefixal languages in terms of word recognition. If the goal of word-initial strengthening is to enhance cues to word segmentation, then it stands to reason that word-initial strengthening might not occur in heavily prefixing languages. At the very least, the Mixtec data show that word-initial consonant lengthening is indeed not a universal.

7. Not a universal - native listeners of a tone language are better at pitch perception than native listeners of non-tonal languages

I know, I know, you want to believe that it's true. All tone language listeners must have superpowers when it comes to perceiving pitch, right? It turns out that the evidence is quite mixed here and that musical experience ends up playing a big role. There are papers that have found evidence that speaking a tone language confers some benefit in pitch discrimination when listeners have to discriminate both between tonal categories and within them (Burnham et al. 1996, Hallé et al. 2004, Peng et al. 2010). However, there are other papers showing no advantage (Stagray & Downs 1993, DiCanio 2012, So & Best 2010). At issue is usually the musical background of the listeners in question. In Stagray & Downs (1993), the authors chose only speakers of Mandarin who did not have musical experience, and in DiCanio (2012), none of the Triqui listeners had any musical experience. In So & Best (2010), the authors screened 300 Hong Kong Cantonese listeners and chose only those with (a) no knowledge of Mandarin and (b) no formal music training. Only 30/300 qualified! Many other studies finding an advantage for tone language listeners have not controlled for musical background.

So, how much of a role does musical ability play in tonal discrimination? I can provide an example from some data from my 2012 paper (though this was not discussed in the paper itself). Triqui is heavily tonal, with nine lexical tones (/1, 2, 3, 4, 45, 13, 43, 32, 31/) and extensive tonal morphophonology (DiCanio 2016). One would imagine that, when presented with stimuli along a continuum between two tonal categories, e.g. the falling tones /32/ and /31/, Triqui listeners might be especially attuned to slight differences. It turns out that they show better discrimination of between-category differences (steps 2-4, 3-5, 4-6, 5-7) than of within-category differences (steps 1-3, 6-8).
Discrimination accuracy of tonal continua for Triqui and French listeners. Data from DiCanio (2012). No Triqui speaker has musical training but a subset (13/20) of the French speakers did. Discrimination is better than predicted at the end of the continuum because listeners were comparing resynthesized natural to non-resynthesized speech.
On the whole, French speakers were better at discriminating Triqui tonal pairs along the continuum than Triqui speakers were. This is quite surprising, but once we separate French speakers by their musical background, we find that the non-musicians of the bunch were worse at between-category tonal discrimination than the Triqui listeners (but better at within-category discrimination). Having some musical background (at least 2-3 years) provides a remarkable benefit to your pitch discrimination abilities. Speaking a tone language makes you good at telling apart two particular tones in your language at a categorical boundary between them, but it does not make you magically better at pitch discrimination, apparently.

-----
There are undoubtedly many other things that could be put here, both for universals and no-longer universals. I'm of course very biased here as someone who works on prosody. (I tend to be more interested in the prosodic patterns.) This is intended to be a continually-developing list of things, both for my personal memory and for others to contribute to (or argue with). So, any thoughts of things to add are most welcome.
_____________
[1] There are many other sources here that I'm probably missing. I'd be happy to add any that people suggest.

References:
Becker-Kristal, R. (2010). Acoustic typology of vowel inventories and Dispersion Theory: Insights from a large cross-linguistic corpus. PhD thesis, UCLA.

Burnham, D., Francis, E., Webster, D., Luksaneeyanawin, S., Attapaiboon, C., Lacerda, F., and Keller, P. (1996). Perception of lexical tone across languages: evidence for a linguistic mode of processing. In Proceedings of the 4th International Conference on Spoken Language Processing, volume 4, pages 2514–2517.

Byrd, D. and Saltzman, E. (2003). The elastic phrase: modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31:149–180.

Cho, T. and Ladefoged, P. (1999). Variation and universals in VOT: evidence from 18 languages. Journal of Phonetics, 27:207–229.

Chodroff, E., Golden, A., and Wilson, C. (2019). Covariation of stop voice onset time across languages: Evidence for a universal constraint on phonetic realization. Journal of the Acoustical Society of America, Express Letters, 145(1):EL109–EL115.

DiCanio, C. T. (2012). Cross-linguistic perception of Itunyoso Trique tone. Journal of Phonetics, 40:672–688.

DiCanio, C. T. (2016). Abstract and concrete tonal classes in Itunyoso Trique person morphology. In Palancar, E. and Léonard, J.-L., editors, Tone and Inflection: New Facts and New Perspectives, volume 296 of Trends in Linguistics Studies and Monographs, chapter 10, pages 225–266. Mouton de Gruyter.

DiCanio, C., Benn, J., and Castillo García, R. (2018). The phonetics of information structure in Yoloxóchitl Mixtec. Journal of Phonetics, 68:50–68.

DiCanio, C., Benn, J., and Castillo García, R. (in press). Disentangling the effects of position and utterance-level declination on tone production. Language and Speech.

DiCanio, C. and Hatcher, R. (2018). On the non-universality of intonation: evidence from Triqui. Journal of the Acoustical Society of America, 144:1941.

DiCanio, C., Zhang, C., Whalen, D. H., and Castillo García, R. (2019). Phonetic structure in Yoloxóchitl Mixtec consonants. Journal of the International Phonetic Association, https://doi.org/10.1017/S0025100318000294.

Fletcher, J. (2010). The prosody of speech: Timing and rhythm. In The Handbook of Phonetic Sciences, pages 521–602. Wiley-Blackwell, 2nd edition.

Fougeron, C. and Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America, 101(6):3728–3740.

Gordon, M. K. (2016). Phonological Typology. Oxford University Press.

Hallé, P. A., Chang, Y. C., and Best, C. T. (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32(3):395–421.

Jakobson, R., Fant, C. G. M., and Halle, M. (1961). Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. MIT Press.

Katz, J. and Fricke, M. (2018). Auditory disruption improves word segmentation: A functional basis for lenition phenomena. Glossa, 3(1):1–25.

Keating, P. (1984). Phonetic and phonological representation of stop consonant voicing. Language, 60:286–319.

Keating, P., Cho, T., Fougeron, C., and Hsu, C.-S. (2003). Domain-initial articulatory strengthening in four languages. In Local, J., Ogden, R., and Temple, R., editors, Phonetic interpretation: Papers in Laboratory Phonology VI, pages 145–163. Cambridge University Press, Cambridge, UK.

Kingston, J. and Diehl, R. L. (1994). Phonetic knowledge. Language, 70(3):419–454.

Ladd, D. R. (2014). Simultaneous Structure in Phonology. Oxford University Press.

Lisker, L. (1957). Closure duration and the intervocalic voiced-voiceless distinction in English. Language, 33:42–49.

Lisker, L. (1986). Voicing in English: a catalogue of acoustic features signaling /b/ versus /p/ in trochees. Language and Speech, 29(3):3–11.

Munhall, K. G. and Löfqvist, A. (1992). Gestural aggregation in speech: laryngeal gestures. Journal of Phonetics, 20:93–110.

Ohala, J. (1983). The origin of sound patterns in vocal tract constraints. In MacNeilage, P. F., editor, The production of speech, pages 189–216. Springer, New York.

Peng, G., Zheng, H.-Y., Gong, T., Yang, R.-X., Kong, J.-P., and Wang, W. S.-Y. (2010). The influence of language experience on categorical perception of pitch contours. Journal of Phonetics, 38:616–624.

So, C. K. and Best, C. T. (2010). Cross-language perception of non-native tonal contrasts: effects of native phonological and phonetic influences. Language and Speech, 53(2):273–293.

Stagray, J. and Downs, D. (1993). Differential sensitivity for frequency among speakers of a tone and nontone language. Journal of Chinese Linguistics, 21:143–163.

Stevens, K. N. (2000). Acoustic Phonetics. MIT Press, first edition.

Titze, I. R. (1994). Principles of Voice Production. Prentice-Hall, Englewood Cliffs, NJ.

Ussishkin, A., Warner, N., Clayton, I., Brenner, D., Carnie, A., Hammond, M., and Fisher, M. (2017). Lexical representation and processing of word-initial morphological alternations: Scottish Gaelic mutation. Laboratory Phonology, 8(1):1–34.

Whalen, D. H. (1990). Coarticulation is largely planned. Journal of Phonetics, 18(1):3–35.

Whalen, D. H. and Levitt, A. G. (1995). The universality of intrinsic f0 of vowels. Journal of Phonetics, 23:349–366.

White, L., Benavides-Varela, S., and Mády, K. (2020). Are initial-consonant lengthening and final-vowel lengthening both universal word segmentation cues? Journal of Phonetics, 81:1–14.

Monday, December 2, 2019

Tutorial: Creating pretty spectrograms


Phonetic data is no longer just for papers on phonetics. Research using quantitative methods, corpus data, and experimental approaches may involve phonetic data for analytical or visualization purposes. There may also simply be a need to visually demonstrate a phonetic pattern in a linguistics paper unrelated to phonetics. For instance, descriptive grammars are stronger and clearer when phonological argumentation is accompanied by phonetic data showing the patterns (Maddieson 2001, Maddieson et al. 2009). The movement to examine more phonetic data within linguistics is motivated by several factors:

a.   It is easier than ever before to show proof of one's observations.
b.   A greater focus on spoken language corpora means that one must use tools which analyze the speech signal (not just texts or transcriptions). 
c.   Laboratory phonology has been incorporated into all areas of phonology.
d.   Gradient processes within the phonetic signal are relevant to our understanding of social variation and representations in the mental lexicon.

Yet, despite these changes to the field, linguists (and especially students starting off in linguistics) often have trouble visualizing phonetic data within research. The effect of this is that one might not convey one's message clearly to the audience, casting doubt on the observations. Some of the common pitfalls include: (1) the scaling parameters for displaying the acoustics are incorrect and you cannot observe the relevant detail (e.g. dynamic range, F0 range, etc.), (2) the text is not correctly aligned with the acoustics, (3) too much information is displayed (another scaling problem), and (4) no scale is given.

Drawing well-labelled spectrograms is not difficult, and Praat (Boersma & Weenink 2019) possesses several nice tools for visualizing them (far better than taking a screenshot of your screen). This tutorial is designed as the first (of perhaps many) aiming to improve how acoustic phonetic data is visualized.

I.  Initial steps: include a textgrid

(1) Open up the sound file that you wish to visualize. In most cases, a reader will not be able to visually inspect anything more than 6-10 segments long in an image. So, make sure that the duration that you wish to image is shorter than this. Anything longer will not show much to a reader.

(2) Create a textgrid along with the sound file and segment the portions that you wish to visualize. If you are not sure about how to create a textgrid, please see the Praat manual. I have created a simple example here of myself saying the word 'ken' [kʰɛ̃n] (below).

(3) Once you have created a textgrid, select the portion of the sound file corresponding to the textgrid and then choose from the File menu "Extract selected textgrid (preserve times)." This will create a textgrid file exactly the size of the spectrogram you wish to display it with.

A spectrogram of the word 'ken.'
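Incidentally, everything in this tutorial can also be scripted. I'll include rough Praat script equivalents below as we go - these are sketches with made-up file names and times, so adjust them to your own data. Steps (1)-(3) look roughly like this:

# Read in the sound and an already-segmented textgrid
sound = Read from file: "ken.wav"
textgrid = Read from file: "ken.TextGrid"
# Extract matching, time-preserved portions (the 0.1-0.6 s window is hypothetical)
selectObject: sound
soundPart = Extract part: 0.1, 0.6, "rectangular", 1, "yes"
selectObject: textgrid
gridPart = Extract part: 0.1, 0.6, "yes"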
II. Exporting a visible spectrogram

(4) Praat does not currently allow users to export a spectrogram from a sound file - it is necessary to export a visible spectrogram.

(5) To do this, first select the portion of the sound file that you wish to visualize and click 'sel' (select). Then, from the Spectrum menu, select "Extract visible spectrogram."

(6) You should now see a visible spectrogram in the object window of Praat.

III. Adding layers to create an image

(7) The key to creating a nice image is to add objects/details in layers. Praat gives you the ability to add layers to an image, and you may undo multiple layers at a time in the picture window.

(8) The things to understand about the picture window are that (a) it will print only in the region that you have selected and (b) it will use any presets you have chosen for Pen/Font. It does not revert to a default. Select a fairly large region for your spectrogram, perhaps a 4x6 image.

(9) Now, select the spectrogram in the objects window and select "Draw: Paint..." In the dialog window, the option "Garnish" is often pre-selected for you. When you print with the Garnish option selected, Praat will print information about the sound image. You do not want it to do that, since we will be adding in the elements pertaining to the axes separately, ourselves. So, deselect this (see below).


(10) This should now produce a spectrogram with no margins in the picture window. That's the first step.

(11) Now, from the "Margins" menu in the picture window, select "Draw inner box." This will draw a box around the selected region, creating margins. Note that the thickness of the line here can be adjusted under the Pen: Line width menu in the picture window. However, Praat does not allow you to adjust things after they are drawn - you must do this before you print elements. For now, the preset 1.0 line width is sufficient. You should have created something like this below:


(12) Now comes the fun part - we will be adding in axes in stages. First, let's add in a y-axis. From the Margins menu, select "Marks left..." near the bottom of the menu. We can choose to exclude dotted lines for the moment. Praat recognizes the scale of the image, so it will know that the y-axis should be frequency in Hz.

(13) Once you have done this, select "Text left" from the Margins menu. Print "Frequency (Hz)." The resulting image should look like the one below:


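For reference, steps (8)-(13) in script form (a sketch; the Paint arguments are the time range, frequency range, maximum, autoscaling, dynamic range, pre-emphasis, dynamic compression, and garnish):

# Set the drawing region, then paint the spectrogram without garnish
Select outer viewport: 0, 6, 0, 4
selectObject: spectrogram
Paint: 0, 0, 0, 0, 100, "yes", 50, 6, 0, "no"
Draw inner box
# y-axis: a mark every 1000 Hz, plus an axis label
Marks left every: 1, 1000, "yes", "yes", "no"
Text left: "yes", "Frequency (Hz)"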
(14) We can continue to add in layers this way (including duration on the x-axis), and if we so wished, we could then export this to a PDF document. However, we can also add in text.

(15) To add in text, select a portion of the image larger than the box with the spectrogram itself (see below) and then choose the textgrid file from the objects window. Deselect the "garnish" option again and click OK.



(16) The "show boundaries" option allows us to visualize the segmental boundaries that you have chosen in your spectrogram, but the default line width (1.0) is a bit narrow/thin for visualization. If you want to adjust this, choose Line width from the Pen menu and set it to something larger (like 1.5 or 1.8). Then print the textgrid.

If you want to go back to do this, just undo the print option, change the settings, and then print the textgrid again.

(17) The result should look something like below.
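In script form, the textgrid overlay of steps (15)-(17) is roughly as follows (the TextGrid Draw arguments are the time range, show boundaries, use text styles, and garnish):

# Enlarge the viewport so the label text fits below the spectrogram box
Select outer viewport: 0, 6, 0, 5
selectObject: gridPart
Line width: 1.5
Draw: 0, 0, "yes", "no", "no"
Line width: 1.0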



(18) The last step we might do is to include some acoustic information. Let's suppose we want to add in formants to our figure. Select the sound file from the object window and choose "Analyze Spectrum: To formant (burg)..." This will create a formant object in your window.

(19) Select the original box portion in the picture window again (not the entire portion with text). Now, select the formant object from your object window and click "Draw: Speckle..." and make sure you deselect the "garnish" option. This will create speckles corresponding to your formants. Be sure to set the range of the drawing option to match the range of the spectrogram, i.e. if your sound file is longer than the spectrogram you are visualizing, you will end up with formant values that do not match the image.

Note that if you lower the dynamic range, it will only draw formants within that range, i.e. 20 dB = the loudest 20 dB of the speech signal. The output of this should look as below:
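Scripted, steps (18)-(19) look roughly like this (the To Formant (burg) arguments are the time step, maximum number of formants, formant ceiling, window length, and pre-emphasis; the Speckle arguments are the time range, maximum frequency, dynamic range, and garnish):

selectObject: soundPart
formant = To Formant (burg): 0, 5, 5500, 0.025, 50
# Reselect the spectrogram's viewport so the speckles align with the image
Select outer viewport: 0, 6, 0, 4
selectObject: formant
Speckle: 0, 0, 5000, 30, "no"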


(20) We could add in extra layers, e.g. duration on an x-axis under the text, F0 data on an axis to the right of the spectrogram, etc. However, we'll just stop here because I think you probably get the gist of this. The final exported PDF always looks nicer than what appears in the Praat picture window (see below). You can now add in labels (arrows, text) using other software.
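A scripted version of the export step would be something like the following (PDF export is available on some platforms; recent versions of Praat can also save PNGs):

Save as PDF file: "ken_spectrogram.pdf"
# or: Save as 300-dpi PNG file: "ken_spectrogram.png"
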
References:
Boersma, P. and Weenink, D. (2019). Praat: doing phonetics by computer (version 6.1). Computer program. Retrieved from http://www.praat.org/.

Maddieson, I. (2001). Phonetic fieldwork. In Newman, P. and Ratliff, M., editors, Linguistic Fieldwork, pages 211–229. Cambridge University Press.

Maddieson, I., Avelino, H., and O’Connor, L. (2009). The Phonetic Structures of Oaxaca Chontal. International Journal of American Linguistics, 75(1):69–103.

Saturday, August 17, 2019

Readability in reporting statistics

Within the past 20 years there has been a bit of a (r)evolution in the quantitative methods used in the speech sciences and linguistics. A renewed focus on experimental research in linguistics and the development of laboratory phonology as a field have contributed to this development. Though phonetics has always been an experimental field, it too has benefitted from a renewed interest in quantitative methods. The availability of free and powerful statistical analysis software, such as R, has improved access to tools. Finally, several books focusing on quantitative methods in linguistic sciences have been published, all of which improve the statistical learning curve.

Yet with any changes to a field come challenges. Since several types of linguistic data violate the assumptions of ANOVA, what method should you use instead? With several newer methods (multi-level modeling, generalized additive models, growth curve analysis, smoothing spline ANOVA, functional data analysis, Bayesian methods, etc.), it is often also unclear which statistic to report. If we are concerned about replicability in our field, how do we ensure that our methods are clear enough to be replicated? And, importantly, how do we communicate these concerns to both novice and experienced researchers who might not be familiar with them? Since so many methods are new (or new to some of us), we are often tempted to include a fancier model without understanding it fully. How do we ensure we understand it enough to use it?

These issues are all very important, but we must also not lose sight of our duty as scientists to properly communicate our research. It would be great if our research could "speak for itself." It would be great if we could rely on our readers being so engaged in our results that they never got bored or frustrated reading pages and pages of statistical modeling and tests. It would be great if we could assume that all readers understood the mechanics of each model too. Yet, our research seldom speaks for itself, and readers can be both bored and uninformed. Unless your research findings are truly groundbreaking, you probably have to pay attention to your writing style.

I'm not an expert in writing or an expert in statistical methods. I teach a somewhat intense graduate course in quantitative methods in linguistics and have been a phonetician for about 15 years (if I include some time in grad school). My graduate education is in linguistics, not mathematical psychology or statistics. But as a researcher/phonetician I am a practitioner of statistical tests, as a reviewer I read many submitted manuscripts in phonetics, and as a professor I frequently evaluate how students talk about statistics in writing. I think that the best way to open up a discourse about how we report statistics in linguistics, and whether our reporting is readable, is to present various strategies and discuss their pros and cons.

I should mention that I'll be pulling examples from my own research in phonetics here as well as a few that I've seen in the literature. I am not intending to offend any particular researcher's practice. On the contrary, I feel that it's necessary to bring up some real examples in this discussion (and I've picked some good ones).

I.  The laundry list

One practice in reporting statistics is to essentially report all the effects as a list in the text itself. We've all seen this practice, but after digging for an example of it, I was happy to discover that it is not nearly as frequent as I had assumed (or perhaps we've become better writers). So, here's a made-up example:

There were significant main effects of vowel quality (F[3, 38] = 6.3, p < .001), age (F[6, 12] = 2.9, p < .01), speech style (F[2, 9] = 5.7, p < .001), and gender (F[3, 8] = 3.2, p < .01) and significant interactions of vowel quality x age (F[18, 40] = 2.7, p < .01) and vowel quality x gender (F[12, 20] = 2.4, p < .05), but no significant interaction between age and gender nor between vowel quality and speech style. There was a significant three-way interaction between vowel quality x gender x speech style (F[12, 120] = 2.4, p < .05) but no three-way interaction between either...  These effects are seen in the plot of the data shown in Figure 3.

Effect, stat, effect, stat, effect, stat, repeat. It almost sounds like an exercise routine. On the one hand, this method of reporting statistics is comprehensive - all our effects are reported to the reader. We also avoid the issue of tabling your statistics (more on this below). Yet, it reads like a laundry list and a reader can quickly forget (a) which effect to pay attention to and (b) what each effect means in the context of the hypothesis being explored.

If the research involves just one or two multivariate models for an entire experiment, the researcher might be forgiven for writing this way, but now let's pretend that there are eight models and you are reading the sample paragraph above eight times within the results section of a paper. Then you go on to experiments 2 and 3 and read the same type of results section two more times. By the end of reading the paper, you may have seen results indicating an effect or non-effect of gender x vowel quality twenty-four times. It truly becomes a slog to recall which effects are important in the complexity of the model and you might be forgiven for losing interest in the process.

There is an additional problem with the laundry list method - our effects have been comprehensively listed, but the linkage between individual effects and an illustrative figure has not been established. It might be clear to the researcher, but it's the reader who needs to interpret just what a gender x vowel quality interaction looks like from the researcher's figure. Without connecting the specific statistic and the specific result, we both risk over-estimating the relevance of our particular effect in relation to our hypothesis (risking a Type I error) and fail to guide our readers in interpreting our statistic the right way (producing either Type S [sign] or Type M [magnitude] errors). Our practice of reporting statistics can influence our statistical practice.

Tip #1: Connect the model's results to concrete distinctions in the data in the prose itself.

Now, just what does it look like to connect statistics to the data? And how might we easily accomplish this? To learn this, we need to examine additional methods.

II.  The interspersed method with summary statistics

If it's not already clear, I'm averse to the laundry list method. It's clear that we need to provide many statistical results to the reader, but how do we do this in a way that will engage them with the data/results? I think that one approach is to include summary statistics in the text of the results section immediately after or before the reported statistic. This has three advantages, in fact. First, the reader is immediately oriented to the effect to look for in a figure. Second, we avoid both type S and type M errors simultaneously. The sign and the magnitude of the effect are clear if we provide sample means alongside our statistic. Third, it breaks up the monotony found in a laundry list of statistical effects. Readers are less likely to forget about what the statistic means when it's tied to differences in the data.

I have been trying to practice this approach when I write. I include an excerpt from a co-authored paper here below (DiCanio et al. 2018). As a bit of background, we were investigating the effect of focus type on the production of words in Yoloxóchitl Mixtec, a Mixtec language spoken in Guerrero, Mexico. Here, we were discussing the combined effect of focus and stress on consonant duration.


The statistics reported here are t values from a linear mixed effects model using lmerTest (Kuznetsova et al. 2017). The first statistic mentioned is the effect of focus type on onset duration. This effect is then immediately grounded in the quantitative differences in the data - a difference between 114 ms and 104 ms. Then, additional statistics are reported. This approach avoids Type S and Type M errors and it makes referring to Figure 2 rather easy. The reader knows that this is a small difference and they might not make much of it even though it is statistically significant. The second statistical effect is related to stress. Here, we see that the differences are more robust - 126 vs. 80 ms. Figure 2, which we referred the reader to above, is shown below.


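For readers who want to see the shape of such a model, here is a minimal sketch in R with hypothetical column names - this is not the actual code from the paper:

library(lmerTest)  # provides lmer() with t values via the Satterthwaite approximation
# Hypothetical data frame with columns Duration, Focus, Stress, and Speaker
model <- lmer(Duration ~ Focus * Stress + (1 | Speaker), data = data.sample)
summary(model)  # fixed-effect estimates, standard errors, and t values to report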
While it is rather easy to get some summary statistics for one's data, what do you do when you need more complex tables of summary statistics? I generally use the ddply() function in the plyr package for R. This function allows one to quickly summarize one's data alongside the fixed effects that you are reporting in your research. Here's an example:

library(plyr)
ddply(data.sample, .(Focus, Stress), summarize, Duration = mean(Duration, na.rm=TRUE))

For a given data sample, this will provide mean duration values for the fixed effects of focus and stress. One can specify different summary statistics (mean, sd, median, etc.) and include additional fixed effects. While this may seem rather trivial here (it's just a 2x2 design after all), it ends up being crucially useful for larger multivariate models where there are 2-way and 3-way interactions. If each factor includes more than four levels, a two-way or three-way interaction can become harder to interpret. Leaving this interpretation open to the reader is problematic.
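For those who have moved from plyr to the tidyverse, the equivalent sketch with dplyr is:

library(dplyr)
data.sample %>%
  group_by(Focus, Stress) %>%
  summarise(Duration = mean(Duration, na.rm = TRUE))

Either way, the output is a small table of cell means, one row per combination of the fixed effects.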

Now, for the person in the room saying "don't post-hoc tests address this?", I would point out that many of the statistical tests that linguists have been using more recently are less amenable to traditional post-hoc tests. (Is there an equivalent to Tukey's HSD for different multi-level models?) Also, if there are a number of multivariate models that one needs to report, the inclusion of post-hoc tests within a manuscript will weigh it down. So, even if certain types of post-hoc tests were to address this concern, they would still end up in an appendix or in supplementary materials and essentially hide a potential Type M or Type S error.

We've now connected our statistics with our data in a clearer way for the reader and resolved the potential for Type S and M errors in the process. I think this is a pretty good approach. It also treats the audience as if they need help reading the figure, because the text reiterates what the figure shows. Is this "holding the reader's hand" too much? Keep in mind that you are intimately familiar with your results in a way that the reader is not, and the reader has many other things on their mind, so it is always better to hold their attention by guiding them. Also, the point is to communicate your research findings, not to engage in a competition of "whose model is more opaque?". Such one-upmanship is not an indicator of intelligence, but of insecurity.

What are the downsides though? One potential issue is that the prose can become much longer. You are writing more, so in a context where more words cost more to publish or where there is a strict word limit, this method is less attractive. This issue can be ameliorated by reporting summary statistics just for those effects which are relevant to the hypothesis under investigation. There is another approach here as well - why not just eliminate statistics from the results section prose altogether? If it is the statistics that get in the way of interpreting the relationship between the hypothesis and results, we could just put the statistics elsewhere.

III.  Tabling your stats

Another approach to enhancing the readability of your research is to place the results from statistical tests and models in a table. I'll admit - when I first studied statistics I was told to avoid this. Yet, I can also see the appeal of this approach. Consider that as models have gotten more complex, there are more things to report. If one is avoiding null hypothesis significance testing or if one is avoiding p values, a set of different values might need to be reported which would otherwise be clunky within the text itself. At the same time, reviewers have been demanding more replicability and transparency within statistical models themselves. This means that they may wish to see more details - many of which need to be included in a table.

A very good example of this is found in an interesting recent paper by Schwarz et al. (2019), where the authors investigated the phonetics of the laryngeal contrasts of Nepali stops. I have included a snippet of this practice from the paper below (reprinted with the authors' permission).

Snippet from p. 123 of Schwarz, Sonderegger, and Goad (2019), reprinted with permission of the authors.
The dependent variable in the linear mixed effects model here is VD (voicing duration). The authors refer the readers to a table of the fixed effects. They include p values and discuss the directionality and patterns found within the data by referring the readers to a figure. The paragraph here is very readable because the statistics certainly do not interfere with the prose. The authors have also avoided Type M and Type S interpretation errors by stating the effects' directionality and using adverbial qualifiers, e.g. slightly.

One general advantage of tabling statistics is that reading one's results becomes more insightful. When done in a manner similar to what Schwarz et al. do above, readers also do not forget about the statistics completely. This is accomplished by commenting on specific effects in the model even though all the statistics are in the table.

If this is not done, however, the potential problem is that the reader might forget about the statistics completely. In such a case, the risk for a Type M or Type S error is inflated. Moreover, sometimes the effect you find theoretically interesting is not what is driving improvement to statistical model fit. This is obscured if individual results are not examined in the text at all.

Tip #2: Whether tabling your stats or not, always include prose discussing individual statistical effects. Include magnitude and sign (positive or negative effect) in some way in the prose.

There is, of course, another alternative here - you can always combine an interspersed method with the tabling of statistical results. This would seem to address both a frequent concern among reviewers that they be able to see specific aspects of the statistical model while also not relegating the model to an afterthought while reading. I could talk about this method in more detail, but it seems as if most of the main points have been covered.

IV. Final points
There are probably other choices that one could make in writing up statistical results, and I welcome suggestions and ideas here. As phonetics (and linguistics) have grown as fields, there has been a strong focus on statistical methods but perhaps less of an overt conversation about how to discuss such methods effectively in research. One of the motivations for writing about these approaches is that, when I started studying phonetics in graduate school, much of what I saw in the speech production literature seemed to follow the laundry list approach. Yet, if you have other comments, please let me know.

References:
DiCanio, C., Benn, J., and Castillo García, R. (2018). The phonetics of information structure in Yoloxóchitl Mixtec. Journal of Phonetics, 68:50–68.

Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82(13):1–26.

Schwarz, M., Sonderegger, M., and Goad, H. (2019). Realization and representation of Nepali laryngeal contrasts: Voiced aspirates and laryngeal realism. Journal of Phonetics, 73:113–127.

Monday, August 5, 2019

Is it Trique or Triqui?

Though I am a linguist who has worked on several languages over the years, one of the languages (or language groups) that I have spent the most time studying is Triqui. There are three major Triqui languages (Copala, Itunyoso, and Chicahuaxtla) and though the latter two have some degree of mutual intelligibility, the Copala dialect/language is mostly unintelligible to speakers of the other two dialects/languages.

There are all sorts of interesting things about these languages and about indigenous languages in Mexico, more generally. However, one of the persistent questions I get asked is about the name of the language itself - "is it Trique [ˈtʰɹike] or Triqui [ˈtʰɹiki]?" The answer to this is rather simple - in Spanish used by both Triqui speakers and non-Triqui speakers in Mexico, it's [ˈtɾiki]. So, the closest equivalent in English is [ˈtʰɹiki], with a final [i] sound.

But the follow-up question is usually "Why is it spelled with an "e" then?" To understand this, it's necessary to understand a little bit about dialectal differences in the languages and linguistic practice into the 20th century. To begin, the name of the language ostensibly comes from a spanification (or castellanización) of the Triqui phrase /tʂeh³ (k)kɨh³/, 'father/padre + mountainside/monte', meaning something like 'father of the mountain' in the Chicahuaxtla dialect, though this is a bit debatable. There is another word /tʂːeh³²/ (Itunyoso) or /tʂeh³²/ (Chicahuaxtla and Copala) meaning 'camino' or 'road/path.' So, the name itself may have come from a phrase meaning 'the path of the mountainside.'

One thing to notice is that the Chicahuaxtla dialect retains the central vowel /ɨ/ where it has merged with /i/ in the Itunyoso dialect and, in some contexts, with /u/ in the Copala dialect. So, the word for 'mountainside/monte' retains this vowel in Chicahuaxtla, whereas the word is /kːih³/ in Itunyoso Triqui and /kih³/ in Copala Triqui. This vowel also exists in many Mixtec languages (Triqui is Mixtecan) and is reconstructed for Proto-Mixtec (Josserand, 1983).

The first Triqui language to be described was the Chicahuaxtla dialect (Belmar 1897), and Belmar wrote the name of the language as Trique. Now, Belmar was not particularly adept at transcribing the nuanced phonetic details of languages. His tonal transcription is non-existent and he misses many important suprasegmental contrasts. However, he chose "e" here because he heard a difference between /i/ and /ɨ/, and it was customary at the time to transcribe this latter vowel with "e." This practice goes back to very early Mixtecan/Otomanguean philology - the Dominican friar Antonio de los Reyes (1593) used "e" to transcribe this vowel in Teposcolula Mixtec. So, the six historical Mixtec vowels are, at least in old historical sources, transcribed as /i/ "i", /e/ "ai", /a/ "a", /o/ "o", /u/ "u", /ɨ/ "e." The IPA certainly did not exist in Belmar's time and this practice is simply an extension of a Mexican philological tradition.

Incidentally, the use of "e" for transcribing unrounded non-front vowels is not limited to languages in Mexico. The romanization of Chinese, called pinyin, uses "e" for the vowel /ɤ/, found in many Chinese languages. This practice seems to go back to earlier romanizations of Chinese; in fact, the earliest grammar of Chinese, Arte de la lengua Mandarina, was written by another Dominican friar, Francisco Varo. Though, as far as I can tell, he did not use "e" in his romanization of Chinese - that came later.


The earliest work on Triqui written in English is Longacre (1952), and he must have simply taken the practice of writing the language with an "e" from Belmar and other Spanish sources. Nowadays, it is written with an "i" in Spanish. However, due to older sources using an "e" - such as all of Hollenbach's work on the Copala dialect from 1973 to 1992 - the spelling with an "e" has stuck around.


References:
Belmar, F. (1897). Lenguas del Estado de Oaxaca: Ensayo sobre lengua Trique. Imprenta de Lorenzo San-Germán.

Hollenbach, B. E. (1973). La aculturación lingüística entre los triques de Copala, Oaxaca. América Indígena, 33:65–95.

Hollenbach, B. E. (1992). A syntactic sketch of Copala Trique. In Bradley, C. H. and Hollenbach, B. E., editors, Studies in the syntax of Mixtecan Languages, volume 4. Dallas: Summer Institute of Linguistics and University of Texas at Arlington.

Josserand, J. K. (1983). Mixtec Dialect History. PhD thesis, Tulane University.

de Los Reyes, F. A. (1593). Arte en Lengua Mixteca. Casa de Pedro Balli, Mexico, Comte H. de Charencey edition.

Longacre, R. E. (1952). Five phonemic pitch levels in Trique. Acta Linguistica, 7:62–81.