Sunday, December 10, 2023

On the generalization of linguistic discovery

Discovery is a crucial part of the evolution of most academic disciplines that take a scientific approach towards understanding the world. New empirical evidence of a phenomenon leads researchers to re-examine their old perceptions. Or rather, as Kuhn (1962) would argue, those holding the old perceptions of the world eventually die or fade away while those who hold only the newer perceptions mature.

But how do we generalize discovery? There are certainly many disciplines where discovery is generalizable. Findings in many of the physical sciences and mathematics are truths that will continue to be true forever. Solve a long-standing mathematical problem and the solution remains true from then on.

In the social and cognitive sciences, though, discoveries seem somewhat murkier. Where they relate to biological, neurobiological, or biophysical principles, the discoveries seem more generalizable. In my main sub-discipline, phonetics, there are clear physical relationships between what a person does with their speech articulators and what this produces in an acoustic signal, for instance. This is true across languages because all humans have similar oral and laryngeal anatomy. Yet, since speakers vary massively in just how they produce similar speech sounds, generalization is challenging here too.

Where they do not relate to biological or physical principles, behavioral and linguistic discoveries are usually observational findings restricted to a certain type of population. Generalization here necessarily proceeds through multiple experiments or studies with different types of populations. From a linguist's perspective (and I can only speak as a linguist here), that necessarily means that discoveries need more languages.

There's a danger here that comes out of a kind of science envy within the behavioral and linguistic sciences. Though some of the methods in the social/behavioral sciences have become more scientifically rigorous (mostly in relation to statistical testing and modeling), the findings are not magically more generalizable to new populations than they were in the past. Discovering that college-aged speakers of English prefer certain syntactic structures over others does not mean anything about any other language unless subsequent research is undertaken. It might make predictions about patterns in other languages, but predictions are not generalizations.

Can we ever generalize about "Language"? What if we can't?

There are a lot of half-truths that linguists hold about "Language" that arise from a casual extension of findings in a few languages. Demonstrate that some linguistic phenomenon occurs in American English, Spanish, and German, and linguists will believe it is a universal or "strong tendency" without a very clear criterion for what "universal" or "strong tendency" would mean.

Why be so careful with formal and statistical methods but so careless regarding the scientific bread-and-butter of hypothesis testing? The answer seems to lie in a kind of all-or-nothing perspective about where linguistic discoveries have value to a discipline. Linguists believe either that linguistic patterns demonstrate unique characteristics of individual languages or populations, or that they are universal patterns reflecting something deep about human evolution or murkier things like universal grammar. The field tends to value only the latter type of work, since it smells like a generalization.

This all-or-nothing approach means that we often come up empty-handed when we wish to talk about the relevance of our findings to the discipline - either we're delving deeply into specific languages with an empirical or historical goal, or we're looking broadly (and more superficially) at patterns in a larger number of languages. What might exist in the middle? We're a small discipline examining a huge topic with a gigantic amount of variation. We can't do it all.

I think one future path for the discipline is to take a cue from the quantitative revolution that has occurred within it over the past 20-25 years. The more we examine phenomena that we once believed to be discrete (x occurs in context A, but y occurs in context B), the more we discover that these are strong statistical tendencies instead. And the reason for this is that linguistic phenomena are behavioral. They are not formal mathematical proofs that remain true forever after being solved. We just keep wanting to commit the error of generalization because of this science envy.

Might there not be any true linguistic universals? Maybe there are, but we can never be typologically balanced enough to prove anything more than fairly superficial patterns. Maybe there aren't any at all, and this is ok. Languages are endlessly fascinating and we can still demonstrate how many languages work along statistical lines. The idea that there is massive inter-language variation and that it is structured to occur in certain types of languages necessarily means that we can look at types of languages to construct complex cross-linguistic hypotheses. To provide a concrete example, do speakers of fusional languages or those with non-concatenative morphology store words differently than speakers of isolating languages? This is an interesting question, but it does not require a model of what must be universal. It just requires experiments and cross-linguistic research.

This is a blog post, so take my musings with a grain of salt. I don't have the answers for my own subdiscipline, let alone all of linguistics. I do think, though, that we need to be more careful in distinguishing the things we merely believe to be universal from the things that are demonstrated typological patterns or universals.

Wednesday, November 1, 2023

Issues in choosing a statistical model in phonetics

What's the bar for deciding to use a new statistical model in research? It seems like often enough within linguistics or speech science, one chooses a model based on what is à la mode. That frequently translates into increasing complexity.

Is it always good to have a more complex model? No. It might reveal more intricate interactions in the data. It might also model interactions between terms better than competing models, usually by improving fit with non-linear terms (cf. GCA, GAMMs). Yet, there are evaluative criteria for choosing a model that are often missing and end up being crucially important.

1. Is the model easily implementable and understandable? 

If a model is easy to implement and understand, then it is easy enough for new users to emerge and for a set of standards to come about. Yet, if neither of these things is true, there is a severe social cost.

If a handful of researchers propose a new model, is there an existing infrastructure that can help with training and implementation? Usually there is not and, as a consequence, many researchers get frustrated if the field pushes a model where no infrastructure exists. The same people proposing the model will end up fielding hundreds or thousands of questions about how to use it. And nobody has time for that.

Now, why might the field (or paper reviewers, most likely) decide that everyone has to use one particularly new and popular model for one's data? Sometimes important new factors are discovered that need to be modeled. But sometimes it's just impostor syndrome, i.e. we are only a serious field if we have ever more mathematically opaque models for our data. And it's easy to give a post-hoc reason to include all possible factors when our predictions are so weak.

2. Does the model enable us to generalize?

Do we actually need to model as many of the details as we can? Even models that take a fairly generic approach to avoiding overfitting can end up overfitting things like dynamics. As a result, researchers lose time discussing details that end up being unimportant and we lose the ability to generalize.

I'll provide one personal example of this. In my co-authored paper on the phonetics of focus in Yoloxóchitl Mixtec, we provided statistical models for f0 dynamics alongside statistical models for midpoint f0 values. There is certainly good reason to model changes in f0, but in a language with a number of level tones (and tone levels), this type of modeling might not say much. Indeed, we found mostly the same results when we looked at f0 midpoints for many of the level tones as when we looked at their dynamic trajectories. Including two sets of models resulted in twice as many statistical tests and twice as much reporting.
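To make that doubling concrete, here is a minimal sketch of the two kinds of models in Python with statsmodels. This is not how the paper's analyses were actually run; the file name and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per f0 sample, with columns f0,
# time_norm (0-1 within the vowel), tone, focus, and speaker.
df = pd.read_csv("ym_focus_f0.csv")

# Midpoint subset: samples near the temporal middle of the vowel.
mid = df[(df.time_norm > 0.45) & (df.time_norm < 0.55)]

# Model 1: midpoint f0 only -- one set of fixed effects to report,
# with random intercepts by speaker.
m_mid = smf.mixedlm("f0 ~ focus * tone", mid, groups=mid["speaker"]).fit()

# Model 2: full trajectories -- every effect now also interacts with
# normalized time, roughly doubling the terms that have to be reported.
m_dyn = smf.mixedlm("f0 ~ focus * tone * time_norm", df, groups=df["speaker"]).fit()

print(m_mid.summary())
print(m_dyn.summary())
```

For the level tones, the second kind of model mostly added reporting overhead without changing the conclusions we drew from the first.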

Why did we choose to do this? We favored being comprehensive over possibly missing some unknown pattern (maybe the lower level tones had some different dynamic behavior?). Given the subtlety of the resulting patterns, it's hard to say what might be important.

Nowadays, I think we would be asked to use GAMs instead of mixed effects modeling. Yet, that also results in statistical bloat (e.g. you have to model each tone separately). The results of our research should lead us to scientific conclusions about speech, not leave us lost in 101 statistical tests where we spend our time analyzing three-way interactions.

I don't know the right answer to how the field might address this issue, but I do not believe that it has to do with reducing the purview of one's study. GAMs are great if you are looking at one pattern in one language, but they are terrible for generalizing over a language's inventory (of vowel formants, of tones, of prosodic contexts, etc.). One finds either studies using GAMs for limited topics (one vowel or one context) or studies where 101 models are included to provide a comprehensive account of a language's patterns. The former are more likely in studies examining well-studied languages, while the latter are more likely in exploratory analyses of less-studied languages.

The negative consequence here might be that the "clear case" for GAMs is made within the less complex pattern in a well-studied language, while no one can make heads or tails of all the analyses in the less well-studied language. I see this as just an extension of linguistic common ground as privilege. Yet, now it's done with statistics.

Sunday, January 8, 2023

The "Bender rule" in some linguistics journals in 2022

The Bender rule is the informal idea that one ought to explicitly mention the name of a language in a publication on language and linguistics. It is named after Emily Bender, a computational linguist at the University of Washington (Seattle) who has written about and discussed the need to be explicit about the languages one studies. The impetus behind it is the observation that studies on English (or other commonly-studied languages) are typically understood as a default norm, while less commonly studied languages are more likely to be overtly mentioned. This contributes to a biased perspective in linguistics in which only the conclusions from studies on English contribute to a general picture of Language, while similar conclusions from studies on other languages reflect language-specific phenomena and are less generalizable. A similar issue arises in work on indigenous languages that I've written about before.

People have talked about the Bender rule since 2019. I'd like to think that linguists have paid attention to what this means for academic publications since then. After all, it would be fairly simple for journal editors or editorial boards to implement a policy where languages are mentioned in titles or in abstracts. People often read/skim the titles and abstracts of most publications without investing more time in reading all the details. Applying the Bender rule to titles and/or abstracts (and yes, I am suggesting it) would have the additional benefit of helping your librarians organize publications by topic language.

So, how have some popular journals fared in 2022? Are many publications mentioning the languages of study? I thought I would look at two popular journals that I am familiar with: the Journal of Memory and Language (JML) and the Journal of Phonetics (JPhon). Both journals heavily focus on experimental research. I decided to include two separate measures here: does the journal article mention the language of study in the title? And does it mention it in the abstract? I have excluded publications that are surveys or methodological reports, as these lack experiments and tend not to focus on individual languages anyway.
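If you want to replicate this kind of tally for another journal or year, a rough count can be automated along the following lines, assuming a spreadsheet export with title and abstract columns. The file name, column names, and language list here are hypothetical, and a real list would need to be far more complete (with the results hand-checked, e.g. for studies on bilingual populations where languages are named less directly).

```python
import csv

# Hypothetical input: a CSV export of one journal's 2022 research articles
# with "title" and "abstract" columns.
LANGUAGES = {"English", "Mandarin", "Spanish", "German", "ASL", "Mixtec", "Triqui"}

def mentions_language(text):
    """Return True if any language name appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(lang.lower() in lowered for lang in LANGUAGES)

total = in_title = in_abstract = 0
with open("jphon_2022_articles.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        total += 1
        in_title += mentions_language(row["title"])
        in_abstract += mentions_language(row["abstract"])

print(f"Titles: {in_title}/{total}; Abstracts: {in_abstract}/{total}")
```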

For JML, between January 2022 and the present, 43 relevant articles have been published. Of these, just 2/43 mention the language of study in the title. Within the abstracts, 8/43 articles mention the language of study. Studies that explicitly mentioned languages were those on Mandarin Chinese, ASL, and those involving bilingual populations.

For JPhon, between January 2022 and the present, 40 relevant articles have been published. Of these, 18/40 mention the language of study in the title. Within the abstracts, 35/40 articles mention the language of study.

Why might these numbers (and practices) be so different across journals? Are the psycholinguistic patterns found in brains and minds in the articles in JML fundamentally different in terms of their language-specificity from the studies on phonetic memory/perception, speech planning, speech coordination, and speech articulation found in JPhon? In other words, is it that only the phoneticians need worry about the Bender rule?

I think most phoneticians would probably state that a study on the articulatory and acoustic phonetics of one language is bound to be fundamentally different from a similar study on another language. Thus, there is less of an expectation that one's findings will immediately generalize to all of Language. Rather, one draws conclusions and amasses evidence for common patterns by looking across a large enough sample of languages. Existing theories are examined, tested with new data, and revised.

I don't know what psycholinguists believe here though. Perhaps it is the case that many still believe that English-focused studies in psycholinguistics are always uncovering something fundamental about Language in a way that studies in phonetics are not, despite apparent evidence to the contrary. I have to doubt that though. I know many psycholinguists and they seem to be a pretty open-minded group. For the time being, it would seem like JML is failing the Bender rule.


Friday, August 5, 2022

Open projects for collaboration

On the Whova event page for the Laboratory Phonology meeting, I started a thread with the title "How are better collaborations created?" The goal of this was really to ask "what works?" with labphon-related projects that involve multiple people and institutions. I suppose there is another, somewhat guilty reason - I have several things I've worked on, but many more that are at various earlier stages of development. It would be great to see people tackle some of these types of projects and also be involved with other things along the way.

I received feedback from eight people: Jen Nycz, Anne Pycha, Valerie Freeman, Paul De Decker, Miao Zhang, Bihua Chen, Ivy Hauser, and Timo Roettger. 

Paul started by asking whether it is really clear if people want to collaborate. Is there a mechanism that we can think of to make this known to others? Valerie suggested a "collaborator's corner" at conferences with skills/preferences for a particular project. She also mentioned a resource for ensuring that collaborators are on the same page with regard to goals. Jen mentioned how each person has their strengths/preferences in research projects and that we might try to match along these preferences. This way we would be truly aiming to find not just collaborators, but ideal collaborators where all parties benefit. Ivy mentioned that more intentional networking at conferences might serve some of these goals. Timo's idea involved a special session proposal for a conference (maybe the next LabPhon?).

I like all these ideas. I think there are some separate threads:

(a) Identification. We could identify what we're doing and discuss our project goals with others. Maybe this is the collaborator's corner that becomes part of the networking process at conferences?

(b) Needs/Wants. We could focus on really identifying what we would like with each of the projects we are working on. Is it in the idea stage? Is the data already collected? Is the data already annotated? Is the data ready for analysis? Where are you stuck and what would you like to collaborate on?

(c) Goals and agreements. As per Ivy's point, each project could have a timeline and set of goals that collaborators agree upon. Is the project part of a larger project? Do you want to submit a paper this year? Next year? What about author order in submission? Will the collaboration continue or end at a certain point? Who is responsible for managing goals?

With these in mind, I'm going to try to identify some of my own projects that are seeking collaboration.

1. Speech rate and lenition in Spanish

Back in 2010, I collected a set of recordings from 9 young Oaxacan Spanish speakers (ages 19-26). They produced a short read passage (Sleeping Beauty), a retelling of a narrative after a short video (the pear story), and a free narrative. The initial goal of this project was to examine speech rate variation across speech styles and across different dialects of Spanish. The cross-dialectal goal did not work out, but the data remains.

Team: Myself (UB Department of Linguistics), Colleen Balukas (UB Romance Languages and Literature), Jamieson Wezelis (UB Romance Languages and Literature)

The current goals of this project are rather open, but we have considered three topics:

a. An exploratory study on vowel sequences and vowel hiatus patterns across word boundaries. There is a literature on this topic in Spanish phonetics, but not with spontaneous speech data (and certainly not across speech styles).

b. An exploratory study on aspects of vowel reduction in Oaxacan Spanish.

c. An exploratory study on patterns of vowel devoicing in Mexican Spanish.

The eventual goal would be one (or more) papers on the acoustic phonetics of spontaneous speech in Spanish.

The current state: All the recordings have been transcribed in ELAN and force-aligned. The read passages have also now been hand-corrected. All recordings have also been syllabified using a custom Praat script. However, Jamieson can no longer be actively involved in the process of hand-correcting the data.

An ideal collaborator is (1) interested in helping with the remaining hand-correction of the acoustic recordings (roughly 1.5-2 hours' worth), (2) is either interested in one of our goals or has their own, which we could all pursue once the alignments are corrected, (3) has some knowledge of statistics as it applies to analyzing acoustic phonetic data, (4) is interested in delving into some of the literature in Spanish phonetics (lots of dissertations), and (5) is literate in Spanish.

Timeline: We're kind of stuck right now (no progress for about a year), but we can devote some time to this starting in the next semester. It would be great to see results in 2023 (a talk, a paper, etc).

Bonus: I'm open to data sharing after collaboration.

2. Glottal reduction in Itunyoso Triqui

Throughout the course of my language documentation and phonetic data analysis grant, we collected about 35 hours of spontaneous speech in Itunyoso Triqui, an Otomanguean language spoken in Oaxaca, Mexico. Triqui languages are rather tonally complex and have orthogonal contrasts involving glottal consonants (/ʔ, ɦ/). While there is some description of glottalization in the language (DiCanio 2012), there is an open question as to how much lenition of glottal stops occurs. The goal would be to analyze the acoustic data to examine variation in the production of the glottalization. We are particularly interested in variation in glottalization as a function of word position (VCV vs. VC#) and contrast type (pre-glottalized sonorant vs. glottal stop). This project would tie in nicely with recent work on Hawaiian glottal stops (Davidson 2021).

Team: Myself (UB), Lisa Davidson (NYU), Richard Hatcher (postdoc, Hanyang University - former UB grad student)

The current state: All of the recordings are force aligned with a custom-built aligner for Triqui. The recordings of interest have also been hand-corrected. We have begun some analysis of variation in production of the glottalization using a script I wrote for Praat which allows users to identify glottal reduction types. We presented preliminary results from this work at Haskins Laboratories in Fall 2021. We would like a collaborator to help us analyze more of the existing data.

An ideal collaborator is (1) interested in non-modal phonation type in complex tone languages, (2) has some knowledge of the phonation literature and acoustic phonetics, (3) is familiar with running voice quality scripts in Praat (or at least scripts), (4) has some knowledge of statistics as it applies to analyzing acoustic phonetic data, (5) is interested in judging patterns of glottal reduction in field recordings, and (6) would like to get involved with work on phonetic variation in Itunyoso Triqui.

Timeline: We have not made new progress for about a year, but some of us can devote some time to this starting in the next semester. It would be great to see results in 2023 (a talk, a paper, etc).

Bonus: I'm open to data sharing after collaboration.

3. Triqui clitic phonetics study

Certain Triqui person clitics (speech act participant clitics) condition tonal changes on the right edge of the root they attach to. This is described in the literature on the language (DiCanio 2008, 2016, 2020, 2022). Consider that the 2S clitic /=ɾeʔ¹/ conditions (1) tonal raising on certain roots, e.g. /ɾa³ʔa³/ 'hand / mano' > /ɾa³ʔa⁴=ɾeʔ¹/ 'your hand', (2) leftward, low-tone spreading on others, e.g. /ka⁴ne⁴³/ 'bathed / se bañó' > /ka⁴ne¹=ɾeʔ¹/ 'you bathed', and (3) no tonal change on others, e.g. /ki³ɾi¹/ 'took out / sacar' > /ki³ɾi¹=ɾeʔ¹/ 'you took out'. There are two research questions here. First, there is an empirical question as to what these tonal changes look like for roots containing the 9 lexical tones. Of particular interest is the observation that, in those roots where no tonal changes occur, pre-clitic lengthening may occur. Second, utterance-final prosodic lengthening takes place for lexical roots (DiCanio & Hatcher 2018, submitted), but the conditions on this are quite limited (almost no lengthening takes place for roots ending in coda /ʔ, ɦ/). Moreover, is prosodic lengthening limited to roots or may it also affect clitics? The study here sought to answer these empirical questions for Itunyoso Triqui.

Team: Myself (so far)

The current state: This has been on hold for 4 years now. The recordings that were collected alongside this data have been analyzed (DiCanio & Hatcher 2018, submitted). The relevant stimuli were recorded in 2018, consisting of 224 trials with target words in clitic and non-clitic conditions, in both utterance-final and non-final position, repeated 5 times per speaker, with 10 speakers (11,200 sentences). This data has not yet been transcribed or segmented in Praat, though all the stimuli and their (random) order of presentation are saved in an Excel file, so transcription should be relatively straightforward.
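Because the presentation order was saved, a small script could turn that stimulus file into per-speaker label lists to guide the transcription and segmentation work. A minimal sketch, with hypothetical file and column names:

```python
import pandas as pd

# Hypothetical: the 2018 stimulus file, one row per trial, with columns
# speaker, trial, sentence, condition (clitic/non-clitic), position
# (final/non-final), and repetition.
stims = pd.read_excel("triqui_clitic_stimuli.xlsx")

# Write one tab-separated label list per speaker, in presentation order,
# to follow along with while transcribing/segmenting in Praat.
for speaker, block in stims.groupby("speaker"):
    block.sort_values("trial").to_csv(f"{speaker}_labels.tsv", sep="\t", index=False)
```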

An ideal collaborator is (1) interested in tone production and the phonetics of tone sandhi, (2) has some knowledge of acoustic phonetics and Praat, (3) has some knowledge of statistics as it applies to analyzing acoustic phonetic data, and (4) is interested in doing speech segmentation work with this data.

Timeline: No work has taken place on this since the recordings were made. It's a big project given the amount of data and speakers. So, it's completely open. I imagine an analysis of the data alongside segmentation would take at least several months with a few researchers.

Bonus: I'm open to data sharing after collaboration.

Sunday, November 28, 2021

On the lexicalization of Triqui compounds

In the process of doing historical reconstruction, one is often led to believe that the conditioning factors leading to sound change are specific to a phonotactic context, i.e. one finds /k/ > [tʃ]/_i and perhaps only in onsets. Yet, there are several variable patterns in Itunyoso Triqui compounds that suggest that stress-induced simplification might also cause unique types of sound changes.

As a bit of background, it is important to know that Itunyoso Triqui words are mostly polysyllabic. About 70% of the lexicon consists of disyllabic or trisyllabic roots, though monosyllabic roots have higher token frequency in running speech (as per Zipf's law). The final syllable of these morphemes has special status. It is phonetically longer than non-final syllables and most of the contrasts occur on the final syllable (cf. DiCanio 2010).

What occurs in the final syllable in a polysyllabic word?
a. Every possible tone: /1, 2, 3, 4, (4)5, 13, 32, 43, 31/.
b. All consonants: /p, t, k, kʷ, tʃ, ʈʂ, ʔ, m, n, ⁿd, ᵑɡ, ᵑɡʷ, ɾ, β, s, l, j, ˀm, ˀn, ˀⁿd, ˀᵑɡ, ˀɾ, ˀβ, ˀl, ˀj/.
c. All vowels: /i, e, a, o, u, ĩ, ã, ũ/
d. Coda consonants /ʔ, ɦ/ (though all syllables are otherwise open).

What occurs in the non-final syllable of a polysyllabic word?
a. Only level tones /1, 2, 3, 4/, but the caveat is that tones /1/ and /4/ are not truly contrastive here - they only occur due to leftward tonal spreading onto the non-final syllable (cf. DiCanio, Martínez Cruz, and Martínez Cruz 2020). So, really it's just tone /2/ and tone /3/ that contrast here.
b. Only simple consonants (no prenasalized stops, no glottalized sonorants, no glottal stop): /p, t, k, kʷ, tʃ, ʈʂ, m, n, ɾ, β, s, l, j/.
c. Only oral vowels /i, e, a, o, u/ and mid vowels only occur if they also occur in the final syllable. So, really just /i, a, u/ are contrastive here.
d. All syllables are open.

So, we have many asymmetries in which sounds occur by syllable. We can call this stress or prominence or whatever term you wish, but the patterns above occur mostly without exception.

There is an additional observation too - a contrast between singletons and geminates only occurs in monosyllabic words, e.g. ta³ 'this' vs. tta³ 'field', nũ³² 'be inside' vs. nnũ³² 'epazote.' This contrast does not occur in polysyllabic words (cf. DiCanio 2010, 2012).

Now that we know about the stress-based consonant patterns, what does this mean for sound change? Consider that one very common type of word formation process in Triqui (and in Otomanguean languages more generally) is compounding. When each morpheme of a compound retains some of its phonological identity as a distinct root, there may be no sound changes. Yet, if the compound begins to lexicalize, the restrictions on phonological distributions above start to cause rather robust changes. Let's look at some examples.

1. The Triqui word 'de veras/truly' is a reduplicated form yya¹³ yya¹³, literally meaning 'true true.' Most adverbs in the language appear post-verbally before personal clitics (V+ADV+SUBJ order), so clitic morphophonology applies to them. The 1P clitic involves a > o, glottal stop insertion, and tone 4. Yet, with this word you get yyo¹³ yyoʔ⁴, with vowel harmony. Then with lexicalization, you can't get a contour tone on a non-final syllable and no geminates are permitted in polysyllabic words, so it's yo³yoʔ⁴.

2. The Triqui word 'each' is a reduplicated compound  ᵑɡo² ˀᵑɡo² 'one-one.' Yet, it is often pronounced as [ko²ˀᵑɡo²] in running speech. You lose the prenasalized stop in the penultimate syllable as per the patterns above.

3. The Triqui word 'soda/soft drink' is a compound nne³² tsiʔ¹ 'water + sweet.' Yet, it is often pronounced as [ne³siʔ¹]. You lose the contour tone and the gemination on the penultimate syllable because neither are permitted there.

4. The Triqui word for 'bread' is a historical compound /ʈʂːa³ ʈʂũɦ⁵/, lit. tortilla + oven (Spanish tortilla del horno). It is pronounced as [ʈʂa³ʈʂũɦ⁵] by older speakers but as [tʃa³tʃũɦ⁵] by younger speakers (who have mostly merged the retroflex and post-alveolar affricates). The historical gemination of 'tortilla' has been lost here.

5. The Triqui word for 'rifle' is [ʈʂu³ʈʂi³aʔ³], but the roots are ʈʂːũ³ 'wood' + ʈʂi³aʔ³ 'to shoot.' In the compound, we observe degemination (because it's in a disyllabic word now) and loss of the vowel nasalization too. And as mentioned above, many speakers now produce the retroflex series as post-alveolar.

I am mentioning these examples here because, as per Rensch (1976), it is extremely difficult to reconstruct non-final syllables in many Otomanguean languages. It may be that (a) processes of reduction in unstressed syllables and (b) a general pattern of distributional asymmetries in the phonological inventories will help to reconstruct them. The [k] you observe that comes from a reduced [ᵑɡ] (as in #2 above) might only occur in a handful of words because reduplicated compounds are relatively uncommon in Otomanguean languages.

In sum, neutralization due to stress-based distributional asymmetries can lead to superficial similarities between words, e.g. the /n/ onset in #3 'soda' is from */nn/ while a different word like /ne³tã³/ 'ejote/green bean' is probably related to Mixtec words like /ñityì/ (SJC Mixtec) where onset /n/ has a */ny/ reflex. 

Saturday, April 17, 2021

Linguistic tidbit

Some linguists obsessed with a theory of all
forget there are others who need to think small,
of how to inflect a verb that's perfective
or reasons why 'so' isn't just a connective.

And others might glean an elaborate fact
from language in use as a societal act
with agents whose motives are far from mundane
but an essence of self quite hard to contain.

There's meaning and purpose in digging quite deep
at cognates in history whose meaning we keep,
And time to get lost in the tangle of weeds,
a morphological context and the pattern it feeds.

And many a language, pattern, and word
hold secrets and histories that we've never heard
Of just how a people connect with the past
or just how a pattern changes so fast.

So before you admonish the detail-obsessed
those whose minutiae is seldomly blessed
with an appearance in Nature or Science and so
appears to be findings you don't need to know.

An ego obese with a theory so tangled
Can deflate in an instant when new data is wrangled.
Consider that details, however so small
are the basis of asking the biggest questions of all.

Saturday, January 2, 2021

What does not work for sentence elicitation with Triqui speakers

One part of doing fieldwork is discovering just what does not work while you're in the field. Several summers ago, after receiving some critical methodological remarks from a reviewer on a submission of mine, I started to seriously question just what works in my fieldwork.

We're all addicted to our past methods and sometimes we need a jolt to reconsider what we're doing in the field. I have a tendency to rely a lot on repetition among speakers because most Triqui speakers are not literate in Triqui. There are three options for elicitation here, as it happens. One possibility is to just ask speakers to provide a translation of a Spanish sentence, another is to have them see some image and describe it, and another is to have them repeat after another speaker who can read Triqui (my main consultants).

I rely a lot on the third method, but it's possible that Triqui speakers will overly mimic what the other speaker is doing when they repeat. (There is a serious question as to what they would mimic - there is no non-tonal prosody in the language, but perhaps speech rate and optional pauses?) So, this logically leads reviewers and other linguists to suggest the first two options above. We can toss out the second option for anything that involves more carefully-controlled speech. If you want to use identical nouns but change the verbs, for instance, this simply leaves way too much open to interpretation. Speakers will never provide the target sentence.

But what about the first option? This also often fails for various reasons. I'm in the process of looking at a large data set examining tonal changes with person morphology in Triqui across 11 speakers. We tried the translation method, but it regularly fails with speakers. Here's a transcription of one exchange:

Consultant: Cantaste una canción (You sang a song.)
Speaker: Ka³ra⁴³ ngo² chah³  (I sang a song.)
Consultant: Ka³raj⁵ ngo² chah³ (You sang a song.)
Speaker repeats consultant

Many fieldworkers might laugh at this exchange - asking people to get personal pronouns correct in translation is a common issue. But if you're looking at how words change tone with personal pronouns, then it's important to get right.

There is an added issue though - we often assume that we can examine speech in translation because we assume strong bilingualism or a clear 1:1 mapping between words in a lingua franca and words in a language we're investigating in a field context. Sometimes neither can be found. In the same recording, we observe the speaker becoming confused when he has to distinguish between lavas 'you are washing' and lavaste 'you washed' in Triqui. 

Consultant: Lavas la ropa. (You wash the clothing.)
Speaker: nan...[s]... (long pause)
Me: Nanj⁵ reh¹...
Speaker: Nan⁴³ (I wash)....(pause)
Consultant: Nanj⁵ reh¹ a⁴sij⁴ (You wash the clothing.)
Speaker repeats consultant

In this exchange, the speaker is caught off guard because he is either uncertain about the aspect marking of 'wash' (as the previous exchanges involved him producing it with the perfective prefix - ki³nanj⁵) or he is confused about the pronominal referents again. The result is the same though - the speaker ends up relying on repetition from another speaker/consultant.

If you have to rely on repetition, perhaps a way around it is to have speakers count between hearing a sentence and repeating it. If the concern over repetition in elicited speech contexts in the field is that speakers are likely to mimic, then counting before repeating might resolve this. The idea here is that counting takes time and auditory memory decays quickly. So, if speakers have to say "one, two, three" (or ngoj¹³ bbi¹³ ba¹hnin³ in Triqui), then their reproductions of the target sentences might more closely resemble long-term memory representations for the words in the short sentences. I owe this idea to Lisa Davidson (via one of our interesting Facebook/Twitter discussions).

But in practice this only kinda ends up working. Speakers can do this, but they end up sometimes forgetting the target sentence. So, you get exchanges like the following:

Consultant: Ka³ne³ ni²hrua⁴¹ reh¹ chu⁴ba⁴³ beh³ (Te sentaste mucho en la casa. / You sat in the house a lot.)
Speaker: ngoj¹³ bbi¹³ ba¹hnin³... ka³ne³..... ka³ne³...
Consultant: Ka³ne³ ni²hrua⁴¹ reh¹ chu⁴ba⁴³ beh³ 
Speaker repeats consultant

In effect, it is hard to pay attention to reproducing specific sentences when you have to produce numbers first. So, the end result is to just repeat what the consultant has said. When you add the additional stress of being recorded to this (many speakers become nervous knowing they are recorded), this can produce pauses/errors in the elicitation.

So, what is the way around all of this? One thing we might address head-on is the assumption of mimicry. We seem to believe that all speakers/participants, when asked to repeat words, will focus on the specific phonetic characteristics of the signal they heard instead of the content. The jury is still out on this, though. I have found two papers that have addressed the question - Cole and Shattuck-Hufnagel (2011) and D'Imperio, Cavone, and Petrone (2014). In both cases, speakers were told to explicitly imitate the form of the speech signal, and they mostly imitated pitch accents, but not F0 level. In a language where only level is adjustable (lexical tone is fixed), what predictions does this previous work make for Itunyoso Triqui? I'm testing this with a study I ran in 2019. There is no work on what tone language speakers do in such tasks (and we have no idea what happens when the concern is just getting the words right - not trying to imitate fine phonetic detail).

I wish I could find an ideal way to do careful elicitation that was immune to these concerns. In the meantime, though, prosody folks might consider a warning mentioned in DiCanio, Benn, and Castillo García (2020) - no method for the elicitation of prosody is immune to stylistic effects. Read speech is just as much a speech style as repeated speech, and most languages have no writing system or literacy (Harrison 2007). That means that this methodological concern must be addressed as we look at prosodic systems across more of the world's languages.

References:
Cole, J. and Shattuck-Hufnagel, S. (2011). The phonology and phonetics of perceived prosody: What do listeners imitate? In Proceedings from Interspeech 2011, pages 969–972. ISCA.

D’Imperio, M., Cavone, R., and Petrone, C. (2014). Phonetic and phonological imitation of intonation in two varieties of Italian. Frontiers in Psychology, 5(1226):1–10.

DiCanio, C., Benn, J., and Castillo García, R. (2020). Disentangling the effects of position and utterance-level declination on the production of complex tones in Yoloxóchitl Mixtec. Language and Speech, OnlineFirst: 1–43. https://journals.sagepub.com/doi/10.1177/0023830920939132

Harrison, K. D. (2007). When languages die. Oxford University Press.