Wednesday, November 20, 2024

November thoughts

11/19/24

I believed that I could put my anxious, spiraling, catastrophizing thoughts away in a box. I mostly succeeded in doing so. I quit social media a week before the election. It's now been about 3.5 weeks without Facebook. I deleted my Twitter account. I dropped the NY Times (about a month ago). I don't read the news. I actively try to avoid it.

I've embraced the new life that has come into Bluesky (Bsky), but with some trepidation. When people post about politics, whether worrying about the incoming president's staff picks or about potential laws and actions, I can't help thinking about it throughout the day. It's certainly not as paralyzing as my dread in October was, but it's creeping in that direction.

I find that I'm still left with these strong feelings about politics. I get angry at the idea of completely unprofessional people being in charge of things. I get angry at how much I fear I will have to fight to prove my value, or to prove that my values are worth preserving. I worry I might have to spend years of my time in the future fighting a system that will eventually collapse anyways, all while my gentle, curious mind is snuffed out by all the chaos and noise. Will I grow to believe that research is futile? Is there no place for introverts in the future of America?

All these things have been on my mind of late. They're theoretical scenarios, I know, so nothing is real yet. But in believing what is happening, I find that I am forced to draw some very negative conclusions about Americans - ones that I don't feel I drew in the past. This is what concerns me.

I used to feel that people were simply ill-informed about a lot of issues. That probably has always been true, but I find myself believing that people don't care about what is true nowadays. That's more concerning. And where my brain goes with this is to believe that others don't care about knowledge, just power. If knowledge has no access to power, then in a selfish world, it never makes sense to pursue it. You're left concluding that knowledgeable people are expendable. I'm also left concluding that most people believe I don't matter.

I feel like I used to believe that, fundamentally, people should care for each other. Show someone examples of how others struggle or talk to people about your life and they will empathize. It's human nature. Yet, we're given a political landscape where people have excuses to reject most other people. There are excuses to dislike immigrants. There are excuses to dislike Latinos and Black people. There are excuses to dislike LGBTQ folks. Accept all of them and you're left with people who have drunk the kool-aid of structural prejudice and dog-whistle politics. And it's not that I didn't believe that prejudiced people existed before, it's that I believed that they would want to change. If I now believe people don't want to change or learn about others, then I have trouble believing in their fundamental goodness. 

Taken to its conclusion, if no one wants to change their views about the world or others, then they will learn nothing from me. After all, it makes no sense to engage with people who have no interest in learning from others. Dark thoughts, surely.

So, without social media, and while I'm not anxiously spiraling about the future, my brain has still drawn what seem to me to be very logical conclusions about the nature of people. I suppose that the scientist in me wants proof that most people are still fundamentally good, while the child in me wants to believe that I don't live in a country as hateful as I fear it is.

Sunday, December 10, 2023

On the generalization of linguistic discovery

Discovery is a crucial part of the evolution of most academic disciplines that take a scientific approach towards understanding the world. New empirical evidence of a phenomenon leads researchers to re-examine old perceptions they had. Or rather, as Kuhn (1962) would argue, those with the old perceptions of the world eventually die or fade away while those who only have these newer perceptions mature.

But how do we generalize discovery? There are certainly many disciplines where discovery is generalizable. Findings in many of the physical sciences and mathematics are truths that will continue to be true forever. Discover a solution to a long-held mathematical problem and it will remain true from now on.

In the social and cognitive sciences, though, discoveries seem somewhat murkier. Where they relate to biological, neurobiological, or biophysical principles, the discoveries seem more generalizable. In my main sub-discipline, phonetics, there are clear physical relationships between what a person does with their speech articulators and what this produces in an acoustic signal, for instance. This is true across languages because all humans have similar oral and laryngeal anatomy. Yet, since speakers can vary massively in just how they produce similar speech sounds, generalization is challenging here too.

Where they do not relate to biological or physical principles, behavioral and linguistic discoveries are usually observational findings restricted to a certain type of population. Generalization here must proceed through multiple experiments or studies with different types of populations. From a linguist's perspective (and I can only speak as a linguist here), that necessarily means that discoveries need more languages.

There's a danger here that comes out of a kind of science envy in the behavioral and linguistic sciences. Though some of the methods in the social/behavioral sciences have become more scientifically rigorous (mostly in relation to statistical testing and modeling), the findings are not magically more generalizable to new populations than they were in the past. Discovering that college-aged speakers of English prefer certain syntactic structures over others does not tell us anything about any other language unless subsequent research is undertaken. It might make predictions about patterns in other languages, but predictions are not generalizations.

Can we ever generalize about "Language"? What if we can't?

There are a lot of half-truths that linguists hold about "Language" that arise from a casual extension of findings in a few languages. Demonstrate that some linguistic phenomenon occurs in American English, Spanish, and German and linguists will believe it is a universal or "strong tendency" without a very clear criterion for what "universal" or "strong tendency" would mean.

Why be so careful with formal and statistical methods but so careless regarding the scientific bread-and-butter of hypothesis testing? The answer seems to lie in an all-or-nothing perspective about where linguistic discoveries have value to a discipline. Linguists either believe linguistic patterns demonstrate unique characteristics of individual languages or populations, or they believe these are universal patterns reflecting something deep about human evolution or murkier things like universal grammar. The field tends to reward only the latter type of work since it smells like a generalization.

This all-or-nothing approach means that we often come up empty-handed when we wish to talk about the relevance of our findings to the discipline - we're delving deeply into specific languages with an empirical or historical goal or we're looking broadly (and more superficially) at patterns in a larger number of languages. What might exist in the middle? We're a small discipline examining a huge topic with a gigantic amount of variation. We can't do it all.

I think one future path for the discipline is to take a cue from the quantitative revolution that has occurred in the discipline over the past 20-25 years. The more we examine phenomena that we once believed to be discrete (x occurs in context A, but y occurs in context B), the more we discover that these are strong statistical tendencies instead. And the reason for this is that linguistic phenomena are behavioral. They are not formal mathematical proofs that remain true forever after being solved. We just keep committing the same error of generalization because of this science envy.

Might there be no true linguistic universals? Maybe there are, but we can never be typologically balanced enough to prove anything more than fairly superficial patterns. Maybe there aren't any at all, and this is ok. Languages are endlessly fascinating and we can still demonstrate how many languages work along statistical lines. The idea that there is massive inter-language variation and that this is structured to occur in certain types of languages necessarily means that we can look at types of languages to construct complex cross-linguistic hypotheses. To provide a concrete example, do speakers of fusional languages or those with non-concatenative morphology store words differently than speakers of isolating languages? This is an interesting question, but it does not require a model of what must be universal. It just requires experiments and cross-linguistic research.

This is a blog post, so take my musings with a grain of salt. I don't have the answers for my own subdiscipline, let alone all of linguistics. I think, though, that we need to be more careful in distinguishing between the things that we believe are proven or demonstrated and the things that are actually demonstrated typological patterns or universals.

Wednesday, November 1, 2023

Issues in choosing a statistical model in phonetics

What's the bar for deciding to use a new statistical model in research? It seems like often enough within linguistics or speech science, one chooses a model based on what is à la mode. That frequently translates into increasing complexity.

Is it always good to have a more complex model? Not necessarily. It might reveal more intricate interactions in the data. It might also model interactions between terms better than competing models, usually by improving fit with non-linear terms (cf. GCA, GAMMs). Yet, some evaluative criteria that end up being crucially important are often missing when a model is chosen.

1. Is the model easily implementable and understandable? 

If a model is easy to implement and understand, then it is easy enough for new users to emerge and for a set of standards to come about. Yet, if neither of these things is true, there is a severe social cost.

If there are a handful of researchers proposing using a new model, is there an existing infrastructure that can help with training and implementation? Usually there is not and, as a consequence, many researchers get frustrated if the field pushes a model where no infrastructure exists. The same people proposing the model will end up fielding hundreds or thousands of questions about how to use it. And nobody has time for that.

Now, why might the field (or, most likely, paper reviewers) decide that everyone has to use one particularly new and popular model for their data? Sometimes important new factors are discovered that need to be modeled. But sometimes it's just impostor syndrome, i.e. the belief that we are only a serious field if we have increasingly mathematically opaque models for our data. And it's easy to give a post-hoc reason to include all possible factors when our predictions are so weak.

2. Does the model enable us to generalize?

Do we actually need to model as many of the details as we can? Even models that take a fairly generic approach to avoiding overfitting can end up overfitting things like dynamics. As a result, researchers lose time discussing details that end up being unimportant, and we lose the ability to generalize.

I'll provide one personal example of this. In my co-authored paper on the phonetics of focus in Yoloxóchitl Mixtec, we provided statistical models for f0 dynamics alongside statistical models for midpoint f0 values. There is certainly good reason to model changes in f0, but in a language with a number of level tones (and tone levels), this type of modeling might not say much. Indeed, we found mostly the same results when we looked at f0 midpoints for many of the level tones as when we looked at their dynamic trajectories. Including two sets of models resulted in twice as many statistical tests and twice as much reporting.

Why did we choose to do this? We favored being comprehensive over possibly missing some unknown pattern (maybe the lower level tones had some different dynamic behavior?). Given the subtlety of the resulting patterns, it's hard to say what might be important.

Nowadays, I think we would be asked to use GAMs instead of mixed effects models. Yet, that also results in statistical bloat (e.g. you have to model each tone separately). The results of our research should lead us to make scientific conclusions about speech, not get lost in 101 statistical tests where we spend time analyzing our three-way interactions.

I don't know the right answer to how the field might address this issue, but I do not believe that it has to do with reducing the purview of one's study. GAMs are great if you are looking at one pattern in one language, but they are terrible for generalizing over a language's inventory (of vowel formants, of tones, of prosodic contexts, etc). One finds either studies using GAMs for limited topics (one vowel or one context) or studies where 101 models are included to provide a comprehensive account of a language's patterns. The former are more likely in studies examining well-studied languages while the latter are more likely in exploratory analyses of languages.

The negative consequence here might be that the "clear case" for GAMs is made within the less complex pattern in a well-studied language, while no one can make heads or tails of all the analyses in the less well-studied language. I see this as just an extension of linguistic common ground as privilege. Yet, now it's done with statistics.




Sunday, January 8, 2023

The "Bender rule" in some linguistics journals in 2022

The Bender rule is the informal idea that one ought to explicitly mention the name of a language in a publication on language and linguistics. It is named after Emily Bender, a computational linguist at the University of Washington (Seattle) who has written and discussed the need to be explicit about languages that one studies. The impetus behind it is the observation that studies on English (or other commonly-studied languages) are typically understood as a default norm, while less commonly studied languages are more likely to be overtly mentioned. This contributes to a biased perspective in linguistics that only the conclusions from studies on English contribute to a general picture of Language, while similar conclusions from studies on other languages reflect language-specific phenomena and are less generalizable. A similar issue arises in work on indigenous languages that I've written about before.

People have talked about the Bender rule since 2019. I'd like to think that linguists have paid attention to what this means in academic publications since then. After all, it would be fairly simple for journal editors or editorial boards to implement a policy where languages are mentioned in titles or in abstracts. People often read or skim the titles and abstracts of most publications without investing the time to read all the details. If one were to apply the Bender rule to titles and/or abstracts (and yes, I am suggesting it), it would have the additional benefit of helping your librarians organize publications better by language of study.

So, how have some popular journals fared in 2022? Are many publications mentioning the languages of study? I thought I would look at two popular journals that I am familiar with: the Journal of Memory and Language (JML) and the Journal of Phonetics (JPhon). Both journals focus heavily on experimental research. I decided to include two separate measures here: does the journal article mention the language of study in the title? And does it mention it in the abstract? I have excluded publications that are surveys or methodological reports, as these lack experiments and tend not to focus on individual languages anyway.

For JML, from January 2022 to the present, 43 relevant articles have been published. Of these, just 2/43 mention the language of study in the title. Within the abstracts, 8/43 articles mention the language of study. Studies that explicitly mentioned languages were those on Mandarin Chinese, ASL, and those involving bilingual populations.

For JPhon, from January 2022 to the present, 40 relevant articles have been published. Of these, 18/40 mention the language of study in the title. Within the abstracts, 35/40 articles mention the language of study.
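Because the two journals published slightly different numbers of articles, it can help to put the counts above on a common scale. Here is a quick Python sketch that converts them to percentages. The counts are the ones reported above; the dictionary layout and the function name are just my own illustration.

```python
# Counts reported above for 2022 articles: how many name the language of study
# in the title and in the abstract, out of the relevant articles per journal.
counts = {
    "JML": {"total": 43, "title": 2, "abstract": 8},
    "JPhon": {"total": 40, "title": 18, "abstract": 35},
}


def mention_rates(c):
    """Return the percentage (rounded to 0.1) of articles naming the language, per journal and location."""
    return {
        journal: {
            where: round(100 * d[where] / d["total"], 1)
            for where in ("title", "abstract")
        }
        for journal, d in c.items()
    }


if __name__ == "__main__":
    for journal, rates in mention_rates(counts).items():
        print(f"{journal}: title {rates['title']}%, abstract {rates['abstract']}%")
```

Put this way, roughly 5% of JML articles and 45% of JPhon articles name the language in the title, and roughly 19% versus 88% do so in the abstract.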

Why might these numbers (and practices) be so different across journals? Are the psycholinguistic patterns found in brains and minds in the articles in JML fundamentally different in terms of their language-specificity from studies on phonetic memory/perception, speech planning, speech coordination, and speech articulation found in JPhon? In other words, is it that only the phoneticians need worry about the Bender rule?

I think most phoneticians would probably state that a study on the articulatory and acoustic phonetics of one language is bound to be fundamentally different from a similar study on another language. Thus, there is less of an expectation that one's findings will immediately generalize to all of Language. Rather, one draws conclusions and amasses evidence for common patterns by looking across a large enough sample of languages. Existing theories are examined, tested with new data, and revised.

I don't know what psycholinguists believe here though. Perhaps it is the case that many still believe that English-focused studies in psycholinguistics are always uncovering something fundamental about Language in a way that studies in phonetics are not, despite apparent evidence to the contrary. I have to doubt that though. I know many psycholinguists and they seem to be a pretty open-minded group. For the time being, it would seem like JML is failing the Bender rule.


Friday, August 5, 2022

Open projects for collaboration


On the Whova event page for the Laboratory Phonology meeting, I started a thread titled "How are better collaborations created?" The goal was really to ask "what works?" for labphon-related projects that involve multiple people and institutions. I suppose there is another, somewhat guilty reason - I have several things I've worked on, but many others are at various earlier stages of development. It would be great to see people tackle some of these projects and also be involved with other things along the way.

I received feedback from eight people: Jen Nycz, Anne Pycha, Valerie Freeman, Paul De Decker, Miao Zhang, Bihua Chen, Ivy Hauser, and Timo Roettger. 

Paul started by asking whether it is really clear if people want to collaborate. Is there a mechanism that we can think of to make this known to others? Valerie suggested a "collaborator's corner" at conferences with skills/preferences for a particular project. She also mentioned a resource for ensuring that collaborators are on the same page with regard to goals. Jen mentioned how each person has their strengths/preferences in research projects and that we might try to match along these preferences. This way we would be truly aiming to find not just collaborators, but ideal collaborators where all parties benefit. Ivy mentioned that more intentional networking at conferences might serve some of these goals. Timo's idea involved a special session proposal for a conference (maybe the next LabPhon?).

I like all these ideas. I think there are some separate threads:

(a) Identification. We could identify what we're doing and discuss our project goals with others. Maybe this is the collaborator's corner that could become part of the networking process at conferences?

(b) Needs/Wants. We could focus on really identifying what we would like with each of the projects we are working on. Is it in the idea stage? Is the data already collected? Is the data already annotated? Is the data ready for analysis? Where are you stuck and what would you like to collaborate on?

(c) Goals and agreements. As per Ivy's point, each project could have a timeline and set of goals that collaborators agree upon. Is the project part of a larger project? Do you want to submit a paper this year? Next year? What about author order in submission? Will the collaboration continue or end at a certain point? Who is responsible for managing goals?

With these in mind, I'm going to try to identify some of my own projects that are seeking collaboration.

1. Speech rate and lenition in Spanish

Back in 2010, I collected a set of recordings from 9 young Oaxacan Spanish speakers (ages 19 - 26). They produced a short read passage (Sleeping Beauty), a retelling of a narrative after a short video (the pear story), and a free narrative. The initial goal of this project was to examine speech rate variation across speech styles in different dialects of Spanish. The cross-dialectal goal did not work out, but the data remains.

Team: Myself (UB Department of Linguistics), Colleen Balukas (UB Romance Languages and Literature), Jamieson Wezelis (UB Romance Languages and Literature)

The current goals of this project are rather open, but we have considered three topics:

a. An exploratory study on vowel sequences and vowel hiatus patterns across word boundaries. There is a literature on this topic in Spanish phonetics, but not with spontaneous speech data (and certainly not across speech styles).

b. An exploratory study on aspects of vowel reduction in Oaxacan Spanish.

c. An exploratory study on patterns of vowel devoicing in Mexican Spanish.

The eventual goal would be one (or more) papers on the acoustic phonetics of spontaneous speech in Spanish.

The current state: All the recordings have been transcribed in ELAN and force-aligned. The read passages have also now been hand-corrected. All recordings have also been syllabified using a custom Praat script. However, Jamieson can no longer be actively involved in the process of hand correction of the data.

An ideal collaborator (1) is interested in helping with the remaining hand-correction of the acoustic recordings (roughly 1.5 - 2 hours' worth), (2) is either interested in one of our goals or has their own which we could all pursue once the alignments are corrected, (3) has some knowledge of statistics as it applies to analyzing acoustic phonetic data, (4) is interested in delving into some of the literature in Spanish phonetics (lots of dissertations), and (5) is literate in Spanish.

Timeline: We're kind of stuck right now (no progress for about a year), but we can devote some time to this starting in the next semester. It would be great to see results in 2023 (a talk, a paper, etc).

Bonus: I'm open to data sharing after collaboration.

2. Glottal reduction in Itunyoso Triqui

Throughout the course of my language documentation and phonetic data analysis grant, we collected about 35 hours of spontaneous speech in Itunyoso Triqui, an Otomanguean language spoken in Oaxaca, Mexico. Triqui languages are rather tonally complex and have orthogonal contrasts involving glottal consonants (/ʔ, ɦ/). While there is some description of glottalization in the language (DiCanio 2012), there is an open question as to how much lenition of glottal stops occurs. The goal would be to analyze the acoustic data to examine variation in the production of the glottalization. We are particularly interested in variation in glottalization as a function of word position (VCV vs. VC#) and contrast type (pre-glottalized sonorant vs. glottal stop). This project would tie in nicely with recent work on Hawaiian glottal stops (Davidson 2021).

Team: Myself (UB), Lisa Davidson (NYU), Richard Hatcher (postdoc, Hanyang University - former UB grad student)

The current state: All of the recordings are force aligned with a custom-built aligner for Triqui. The recordings of interest have also been hand-corrected. We have begun some analysis of variation in production of the glottalization using a script I wrote for Praat which allows users to identify glottal reduction types. We presented preliminary results from this work at Haskins Laboratories in Fall 2021. We would like a collaborator to help us analyze more of the existing data.

An ideal collaborator (1) is interested in non-modal phonation type in complex tone languages, (2) has some knowledge of the phonation literature and acoustic phonetics, (3) is familiar with running voice quality scripts in Praat (or at least with running scripts), (4) has some knowledge of statistics as it applies to analyzing acoustic phonetic data, (5) is interested in judging patterns of glottal reduction in field recordings, and (6) would like to get involved with work on phonetic variation in Itunyoso Triqui.

Timeline: We have not made new progress for about a year, but some of us can devote some time to this starting in the next semester. It would be great to see results in 2023 (a talk, a paper, etc).

Bonus: I'm open to data sharing after collaboration.

3. Triqui clitic phonetics study

Certain Triqui person clitics (speech act participant clitics) condition tonal changes on the right edge of the root they attach to. This is described in the literature on the language (DiCanio 2008, 2016, 2020, 2022). Consider that the 2S clitic /=ɾeʔ¹/ conditions (1) tonal raising on certain roots, e.g. /ɾa³ʔa³/ 'hand / mano' > /ɾa³ʔa⁴=ɾeʔ¹/ 'your hand', (2) leftward, low-tone spreading on others, e.g. /ka⁴ne⁴³/ 'bathed / se bañó' > /ka⁴ne¹=ɾeʔ¹/ 'you bathed', and (3) no tonal change on others, e.g. /ki³ɾi¹/ 'took out / sacar' > /ki³ɾi¹=ɾeʔ¹/ 'you took out'. There are two research questions here. First, there is an empirical question as to what these tonal changes look like for roots containing the 9 lexical tones. Of particular interest is the observation that, in those roots where no tonal changes occur, pre-clitic lengthening may occur. Second, utterance-final prosodic lengthening takes place for lexical roots (DiCanio & Hatcher 2018, submitted), but the conditions on this are quite limited (almost no lengthening takes place for roots ending with coda /ʔ, ɦ/). Is prosodic lengthening limited to roots, or may it also affect clitics? The study was designed to answer these empirical questions for Itunyoso Triqui.

Team: Myself (so far)

The current state: This has been on hold for 4 years now. The recordings that were collected alongside this data have been analyzed (DiCanio & Hatcher 2018, submitted). The relevant stimuli were recorded in 2018, consisting of 224 trials with target words in clitic and non-clitic conditions, in both utterance-final and non-final position, repeated 5 times per speaker, with 10 speakers (11,200 sentences). This data has not yet been transcribed or segmented in Praat, though all the stimuli and their (random) order of presentation are saved in an Excel file, so transcription should be relatively straightforward.
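Since every speaker produced all 224 trials 5 times, the segmentation workload can be enumerated directly from the design figures above. Below is a minimal Python sketch that builds one checklist item per sentence and confirms the 11,200 total. The labeling scheme (spkNN_repN_trialNNN) is purely hypothetical and my own invention for illustration, not how the recordings are actually named.

```python
from itertools import product

# Design figures reported above.
N_TRIALS = 224       # target words in clitic/non-clitic and final/non-final conditions
N_REPETITIONS = 5    # repetitions per speaker
N_SPEAKERS = 10


def segmentation_checklist():
    """Yield one hypothetical label per sentence still to be transcribed and segmented."""
    for spk, rep, trial in product(range(1, N_SPEAKERS + 1),
                                   range(1, N_REPETITIONS + 1),
                                   range(1, N_TRIALS + 1)):
        yield f"spk{spk:02d}_rep{rep}_trial{trial:03d}"


if __name__ == "__main__":
    items = list(segmentation_checklist())
    print(len(items))  # 224 trials x 5 repetitions x 10 speakers
    print(items[0], items[-1])
```

A list like this could be split into blocks (e.g. by speaker) to divide the transcription and segmentation among several collaborators.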

An ideal collaborator (1) is interested in tone production and the phonetics of tone sandhi, (2) has some knowledge of acoustic phonetics and Praat, (3) has some knowledge of statistics as it applies to analyzing acoustic phonetic data, and (4) is interested in doing speech segmentation work with this data.

Timeline: No work has taken place on this since the recordings were made. It's a big project given the amount of data and speakers. So, it's completely open. I imagine an analysis of the data alongside segmentation would take at least several months with a few researchers.

Bonus: I'm open to data sharing after collaboration.

Sunday, November 28, 2021

On the lexicalization of Triqui compounds

In the process of doing historical reconstruction, one is often led to believe that the conditioning factors leading to sound change are specific to a phonotactic context, e.g. one finds /k/ > [tʃ]/_i, and perhaps only in onsets. Yet, several variable patterns in Itunyoso Triqui compounds suggest that stress-induced simplification might also cause unique types of sound changes.

As a bit of background, it is important to know that Itunyoso Triqui words are mostly polysyllabic. About 70% of the lexicon consists of disyllabic or trisyllabic roots, though monosyllabic roots have higher token frequency in running speech (as per Zipf's law). The final syllable of these morphemes has special status. It is phonetically longer than non-final syllables, and most of the contrasts occur on the final syllable (cf. DiCanio 2010).

What occurs in the final syllable in a polysyllabic word?
a. Every possible tone: /1, 2, 3, 4, (4)5, 13, 32, 43, 31/.
b. All consonants: /p, t, k, kʷ, tʃ, ʈʂ, ʔ, m, n, ⁿd, ᵑɡ, ᵑɡʷ, ɾ, β, s, l, j, ˀm, ˀn, ˀⁿd, ˀᵑɡ, ˀɾ, ˀβ, ˀl, ˀj/.
c. All vowels: /i, e, a, o, u, ĩ, ã, ũ/
d. Coda consonants /ʔ, ɦ/ (though all syllables are otherwise open).

What occurs in the non-final syllable of a polysyllabic word?
a. Only level tones /1, 2, 3, 4/, but the caveat is that tones /1/ and /4/ are not truly contrastive here - they only occur due to leftward tonal spreading onto the non-final syllable (cf. DiCanio, Martínez Cruz, and Martínez Cruz 2020). So, really it's just tone /2/ and tone /3/ that contrast here.
b. Only simple consonants (no prenasalized stops, no glottalized sonorants, no glottal stop): /p, t, k, kʷ, tʃ, ʈʂ, m, n, ɾ, β, s, l, j/.
c. Only oral vowels /i, e, a, o, u/ and mid vowels only occur if they also occur in the final syllable. So, really just /i, a, u/ are contrastive here.
d. All syllables are open.

So, we have many asymmetries in which sounds occur by syllable. We can call this stress or prominence or whatever term you wish, but the patterns above occur mostly without exception.

There is an additional observation too - a contrast between singletons and geminates only occurs in monosyllabic words, e.g. ta³ 'this' vs. tta³ 'field', nũ³² 'be inside' vs. nnũ³² 'epazote.' This contrast does not occur in polysyllabic words (cf. DiCanio 2010, 2012).

Now that we know about the stress-based consonant patterns, what does this mean for sound change? Consider that one very common type of word formation process in Triqui (and in Otomanguean languages more generally) is compounding. When each morpheme of a compound retains some of its phonological identity as a distinct root, there may be no sound changes. Yet, if the compound begins to lexicalize, the restrictions on phonological distributions above start to cause rather robust changes. Let's look at some examples.

1. The Triqui word 'de veras/truly' is a reduplicated form yya¹³ yya¹³, literally meaning 'true true.' Most adverbs in the language appear post-verbally before personal clitics (V+ADV+SUBJ order), so clitic morphophonology applies to them. The 1P clitic involves a > o, glottal stop insertion, and tone 4. Yet, with this word you get yyo¹³ yyoʔ⁴, with vowel harmony. Then with lexicalization, you can't get a contour tone on a non-final syllable and no geminates are permitted in polysyllabic words, so it's yo³yoʔ⁴.

2. The Triqui word 'each' is a reduplicated compound ᵑɡo² ˀᵑɡo² 'one-one.' Yet, it is often pronounced as [ko²ˀᵑɡo²] in running speech. You lose the prenasalized stop in the penultimate syllable, as per the patterns above.

3. The Triqui word 'soda/soft drink' is a compound nne³² tsiʔ¹ 'water + sweet.' Yet, it is often pronounced as [ne³siʔ¹]. You lose the contour tone and the gemination on the penultimate syllable because neither is permitted there.

4. The Triqui word for 'bread' is a historical compound /ʈʂːa³ ʈʂũɦ⁵/, lit. tortilla+horno (tortilla del horno). It is pronounced as [ʈʂa³ʈʂũɦ⁵] by older speakers but as [tʃa³tʃũɦ⁵] by younger speakers (who have mostly merged the retroflex and post-alveolar affricates). The historical gemination of 'tortilla' has been lost here.

5. The Triqui word for 'rifle' is [ʈʂu³ʈʂi³aʔ³], but the roots are ʈʂːũ³ 'wood' + ʈʂi³aʔ³ 'to shoot.' In the compound, we observe degemination (because it's in a disyllabic word now) and loss of the vowel nasalization too. And as mentioned above, many speakers now produce the retroflex series as post-alveolar.
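The restrictions on non-final syllables listed earlier can be written down as a small checker, which makes it easy to see which properties of a compound's first member are "illegal" once the compound is treated as one polysyllabic word. This is a rough Python sketch under my own simplifying assumptions: a syllable is a hypothetical Syllable record with an onset, vowel, tone, coda, and a geminate flag; tones are written as digit strings (so contours have two digits); and the condition on mid vowels (that they only occur if the final syllable has them) is left out.

```python
from dataclasses import dataclass

# Non-final syllable inventory in polysyllabic words, as listed in the post.
NONFINAL_ONSETS = {"p", "t", "k", "kʷ", "tʃ", "ʈʂ", "m", "n", "ɾ", "β", "s", "l", "j"}
NONFINAL_VOWELS = {"i", "e", "a", "o", "u"}   # oral vowels only
LEVEL_TONES = {"1", "2", "3", "4"}            # no contour tones


@dataclass
class Syllable:
    onset: str             # singleton onset, e.g. "n", "ᵑɡ"
    vowel: str             # e.g. "a", "ũ"
    tone: str              # digit string, e.g. "3" (level) or "32" (contour)
    coda: str = ""         # "", "ʔ" or "ɦ"
    geminate: bool = False # True for a geminate onset such as nn


def nonfinal_violations(syl):
    """List the ways a syllable breaks the non-final restrictions of a polysyllabic word."""
    problems = []
    if syl.geminate:
        problems.append("geminate onset")  # geminates only occur in monosyllables
    if syl.onset not in NONFINAL_ONSETS:
        problems.append(f"onset /{syl.onset}/ not allowed")
    if syl.vowel not in NONFINAL_VOWELS:
        problems.append(f"vowel /{syl.vowel}/ not allowed")
    if syl.tone not in LEVEL_TONES:
        problems.append(f"tone {syl.tone} not allowed")
    if syl.coda:
        problems.append(f"coda /{syl.coda}/ not allowed")
    return problems


if __name__ == "__main__":
    first_members = {
        "soda (nne³²)": Syllable("n", "e", "32", geminate=True),
        "each (ᵑɡo²)": Syllable("ᵑɡ", "o", "2"),
        "rifle (ʈʂːũ³)": Syllable("ʈʂ", "ũ", "3", geminate=True),
    }
    for word, syl in first_members.items():
        print(word, nonfinal_violations(syl))
```

Run on the first members of 'soda', 'each', and 'rifle', the checker flags the gemination and contour tone of nne³², the prenasalized onset of ᵑɡo², and the gemination and nasal vowel of ʈʂːũ³, which line up with the properties described above as being lost under lexicalization.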

I am mentioning these examples here because, as per Rensch (1976), it is extremely difficult to reconstruct non-final syllables in many Otomanguean languages. It may be that (a) processes of reduction in unstressed syllables and (b) a general pattern of distributional asymmetries in the phonological inventories will help to reconstruct them. The [k] you observe that comes from a reduced [ᵑɡ] (as in #2 above) might only occur in a handful of words because reduplicated compounds are relatively uncommon in Otomanguean languages.

In sum, neutralization due to stress-based distributional asymmetries can lead to superficial similarities between words, e.g. the /n/ onset in #3 'soda' is from */nn/ while a different word like /ne³tã³/ 'ejote/green bean' is probably related to Mixtec words like /ñityì/ (SJC Mixtec) where onset /n/ has a */ny/ reflex. 

Saturday, April 17, 2021

Linguistic tidbit

Some linguists obsessed with a theory of all
forget there are others who need to think small,
of how to inflect a verb that's perfective
or reasons why 'so' isn't just a connective.

And others might glean an elaborate fact
from language in use as a societal act
with agents whose motives are far from mundane
but an essence of self quite hard to contain.

There's meaning and purpose in digging quite deep
at cognates in history whose meaning we keep,
And time to get lost in the tangle of weeds,
a morphological context and the pattern it feeds.

And many a language, pattern, and word
hold secrets and histories that we've never heard
Of just how a people connect with the past
or just how a pattern changes so fast.

So before you admonish the detail-obsessed
those whose minutiae is seldomly blessed
with an appearance in Nature or Science and so
appears to be findings you don't need to know.

An ego obese with a theory so tangled
Can deflate in an instant when new data is wrangled.
Consider that details, however so small
are the basis of asking the biggest questions of all.