Sunday, December 10, 2023

On the generalization of linguistic discovery

Discovery is a crucial part of the evolution of most academic disciplines that take a scientific approach towards understanding the world. New empirical evidence of a phenomenon leads researchers to re-examine old perceptions they had. Or rather, as Kuhn (1962) would argue, those with the old perceptions of the world eventually die or fade away while those who only have these newer perceptions mature.

But how do we generalize discovery? There are certainly many disciplines where discovery is generalizable. Findings in many of the physical sciences and mathematics are truths that will continue to be true forever. Discover a solution to a long-held mathematical problem and it will remain true from now on.

In the social and cognitive sciences though, discoveries seem somewhat murkier. Where they relate to biological, neurobiological, or biophysical principles, the discoveries seem more generalizable. In my main sub-discipline, phonetics, there are clear physical relationships between what a person does with their speech articulators and what this produces in an acoustic signal, for instance. This is true across languages because all humans have similar oral and laryngeal anatomy. Yet, since speakers can vary massively in just how they produce similar speech sounds, generalization is challenging here too.

Where they do not relate to biological or physical principles, behavioral and linguistic discoveries are usually observational findings restricted to a certain type of population. Generalization here necessarily needs to proceed through multiple experiments or studies with different types of populations. From a linguist's perspective (and I can only speak as a linguist here), that necessarily means that discoveries need more languages.

There's a danger here that comes out of a kind of science-envy in the behavioral and linguistic sciences. Though some of the methods in the social/behavioral sciences have become more scientifically rigorous (mostly in relation to statistical testing and modeling), the findings are not magically more generalizable to new populations than they were in the past. Discovering that college-aged speakers of English prefer certain syntactic structures over others does not mean anything about any other language unless subsequent research is undertaken. It might make predictions about patterns in other languages, but predictions are not generalizations.

Can we ever generalize about "Language"? What if we can't?

There are a lot of half-truths that linguists hold about "Language" that arise from a casual extension of findings in a few languages. Demonstrate that some linguistic phenomenon occurs in American English, Spanish, and German and linguists will believe it is a universal or "strong tendency" without a very clear criterion for what "universal" or "strong tendency" would mean.

Why be so careful with formal and statistical methods but so careless regarding the scientific bread-and-butter of hypothesis testing? The answer seems to lie in a kind of all-or-nothing perspective about where linguistic discoveries have value to a discipline. Linguists believe either that linguistic patterns demonstrate unique characteristics of individual languages or populations, or that they are universal patterns reflecting something deep about human evolution or murkier things like universal grammar. The field tends to reward only the latter type of work since it smells like a generalization.

This all-or-nothing approach means that we often come up empty-handed when we wish to talk about the relevance of our findings to the discipline - we're delving deeply into specific languages with an empirical or historical goal or we're looking broadly (and more superficially) at patterns in a larger number of languages. What might exist in the middle? We're a small discipline examining a huge topic with a gigantic amount of variation. We can't do it all.

I think one future path for the discipline is to take a note from the quantitative revolution that has occurred over the past 20-25 years in the discipline. The more we examine phenomena that we once believed to be discrete (x occurs in context A, but y occurs in context B), the more we discover that these are strong statistical tendencies instead. And the reason for this is that linguistic phenomena are behavioral. They are not the formal mathematical proofs that remain true forever after being solved. We just keep wanting to commit our error of generalization because of this science envy.

Might there not be any true linguistic universals? Maybe there are but we can never be typologically balanced enough to prove anything more than fairly superficial patterns. Maybe there aren't any at all and this is ok. Languages are endlessly fascinating and we can still demonstrate how many languages work along statistical lines. The idea that there is massive inter-language variation and that this is structured to occur in certain types of languages necessarily means that we can look at types of languages to construct complex cross-linguistic hypotheses. To provide a concrete example, do speakers of fusional languages or those with non-concatenative morphology store words differently than speakers of isolating languages? This is an interesting question but it does not require a model of what must be universal. It just requires experiments and cross-linguistic research.

This is a blog post, so take my musings with a grain of salt. I don't have the answers to my own subdiscipline, let alone all of linguistics. I think though that we need to be more careful distinguishing between the things that we merely believe are proven and the things that are actually demonstrated typological patterns or universals.

Wednesday, November 1, 2023

Issues in choosing a statistical model in phonetics

What's the bar for deciding to use a new statistical model in research? It seems like often enough within linguistics or speech science, one chooses a model based on what is à la mode. That frequently translates into increasing complexity.

Is it always good to have a more complex model? No. It might reveal more intricate interactions in the data. It might also model interactions between terms better than competing models, usually by improving fit with non-linear terms (cf. GCA, GAMMs). Yet, there are missing evaluative criteria for choosing a model that end up being crucially important.

1. Is the model easily implementable and understandable? 

If a model is easy to implement and understand, then it is easy enough for new users to emerge and for a set of standards to come about. Yet, if neither of these things is true, there is a severe social cost.

If there are a handful of researchers proposing using a new model, is there an existing infrastructure that can help with training and implementation? Usually there is not and, as a consequence, many researchers get frustrated if the field pushes a model where no infrastructure exists. The same people proposing the model will end up fielding hundreds or thousands of questions about how to use it. And nobody has time for that.

Now, why might the field (or paper reviewers, most likely) decide that everyone has to use one particularly new and popular model for one's data? Sometimes important new factors are discovered that need to be modeled. But sometimes it's just the impostor syndrome, i.e. we are only a serious field if we have increasingly more mathematically opaque models for our data. And it's easy to give a post-hoc reason to include all possible factors when our predictions are so weak.

2. Does the model enable us to generalize?

Do we actually need to model as many of the details as we can? Even models that take a fairly generic approach to avoiding overfitting can end up overfitting things like dynamics. As a result, researchers lose time discussing details that end up being unimportant and we end up losing the ability to generalize.

I'll provide one personal example of this. In my co-authored paper on the phonetics of focus in Yoloxóchitl Mixtec, we provided statistical models for f0 dynamics alongside statistical models for midpoint f0 values. There is certainly good reason to model changes in f0, but in a language with a number of level tones (and tone levels), this type of modeling might not say much. Indeed, we found mostly the same results when we looked at f0 midpoint for many of the level tones as when we looked at dynamic trajectories for them. Including two sets of models resulted in twice as many statistical tests and twice as much reporting.

Why did we choose to do this? We favored being comprehensive over possibly missing some unknown pattern (maybe the lower level tones had some different dynamic behavior?). Given the subtlety of the resulting patterns, it's hard to say what might be important.

Nowadays, I think we would be asked to use GAMs instead of the mixed effects modeling. Yet, that also results in statistical bloat (e.g. you have to model each tone separately). The results of our research should lead us to make scientific conclusions about speech, not get lost in 101 statistical tests where we spend time analyzing our three-way interactions.

I don't know the right answer to how the field might address this issue, but I do not believe that it has to do with reducing the purview of one's study. GAMs are great if you are looking at one pattern in one language, but they are terrible for generalizing over a language's inventory (of vowel formants, of tones, of prosodic contexts, etc). One finds either studies using GAMs for limited topics (one vowel or one context) or studies where 101 models are included to provide a comprehensive account of a language's patterns. The former are more likely in studies examining well-studied languages while the latter are more likely in exploratory analyses of languages.

The negative consequence here might be that the "clear case" for GAMs is made within the less complex pattern in a well-studied language, while no one can make heads or tails of all the analyses in the less well-studied language. I see this as just an extension of linguistic common ground as privilege. Yet, now it's done with statistics.




Sunday, January 8, 2023

The "Bender rule" in some linguistics journals in 2022

The Bender rule is the informal idea that one ought to explicitly mention the name of a language in a publication on language and linguistics. It is named after Emily Bender, a computational linguist at the University of Washington (Seattle) who has written about and discussed the need to be explicit about the languages that one studies. The impetus behind it is the observation that studies on English (or other commonly-studied languages) are typically understood as a default norm, while less commonly studied languages are more likely to be overtly mentioned. This contributes to a biased perspective in linguistics that only the conclusions from studies on English contribute to a general picture of Language, while similar conclusions from studies on other languages reflect language-specific phenomena and are less generalizable. A similar issue arises in work on indigenous languages that I've written about before.

People have talked about the Bender rule since 2019. I'd like to think that linguists have paid attention to what this means in academic publications since then. After all, it would be fairly simple for journal editors or editorial boards to implement a policy where languages are mentioned in titles or in abstracts. People often read or skim the titles and abstracts of most publications without investing more time to read all the details. If one were to apply the Bender rule to titles and/or abstracts (and yes, I am suggesting it), it would have the additional benefit of helping your librarians organize publications better by topic language.

So, how have some popular journals fared in 2022? Are many publications mentioning the languages of study? I thought I would look at two popular journals that I am familiar with: the Journal of Memory and Language (JML), and the Journal of Phonetics (JPhon). Both journals heavily focus on experimental research. I decided to include two separate measures here: does the journal article mention the language of study in the title? And does it mention it in the abstract? I have excluded publications that are surveys or methodological reports, as these lack experiments and they tend not to focus on individual languages anyway.

For JML, from January 2022 to the present, 43 relevant articles have been published. Of these, just 2/43 mention the language of study in the title. Within the abstracts, 8/43 articles mention the language of study. Studies that explicitly mentioned languages were those on Mandarin Chinese, ASL, and those involving bilingual populations.

For JPhon, from January 2022 to the present, 40 relevant articles have been published. Of these, 18/40 mention the language of study in the title. Within the abstracts, 35/40 articles mention the language of study.
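
To put the two journals on the same scale, the counts above can be converted to percentages. This is only a minimal sketch in Python: the dictionary layout and the `mention_rate` helper are illustrative names assumed here, and the numbers are just the tallies reported above, nothing more.

```python
# Counts reported in the post, as (articles naming the language, relevant articles).
counts = {
    "JML": {"title": (2, 43), "abstract": (8, 43)},
    "JPhon": {"title": (18, 40), "abstract": (35, 40)},
}


def mention_rate(mentions, total):
    """Return the percentage of articles that name the language of study."""
    return 100 * mentions / total


# Percentage per journal and per location (title or abstract), rounded to one decimal.
rates = {
    journal: {
        where: round(mention_rate(mentions, total), 1)
        for where, (mentions, total) in places.items()
    }
    for journal, places in counts.items()
}

for journal, places in rates.items():
    print(journal, places)
```

On these counts, roughly 5% of JML titles and 19% of JML abstracts name a language, against 45% of titles and 87.5% of abstracts in JPhon.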

Why might these numbers (and practices) be so different across journals? Are the psycholinguistic patterns found in brains and minds in the articles in JML fundamentally different in terms of their language-specificity from studies on phonetic memory/perception, speech planning, speech coordination, and speech articulation found in JPhon? In other words, is it that only the phoneticians need worry about the Bender rule?

I think most phoneticians would probably state that a study on the articulatory and acoustic phonetics of one language is bound to be fundamentally different from a similar study on another language. Thus, there is less of an expectation that one's findings will immediately generalize to all of Language. Rather, one draws conclusions and amasses evidence for common patterns by looking across a large enough sample of languages. Existing theories are examined, tested with new data, and revised.

I don't know what psycholinguists believe here though. Perhaps it is the case that many still believe that English-focused studies in psycholinguistics are always uncovering something fundamental about Language in a way that studies in phonetics are not, despite apparent evidence to the contrary. I have to doubt that though. I know many psycholinguists and they seem to be a pretty open-minded group. For the time being, it would seem like JML is failing the Bender rule.