Sunday, December 10, 2023

On the generalization of linguistic discovery

Discovery is a crucial part of the evolution of most academic disciplines that take a scientific approach towards understanding the world. New empirical evidence of a phenomenon leads researchers to re-examine their old perceptions. Or rather, as Kuhn (1962) would argue, those with the old perceptions of the world eventually die or fade away while those who only have the newer perceptions mature.

But how do we generalize discovery? There are certainly many disciplines where discovery is generalizable. Findings in many of the physical sciences and mathematics are truths that will continue to be true forever. Discover a solution to a long-standing mathematical problem and it will remain true from now on.

In the social and cognitive sciences, though, discoveries seem somewhat murkier. Where they relate to biological, neurobiological, or biophysical principles, the discoveries seem more generalizable. In my main sub-discipline, phonetics, there are clear physical relationships between what a person does with their speech articulators and what this produces in an acoustic signal, for instance. This is true across languages because all humans have similar oral and laryngeal anatomy. Yet, since speakers can vary massively in just how they produce similar speech sounds, generalization is challenging here too.

Where they do not relate to biological or physical principles, behavioral and linguistic discoveries are usually observational findings restricted to a certain type of population. Generalization here necessarily has to proceed through multiple experiments or studies with different types of populations. From a linguist's perspective (and I can only speak as a linguist here), that necessarily means that discoveries need more languages.

There's a danger here that comes out of a kind of science envy in the behavioral and linguistic sciences. Though some of the methods in the social/behavioral sciences have become more scientifically rigorous (mostly in relation to statistical testing and modeling), the findings are not magically more generalizable to new populations than they were in the past. Discovering that college-aged speakers of English prefer certain syntactic structures over others does not tell us anything about any other language unless subsequent research is undertaken. It might make predictions about patterns in other languages, but predictions are not generalizations.

Can we ever generalize about "Language"? What if we can't?

There are a lot of half-truths that linguists hold about "Language" that arise from a casual extension of findings in a few languages. Demonstrate that some linguistic phenomenon occurs in American English, Spanish, and German and linguists will believe it is a universal or "strong tendency" without a very clear criterion for what "universal" or "strong tendency" would mean.

Why be so careful with formal and statistical methods but so careless regarding the scientific bread-and-butter of hypothesis testing? The answer seems to lie in a kind of all-or-nothing perspective about where linguistic discoveries have value to a discipline. Linguists believe either that linguistic patterns demonstrate unique characteristics of individual languages or populations, or that they are universal patterns reflecting something deep about human evolution or murkier things like universal grammar. The field tends to value mainly the latter type of work, since it smells like a generalization.

This all-or-nothing approach means that we often come up empty-handed when we wish to talk about the relevance of our findings to the discipline: either we're delving deeply into specific languages with an empirical or historical goal, or we're looking broadly (and more superficially) at patterns in a larger number of languages. What might exist in the middle? We're a small discipline examining a huge topic with a gigantic amount of variation. We can't do it all.

I think one future path for the discipline is to take a cue from the quantitative revolution that has occurred within it over the past 20-25 years. The more we examine phenomena that we once believed to be discrete (x occurs in context A, but y occurs in context B), the more we discover that these are strong statistical tendencies instead. And the reason for this is that linguistic phenomena are behavioral. They are not formal mathematical proofs that remain true forever after being solved. We just keep wanting to commit our error of generalization because of this science envy.

Might there not be any true linguistic universals? Maybe there are, but we can never be typologically balanced enough to prove anything more than fairly superficial patterns. Maybe there aren't any at all, and this is ok. Languages are endlessly fascinating and we can still demonstrate how many languages work along statistical lines. The idea that there is massive inter-language variation and that this is structured to occur in certain types of languages necessarily means that we can look at types of languages to construct complex cross-linguistic hypotheses. To provide a concrete example, do speakers of fusional languages or those with non-concatenative morphology store words differently than speakers of isolating languages? This is an interesting question, but it does not require a model of what must be universal. It just requires experiments and cross-linguistic research.

This is a blog post, so take my musings with a grain of salt. I don't have the answers to my own subdiscipline, let alone all of linguistics. I think, though, that we need to be more careful in distinguishing between the things that we merely believe to be proven or demonstrated and the things that actually are demonstrated typological patterns or universals.
