Wednesday, January 13, 2016

Segmenting running Mixtec speech

My research falls within two fields: fieldwork and phonetics. I am enamored with the languages that I study but also enamored with investigating the fine details found in these languages. One major area where there is overlap between fieldwork, or more specifically, documentation, and phonetics is in corpus phonetic research.

Corpus phonetics is usually considered an area of phonetics more so than an area of corpus linguistics; the methods are phonetic methods (mostly), while corpus linguists frequently concern themselves with textual materials and not with the raw speech signal. When phoneticians want to investigate aspects of the speech signal, either from experiments or from a corpus, it is often useful to (a) have a transcription of the speech signal and (b) segment individual sounds or syllables. The former is obviously useful for the purpose of knowing what you're looking at (and being able to go back to it) and the latter is useful for any tool which automatically extracts acoustic measures from the speech signal. It is possible (and common) nowadays to write short programs that will measure aspects of these individual segments very quickly.
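To give a sense of what such a short program looks like, here is a minimal sketch in Python. It assumes the segmentation has already been read into a list of labelled intervals; the `Interval` type, the `durations_ms` function, and the time values are all invented purely for illustration and are not measurements from any recording.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """One labelled stretch of the speech signal (hypothetical structure)."""
    label: str
    start: float  # seconds
    end: float    # seconds


def durations_ms(intervals: List[Interval]) -> List[Tuple[str, float]]:
    """Return (label, duration in milliseconds) for each segmented interval."""
    return [(iv.label, round((iv.end - iv.start) * 1000, 1)) for iv in intervals]


# Toy values, made up for the example only.
toy = [
    Interval("a", 0.00, 0.08),
    Interval("tʃ", 0.08, 0.20),
    Interval("ĩ", 0.20, 0.31),
]

for label, dur in durations_ms(toy):
    print(label, dur)
```

Once segment boundaries exist, any other measure (pitch, formants, intensity) can be extracted over the same intervals in the same loop.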

Segmentation is usually done in Praat, a program for viewing, analyzing, and processing acoustic recordings. The segmentation is saved as a text file alongside the sound file; when the two are opened together, one can view a time-aligned segmentation of the words and segments in the speech signal. As part of research on my NSF grant, we are doing corpus phonetic research on both Itunyoso Triqui and Yoloxóchitl Mixtec (YM), two endangered languages spoken in Southern Mexico. Right now, we are (a) segmenting speech from YM and (b) evaluating a program we are developing which will automatically segment speech from this language. After we have improved this program, we will be able to extract phonetic data from a large corpus of over 100 hours of YM speech and answer scientific questions about both the language's phonetics and speech production more generally. This is corpus phonetics.

Yet, the process of segmentation is not without problems and it is these problems that I wish to write about here. When segmentation is done with careful speech, it is usually fairly straightforward to segment the consonants and vowels that are produced in the speech signal. Observe Figure 1, below.

Carefully produced Triqui sentence /a3chinj5 sinj5 cha3kaj5/ [a³tʃĩh⁵ sĩh⁵ tʃa³kah⁵], 'The man asked for a pig.'

For those of you unfamiliar with segmenting spoken language, the first thing you might notice is that there are actually no pauses between the words, shown below the acoustic signal. This is as true of careful speech as of connected speech. Yet, here, the boundaries between vowels and consonants are fairly easy to spot. There is silence in the initial portions of the two affricates [tʃ], "ch", that distinguishes them from adjacent vowels, silence in the initial portion of the stop [k], and noise in the production of the fricative [s]. The only thing here that might be difficult to parse is the aspiration that appears at the end of certain vowels (transcribed with "j" here, following a Spanish convention). This is left unparsed.

As it turns out, parsing Mixtec speech is much harder than this. The language doesn't have aspirated vowels like Triqui does and the consonant inventory, as a whole, is much smaller. However, Mixtec is inordinately fast (approximately 7-9 syllables/second in running speech) and most of the consonants that would otherwise be easy to segment, e.g. /s, ʃ, t, tʃ, k, kw/, undergo lenition. This means that they can be realized as [z, ɦ, ð, j, ɣ, ɣw], respectively. All of these realizations are voiced and make parsing substantially more difficult. An example is given below.

Running Mixtec speech; sentence /tan3 ka4chi2 sa3ba3=na2 ndi4.../ [tã³ ka⁴tʃi² sa³βa³=na² ⁿdi⁴] 'Then they said half of them, and...'
The initial [t] here is easy to spot - it involves silence and it is released into the vowel. However, the following /k/ in the word /ka4chi2/ is difficult to discern in the spectrogram (this is actually a fairly clear example), because it is produced as a frictionless continuant rather than a stop. The same is true of /tʃ/ (labelled "JH"), which is produced as a frictionless continuant ([ʒ]) rather than as an affricate. The /s/ above is produced as [z] and the "b" as [w], a bilabial glide. In this latter case, it is extremely difficult to locate a clear set of boundaries between the adjacent vowels [a] and the bilabial glide. However, one hears the glide in the acoustic signal and it appears that some weakening of F3 amplitude corresponds to this percept.
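The lenition pattern described earlier for /s, ʃ, t, tʃ, k, kw/ can be written down as a simple lookup table. The sketch below is only an illustration in Python: the names `LENITED` and `lenite` are my own, the input is assumed to be already split into segments, and the mapping is applied everywhere even though in real speech these realizations are only possible outcomes, not obligatory ones.

```python
from typing import Dict, List

# Underlying consonant -> lenited realization, as listed in the text above.
LENITED: Dict[str, str] = {
    "s": "z",
    "ʃ": "ɦ",
    "t": "ð",
    "tʃ": "j",
    "k": "ɣ",
    "kw": "ɣw",
}


def lenite(segments: List[str]) -> List[str]:
    """Replace each leniting consonant with its voiced realization.

    Segments not in the table (vowels, nasals, etc.) pass through unchanged.
    """
    return [LENITED.get(seg, seg) for seg in segments]


print(lenite(["k", "a", "tʃ", "i"]))
```

Every output in the table is voiced, which is exactly what removes the silences and voicing breaks that make careful speech easy to segment.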

The net result of this is a speech signal that rarely includes a loss of voicing and that is frequently difficult to examine. Is the "w" above deleted? If it is deleted, is this now a long vowel? These are difficult questions to answer just from the acoustic signal. This fusion of speech events is not specific to Mixtec either; we know that speech involves overlapping gestures produced for different consonant and vowel sounds. Thus, things always overlap to a certain degree.

Yet, the patterns of lenition above are still rather notable. Perhaps the voicing of the consonants here is helpful to listeners; as there is no contrast in voicing in the language, voicing the consonants allows tone to be carried on consonants as well as adjacent vowels. Since tone is so important in Mixtec as a marker of aspect and person, such a possibility is a plausible hypothesis, but one that remains to be tested. For the time being, parsing Mixtec is hard.

Tuesday, July 28, 2015

The hard business of trying to specify allomorphs in FLEx

While a substantial part of my research is on the phonetics and phonology of different Otomanguean languages, I have been working on the morphophonology of the Itunyoso Triqui language for many years. Ever since I first started my work on the language, I was fascinated by the many ways in which a single verb root, for instance, could have a multitude of forms when one includes aspectual prefixes and personal enclitics.

One of the most notable things about Triqui morphology is just how much tone plays a role in marking different distinctions. Take the verb /a³chi³/ 'to peel', for example. There are five possible tonal shapes of stems, shown below (note "j" is /h/, "h" is /ʔ/, and a post-vocalic "n" in the final syllable marks contrastive vowel nasality):

Table 1: Stem shapes of verb /a³chi³/ 'to peel.'
This particular paradigm displays some common patterns in Triqui morphology. First, the 1st person singular is marked by a change in tone (to /5/) and involves the insertion of a coda "j" /h/. Second, the 2nd person singular is marked by tone raising to /4/ before the clitic. Third, the perfective prefix on vowel-initial stems is just /k-/. Fourth, the potential prefix involves prefixation of /k-/ and a change of tone on the initial syllable of the root. 

The result of these processes is five possible stem shapes: /a³chi³, a³chij⁵, a³chi⁴, a²chij⁵, a²chi³/, marked in bold above. Each of these morphological processes can be described well enough. However, things start to get rather messy when we wish to include additional verbs. Note the verb /a³chinj⁵/ 'to request' below.

Table 2: Stem shapes of verb /a³chinj⁵/ 'to request.'
We notice different patterns here. Instead of inserting a coda "j" /h/ to mark first person, we delete it from the root and change tone /5/ to /43/. Since the verb stem already has a high final stem tone, we do not observe any tone raising before the 2S clitic /=reh¹/. However, the form of the potential is rather different. Like in the habitual or unmarked form of the verb, we find that the coda "j" /h/ is deleted, but the entire stem changes its tone to /2/. This change is not particular to the 1S either - it occurs with all other persons in the potential, as the example with the 3SM clitic demonstrates. As a result of these processes, we have four possible stem shapes for the verb in Table 2: /a³chinj⁵, a³chin⁴³, a²chin², a²chinj²/.

I won't begin to provide a full analysis of the tonal morphology in Triqui here (but see DiCanio, forthcoming). Rather, I wish to focus on two particular patterns and to discuss how they might be analyzed from a practical point of view. The first pattern is the marking of the 1S. This involves either the insertion of a coda "j" if it is not present on the stem or its deletion if it is present. Such a process is called a morphological reversal or exchange rule (see Inkelas, 2014). Tonal changes co-occur with this process for verbs with upper register tones (DiCanio, forthcoming), but we will not focus on these here.

The second pattern involves the way in which the potential aspect is marked. For certain verbs, it is marked by a change to tone /2/ on the syllable to which the prefix is attached, as in Table 1. On other verbs, it is marked by a change to tone /2/ on every syllable of the stem, as in Table 2. In such cases, the 1st person clitic no longer involves a tone change since the tone on the stem is now /2/, which belongs to the lower register. (Incidentally, one might describe this as a case of morphological opacity, where stage 1 prefixal/aspectual morphology bleeds the conditions for the application of clitic tone raising.)

At least segmentally, the 1S clitic is easy enough to characterize, though how might one go about marking such forms in a digital lexicon/dictionary like FLEx? One procedure might be to mark each and every 1S form, e.g. include /a³chij⁵/ 'peel.1S' as a variant of /a³chi³/ 'peel.' While certain of the morphological patterns are motivated by phonological well-formedness constraints (DiCanio, forthcoming), listing the variants in a table or paradigm as above provides a useful framework for describing the morphological patterns within the Triqui lexicon. 

This "listing" approach is the one that I currently use. However, doing this is rather time-consuming, as all words in the Triqui lexicon undergo this very regular alternation (though the tonal processes are rather complex). Doing this also loses the broader generalization of the rule. Moreover, there is currently no neat way of including paradigms within FLEx; one must specify additional forms as variants or allomorphs derived via a rule.

Another approach might be to create a phonological rule within FLEx's phonological grammar. However, the only available way to encode such rules is via a classical rewrite rule. This would produce rules of the form: Vh > V /_# ; and V > Vh /_#. Yet, there is no way to connect this particular rule with the set of morphological processes that it affects. It is an alternation that is primarily used for marking the 1st person singular (though similar alternations also mark previously-mentioned 3rd person discourse referents and derive nominal forms from quantifiers).
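To make the segmental half of the alternation concrete, here is a minimal sketch in Python of the exchange as a single toggle. It is not how FLEx represents anything; the function name `toggle_coda_h` is hypothetical, stems are written without tone marks and with IPA "h" rather than orthographic "j", and the tonal changes are ignored entirely.

```python
def toggle_coda_h(stem: str) -> str:
    """Segmental part of 1S marking: delete a final /h/ if present, else add one.

    Assumes a toneless transcription in which coda /h/ is the last character.
    """
    if stem.endswith("h"):
        return stem[:-1]   # Vh > V / _#
    return stem + "h"      # V > Vh / _#


print(toggle_coda_h("achi"))    # 'peel' gains a coda /h/ in the 1S
print(toggle_coda_h("achinh"))  # 'request' loses its coda /h/ in the 1S
```

Note that the toggle has to be stated as one operation with two mutually exclusive branches. If the two rewrite rules were instead applied naively one after the other, the second would simply undo the first on h-final stems, which is part of what makes exchange rules awkward to state as ordinary ordered rules.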


The same possibilities seem to be relevant for the potential aspect marking. It is either specified in a paradigm or it can be derived via a rule. However, a new problem presents itself when one considers the latter possibility. For those verbs, as in Table 2, which undergo an entire stem change to tone /2/ with the potential aspect, what is the phonological environment for a rewrite rule? It is the entire word's tonal melody. FLEx currently provides no way of separating the stem's tonal shape from the stem itself as one might do with an autosegmental representation. Thus, FLEx is unable to make sense of a string like /ka²chin²/ 'request.POT.1S.' when it comes to morphological parsing.
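For comparison, here is a rough sketch in Python of what keeping the tonal melody on its own tier buys you. Everything in it is an assumption made for illustration: the `Stem` class, the `whole_stem` flag (standing in for lexical class membership), and the restriction to vowel-initial stems taking /k-/. It models only the potential marking described above, not the 1S alternation.

```python
from dataclasses import dataclass
from typing import Tuple

SUPERSCRIPT = str.maketrans("12345", "¹²³⁴⁵")


@dataclass(frozen=True)
class Stem:
    syllables: Tuple[str, ...]  # segmental tier, one entry per syllable
    tones: Tuple[str, ...]      # tonal tier, one melody per syllable


def potential(stem: Stem, whole_stem: bool) -> Stem:
    """Prefix /k-/ and lower tone to /2/ on the first syllable or on the whole stem."""
    if whole_stem:
        tones = tuple("2" for _ in stem.tones)
    else:
        tones = ("2",) + stem.tones[1:]
    syllables = ("k" + stem.syllables[0],) + stem.syllables[1:]
    return Stem(syllables, tones)


def render(stem: Stem) -> str:
    """Interleave the two tiers back into a single string with superscript tones."""
    return "".join(
        syl + tone.translate(SUPERSCRIPT)
        for syl, tone in zip(stem.syllables, stem.tones)
    )


peel = Stem(("a", "chi"), ("3", "3"))
request = Stem(("a", "chinj"), ("3", "5"))

print(render(potential(peel, whole_stem=False)))
print(render(potential(request, whole_stem=True)))
```

With the tiers separated, "change the whole melody to /2/" is a one-line operation on the tonal tier; there is no need to find a segmental environment for it, which is precisely what a classical rewrite rule requires.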

This problem is compounded by the nature of Triqui morphology when one considers the interaction between the potential aspect and 1S marking mentioned above. If there is a specific set of rewrite rules for the 1S clitic, one must specify that the tonal part of the alternation does not apply if the stem has undergone a change to the potential aspect. I currently know of no solution as to how one might resolve these issues within a FLEx lexicon.

References:

DiCanio, C. (forthcoming) Tonal classes in Itunyoso Triqui person morphology, in Tone and Inflection, Empirical Approaches to Language Typology series, Mouton de Gruyter, Palancar, Enrique and Léonard, Jean-Léo (eds).

Inkelas, Sharon (2014) The interplay of Morphology and Phonology. Oxford Surveys in Syntax and Morphology. Oxford, UK.

Friday, July 24, 2015

The healthy and unhealthy vocal fries

There has been much discussion in the news media lately about the phenomenon known as "vocal fry" and its use among English-speaking women in the United States. Vocal fry refers to the irregular vibration of one's vocal folds and it is normally produced with low pitch. In an interview with Terry Gross, Susan Sankin, a speech-language pathologist, stated that vocal fry is harmful to one's vocal folds. In a follow-up piece on 7/23/15 on NPR, she maintains this view, stating
...I have heard ENTs say that it can cause damage. And for a lot of the languages where it's a habitual pattern - as you develop from a young age, that's how you're training and using your vocal cords. And I think when you start to fall into that pattern later on, I think that it can cause some damage. Again, I'm not a doctor, so I can't say that I've looked at people's vocal cords and I've seen it, but I have heard ENTs say that they do notice that it can cause damage. And sometimes the jury is out on that as well.
Just what is behind this notion that vocal fry may be damaging for one's vocal folds? After all, what we're calling "vocal fry" is used in many languages to contrast meaning among words, just like one might contrast the words 'heed' and 'hid' by their vowel sounds. It is also ubiquitous throughout the languages of the world to mark boundaries between phrases. How can something that is so common be considered a vocal pathology?
To answer this question, it's necessary to first make a distinction between speech articulation and speech acoustics. Speech articulation involves what you do in your oral cavity to produce speech sounds. Speech acoustics involves what sounds you hear that convey a linguistic message. Phonetics involves the study of both these things and phoneticians are interested in understanding how certain articulations produce certain acoustic characteristics. One can more easily investigate this relationship for sounds with un-hidden articulations. For instance, the 'p', as in 'pan', is made with the lips. One can see them close when this sound is produced and observe silence in the acoustic signal while one's lips remain closed. 
The same thing is not true for the vocal folds though. When it comes to the vocal folds, it's often a rather messy business to investigate what they are actually doing. They're quite small (just about 1 - 2.5 cm in length, depending on one's sex) and taking a video recording of them moving during speech involves inserting a small camera attached to a wire through one's nostrils to hang near the upper portion of one's pharynx (throat) and peer downward. As you might imagine, many people object to having foreign objects inserted into their noses.
One way around this is to just look at the acoustic signal and interpret what the configuration of the vocal folds must be. People don't object nearly as much to being recorded as to having wires inserted into their noses. Moreover, plenty of other articulations have consistent acoustic consequences. For instance, lowering one's tongue and jaw during speech changes the acoustic resonances of the oral cavity in a rather consistent manner. So, the theory goes, one can rely on the acoustics of the speech signal to tell us what the speech articulators are doing. So far, so good.
While this method is fairly robust, there's something problematic about it with the vocal folds. What is called "vocal fry" involves irregular vibration of the vocal folds (see below, taken from a previous post). In the figure here, one notices the irregular vocal fold vibrations on the right. Each glottal pulse is individually stronger (has higher amplitude) but the timing between each is erratic. To quote a well-known linguist, this voice quality sounds like "a stick being dragged along a fence."

But, to return to our main interest, what is the articulation that gives rise to this acoustic pattern? The term "vocal fry" refers not to the articulatory configuration, but to one's perception of the acoustics. As it turns out, there are many things that can produce the type of vocal fold vibration that we observe above. Much like a wheel that is fastened too tightly, if one constricts the larynx (where the vocal folds sit), it is harder for the vocal folds to vibrate regularly. Since the vibration of the vocal folds requires consistent airflow from the lungs, if one runs out of breath at the end of a sentence, the vocal folds also do not vibrate so regularly.
For people who have developed vocal fold nodules, brought on by laryngeal cancer or other pathologies, the vocal folds also do not vibrate so regularly. Clearly, the same acoustic pattern matches a number of different articulatory configurations. Yet, all of this irregular vibration is described with a cover term, "vocal fry."
So, if one were to observe vocal fry in different speakers, what could one conclude? While there is independent evidence for the health of speakers in a clinical setting, the notion that vocal fry is pathological is a case of the symptom getting confused with the cause. Since we rely on the acoustic signal to tell us about articulation, we associate the presence of a certain characteristic of the acoustic signal with an articulatory pathology. In other words, vocal fry must be pathological, right? No, in fact this is a classical logical error (affirming the consequent).
Research on the production of voice quality across languages has shown that speakers use a number of different configurations to constrict the larynx and produce what is known as "vocal fry." Acoustically, and only acoustically, these might appear similar to pathologies that produce irregular vibration of the vocal folds. Yet, the cause of the irregular vibration is different. The articulation of the vocal folds is difficult to examine. So, researchers have assumed aspects of their configuration on the basis of what the acoustic signal says. Yet, this only works insofar as there is not a one-to-many association between the acoustic signal and the articulatory mechanism involved. 
The problem is, we do have exactly this kind of relationship when it comes to voice quality: many different articulatory configurations map onto one acoustic pattern. Thus, one cannot just infer on the basis of one part of the acoustic signal what articulation is involved. Speech-language pathologists, like Susan Sankin, might heed this before they label "vocal fry" as damaging to one's vocal folds. It's not the voice quality that is damaging, but this misunderstanding of cause and effect.
What does this mean for the young women whose vocal fry is singled out as being unhealthy and damaging for their careers? It's the attitudes and knowledge about women's voices that need to change, not the voices themselves.

Monday, July 6, 2015

Being cooperative is not evidence of confirmation bias

A few days ago, the New York Times posted a piece which argued that confirmation bias is a common failure of human thinking. Confirmation bias is the idea that one tends to interpret new facts in terms of one's existing preconceptions.

The author of the piece, David Leonhardt, discusses confirmation bias by way of a mathematical example where the reader is asked to guess the rule determining the sequence "2, 4, 8" by testing additional examples. Thus, one can type in sequences like "4, 8, 16" or "10, 95, 387" and see if they follow the same rule as the sequence "2, 4, 8." If one enters a sequence like "4, 8, 16" into the boxes in Leonhardt's article, one receives a confirmation that it also follows the same rule as that which produced "2, 4, 8."

So, just what is this rule? Leonhardt states:

"...most people start off with the incorrect assumption that if we’re asking them to solve a problem, it must be a somewhat tricky problem. They come up with a theory for what the answer is, like: Each number is double the previous number."

The true rule, Leonhardt explains, is not that each number is double the previous number, but rather that each subsequent value is greater than the preceding value. That people assume the former rule is taken as evidence for confirmation bias. As stated, "Not only are people more likely to believe information that fits their pre-existing beliefs, but they’re also more likely to go looking for such information."

However, it strikes me that there are other, rather sensible reasons that people will assume the former rule that Leonhardt does not consider. One is found among the well-known maxims of conversation, created by the famous philosopher of language, Paul Grice. These maxims, well-known to any introductory linguistics student, state that conversation is guided by constraints of quantity, quality, relation, and manner. As a default, we assume that speakers will give only enough information, be truthful with it, be relevant to the topic, and be clear, respectively. When speakers deviate from these expectations, we are annoyed with the conversation. In such cases, we might state "He was long-winded." or "He kept going off on tangents." Our ability to follow these maxims demonstrates our cooperation within a conversation. Hence, they fall under what Grice terms the cooperative principle.

Grice's maxim of quantity states that one should not make his/her contribution more informative than required. Thus, when someone asks for directions to a particular room in a building, one does not expect the speaker to provide instructions on how to open a door nor the history of certain rooms that the listener will likely pass. If additional details are provided, our interpretation is that they must somehow be relevant (incidentally, another maxim). So, just what might Grice have to do with Leonhardt's example here?

Consider the initial example that he provides: 2, 4, 8. The reader's expectation from this example is that it is as informative as necessary. If the author chose a sequence where each subsequent value is double the previous value, then this must be relevant to the question. After all, the expectation is that the author has provided this information and it must be important. When we hear that the rule is, "Haha!", not what we assumed, our reaction is one of surprise. Why provide this particular example if any random sequence, like 1, 5/3, 9, would have sufficed?

Providing too much information in this way would seem to be a case of conversational deception. The listeners/readers are led astray, believing that the author was following the maxims of quantity and relation when, in fact, the example was intended to be overly informative. So, are 78% of those who participate in this particular task guilty of confirmation bias? Perhaps some are, but Leonhardt would be wise to consider that most people are guided not by the expectation that the problem is tricky, but rather by the expectation that the author's example does not provide too much information. An entirely different outcome would be produced if the example were as simple as "1, 2, 3."

*Incidentally, the rule "Each number is double the previous number." is just a more specific case of the rule "Each number is greater than the previous number." For positive numbers, the first entails the second.
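The relationship between the two rules is easy to check for oneself. The short Python sketch below writes each rule as a test on a sequence; the function names `doubles` and `increases` are my own, not anything from Leonhardt's article. It also shows that the entailment depends on the numbers being positive.

```python
from typing import Sequence


def doubles(seq: Sequence[float]) -> bool:
    """True if each number is double the previous number."""
    return all(b == 2 * a for a, b in zip(seq, seq[1:]))


def increases(seq: Sequence[float]) -> bool:
    """True if each number is greater than the previous number."""
    return all(b > a for a, b in zip(seq, seq[1:]))


for seq in ([2, 4, 8], [4, 8, 16], [10, 95, 387], [1, 5 / 3, 9], [-2, -4, -8]):
    print(seq, "doubles:", doubles(seq), "increases:", increases(seq))
```

Every positive sequence that passes `doubles` also passes `increases`, but not the reverse, so a guesser who only ever tests doubling sequences can never tell the two rules apart. The last sequence doubles but decreases, which is why the entailment is limited to positive numbers.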


Wednesday, March 4, 2015

More on Triqui reduplication

Back in December, I posted a quick summary of this interesting pattern on Triqui verbs to Facebook. The post read as follows:

===
So, I discovered a new pattern today. There is a process of partial reduplication on Triqui verbs that indicates something like an emphatic, but non-specific third person. The reduplicant is partially fixed, always containing tone /4/ and a coda /ʔ/, but the vowel identity (including nasality) is taken from the stem. The stem has to be marked with the non-specific 3rd person (which is marked by tone/glottal deletion).

/βĩ³/ 'to be/exist' > bĩ³-ĩʔ⁴
/a³βi³²/ 'to leave' > a³βi³-iʔ⁴
/tʃeh⁴/ 'to walk' > tʃe³-eʔ⁴
/a³tah³/ 'to say' > a³ta³-aʔ⁴   
===
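The shape of the reduplicant in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reduplicate` is hypothetical, the input is taken to be the stem already marked for the non-specific 3rd person (so tone/glottal deletion has applied), and the stem is assumed to end in a vowel followed by superscript tone marks.

```python
import unicodedata

TONE_MARKS = "¹²³⁴⁵"
VOWELS = "aeiou"


def reduplicate(stem_3p: str) -> str:
    """Append a reduplicant: a copy of the final vowel (with nasality) + ʔ + tone 4."""
    # Decompose so that a nasal vowel becomes base vowel + combining tilde.
    decomposed = unicodedata.normalize("NFD", stem_3p)
    body = decomposed.rstrip(TONE_MARKS)

    # Walk back over any combining marks to find the base vowel.
    start = len(body)
    while start > 0 and unicodedata.combining(body[start - 1]):
        start -= 1
    start -= 1
    if start < 0 or body[start] not in VOWELS:
        raise ValueError("expected the stem to end in a vowel plus tone marks")

    vowel = body[start:]
    return unicodedata.normalize("NFC", f"{decomposed}-{vowel}ʔ⁴")


for stem in ["a³βi³", "tʃe³", "a³ta³"]:
    print(stem, ">", reduplicate(stem))
```

The only material copied from the stem is the final vowel and its nasality; the tone and the glottal stop are fixed, which is what "partially fixed" means here.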

Well, as it turns out, the pattern doesn't quite just signify an emphatic. It appears to be a way of marking a wish/blessing on someone, e.g. "May they do X." Thus, to state "May they eat" or "Que comerían", the form is /tʃa²=aʔ⁴/ and "May they run" would be /ku²nã²=ãʔ⁴/. The interesting thing about these new forms is that both of the verbs occur in the potential aspect. The unmarked, progressive/habitual form for 'run' is /u⁴nãh³/ and the perfective form for 'eat' is /tʃa⁴³/. So, it is not clear if this reading of these forms just derives from the fact that they are in the potential aspect. This reading also appears to differ from the hortative, e.g. /tʃa²=yũʔ¹/ 'Let's eat!' (Incidentally, this is different from the regular /tʃa²=ũh⁴/ 'We will eat ~ We are eating.')

So, the mysteries now are: (a) Is there some semantic consistency when reduplication is used on verbs with varying aspects? (b) Might we be able to tease apart just what this means with unmarked aspect?

On a related note, one of the things I've come to realize about Triqui clitic morphology in doing documentation is that there is a very productive system for marking vocatives (or perhaps just 'terms of address') in the language. I used to think that only certain words had distinct, lexicalized vocative forms, e.g. /nni³/ 'mother' vs. /nnãh⁴³/ 'mother!', but it turns out that most names, and even clitics, can undergo this process. This is done by a process that looks eerily like the reduplicative system above. You take the word, e.g. /tʃu³be³/ 'dog' and just change its tone to /4/ and add a coda /ʔ/, e.g. /tʃu⁴beʔ⁴/ as in something like 'Hey, dog, how are you?'. One can do this with proper names too, e.g. 'Enrique' is /li⁴ki⁴³/ normally, but would be /li⁴kiʔ⁴/ when referred to directly. I have a feeling there is some relation between emphatic forms (maybe this is a good term for it after all?) and direct terms of address, but just how the expression of a wish/blessing fits in here is a mystery.
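The vocative pattern just described can be sketched the same way. The Python below is a guess at the generalization based only on the two examples given ('dog' and 'Enrique'); the function name `vocative` is hypothetical, and the sketch assumes that every syllable's tone becomes /4/ and that the word ends in a vowel followed by tone marks. Lexicalized forms like 'mother!' are outside its scope.

```python
import re

TONE_MARKS = "¹²³⁴⁵"


def vocative(word: str) -> str:
    """Level every tone melody to 4 and add a coda ʔ before the final tone."""
    leveled = re.sub(f"[{TONE_MARKS}]+", "⁴", word)
    if not leveled.endswith("⁴"):
        raise ValueError("expected the word to end in a tone mark")
    return leveled[:-1] + "ʔ⁴"


print(vocative("tʃu³be³"))   # 'dog'
print(vocative("li⁴ki⁴³"))   # 'Enrique'
```

Written out like this, the overlap with the reduplicant is easy to see: both end in a glottal stop carrying tone /4/.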

Incidentally, I learned the above form for 'dog' through a story about a magical dog who learned how to grind corn and make tortillas for a man who went to farm. When the man came into the house, he had some conversations with this dog and had to address it directly.

Saturday, January 3, 2015

Ode to 2014

Here's to that email you never answered,
the message you never got,
still flagged and beeping,
haunting in being unknown
to all but the sender.
Nagging and distracting
like an unscratched lotto ticket
or a joke, unheard because you arrived too late.
"You had to be there"
but you were unavailable
as the message sits,
bored.

Whatever emergency we are waiting for
we wait too long,
with false flashing yanking us
to action-reaction,
seeing and skimming and sitting
and revealing to us no less than we expect;
minutiae.

The attack move to click
to see or hear what might be there
gives only a hasty rush
as we unfurl objects of glitter
to finally know no more than we did before.
Must a hamster run in its wheel
just because it's there?

Monday, August 4, 2014

If airlines were restaurants...

If airlines were restaurants...

We'd stand in line to get in,
but we could pay more to be put at the front of the line.
We'd be seated at a table of screaming children,
but we could pay more to get our own table.
We could also pay more to sit near a window
or in a "premium" seat near the fire escape.
But, just think, if we called in ahead of time to reserve a seat,
we wouldn't deal with such things.
...but we'd pay more to even do that.

We might wait for hours for our food
or suffer the injustice of being told
"no food today"
And we might have to go wait in another line.
Because it's not the cook's fault
that it's raining.

We could get a frequent diner card
and come in with a coupon
to be told that the terms had changed
because the money we once spent
at this fine dining establishment
was in the past.

If airlines were restaurants,
we might not visit them
or just decide to occasionally
visit the only other restaurant
in our town.

We might just make our own lunches
or forage for food in the streets
where the restaurants have ensured
a slow food movement.

Just don't complain to the chef or the wait staff.
You might not be allowed to come back.