Category Archives: Historical

Talk slides

This week I’ve been giving three talks in the Department of Linguistics at UC Berkeley. It’s been a very stimulating week, with lots of good feedback, brainstorming for new directions, and problem troubleshooting. I’ve also met with many of the graduate students (including two who were my students as undergraduates and who worked on the data that led to some of the results presented in the talks) to hear about their work.

I’m posting slides for two of the talks here. On Monday, I gave an overview of the Pama-Nyungan project and talked about how the tree was created, what it implies to take an ‘evolutionary’ view of language (in this framework), and some off-shoots of the project (MondayPhylogenetics powerpoint). On Wednesday, I talked about one further extension, using the tree to investigate the evolution of colo(u)r terminology. (WednesdayColor powerpoint.)

The other talk, on Tuesday, was on using my Bardi corpus to study life-span changes and variation. While I was chatting at the end of the talk, one of the sociolinguists expressed surprise that I hadn’t anonymized the identities of the Bardi speakers. I hadn’t thought about it. As a fieldworker working on Bardi, I did talk about this general point with the people I worked with, and all were keen to be acknowledged for their work on the language and recognized among the key custodians of their culture. However, with the work on aging and variation, I am no longer talking directly about Bardi as a language, but rather about the properties of the speech of individual speakers. The more I thought about it, the less comfortable I was with putting that up online without talking to Bardi people about it first. It’s not just a matter of anonymizing the slides: because there are so few speakers, anyone who has any of my other Bardi work will easily be able to work out who I’m talking about. There will be a paper (soon, I hope) on using forced alignment in field research, so some of the results will be in that paper, probably now along with a discussion of the ethics involved.

 

Tasmanian language data

The CHIRILA database contains materials from the Aboriginal languages of Tasmania. The Excel spreadsheets contain all the records from Plomley’s (1976) Tasmanian language data, and additional spreadsheets contain explanatory data about the speakers represented in the text, the regions where data were recorded, and who the recorders were. These are the data used in Bowern (2012).

A word of warning is warranted here. This is not easy data to use; there’s a steep learning curve for understanding the original transcription conventions, Plomley’s groupings, and the abbreviations.

See http://www.pamanyungan.net/2016/02/tasmanian-language-data/ for downloads.

Introducing CHIRILA

I am very pleased to announce that the first phase of CHIRILA (Contemporary and Historical Resources for the Indigenous Languages of Australia) has been released. This represents approximately 180,000 words from 155 different Australian languages. It is a subset of the full database (of approx 780,000 items); eventually I hope to be able to release most of the data. For now, the first phase covers the material for which we have explicit permission, or which is already in the public domain.
The material is hosted at pamanyungan.net/chirila; please see the web site for more information about the contents of the database, how to download data, what formats are available, and the like. We do not provide a web interface to the data; you download it and use Excel or a database program to read the files.
We hope the data will be useful to researchers, community members, and others with an interest in Australia’s Indigenous language heritage.
pamanyungan.net/chirila also includes access to the preprint of a paper describing the database (both the online and full versions).

Explorations in Pama-Nyungan Phylogenetics

I recently gave one of the plenary talks at a workshop on phylogenetic algorithms at the Lorentz Center in Leiden (Netherlands). In the talk I gave an overview of a number of recent results from my research program, including the creation of a Pama-Nyungan phylogeny and some of the research results that come from that.

The slides are available from academia.edu, from this link.

One of the results that is worth highlighting is the distribution of innovative languages within subgroups. A standard theory argues that languages innovate in the center of their ranges. The innovations diffuse across the language area over time, and therefore areas around the periphery tend to show more archaisms than those in the center. This distribution should also apply to language subgroups, assuming that language split occurs through the gradual accretion of isoglosses so that dialects split into separate languages.

If this is true, subgroup areas should show the same distributions, if not in absolute terms, then in large measure. That is, more innovative languages should lie towards the center of subgroups, and more conservative ones should lie around the edges.

It turns out that it is straightforward to plot the most innovative languages in each subgroup, according to how much basic vocabulary they have replaced. In the CHIRILA database, there are basic vocabulary lists coded by cognacy. To get a sense of how innovative a language is, we can simply sum, for each word in the language, the number of languages that share that cognate, and divide the sum by the total number of language-cognate items. That gives us a sense of the extent to which languages participate in the most archaic vocabulary in the family. Plotting the most innovative language in each subgroup gives us the following map.


As you can see, the most innovative languages are not, in most cases, in the center of the subgroups, but rather on the peripheries.
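The retention measure described above can be sketched roughly as follows. This is only one reading of that description, not the code used to produce the map: the function name, the record layout (language, meaning, cognate set), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def retention_scores(records):
    """Score each language by how widely shared its cognates are.

    `records` is an iterable of (language, meaning, cognate_set_id) tuples.
    This layout is a hypothetical one, not the CHIRILA file format.
    Lower scores mean less shared vocabulary, i.e. a more innovative language.
    """
    records = list(records)

    # For each cognate set, collect the languages that attest it.
    languages_per_cognate = defaultdict(set)
    for language, meaning, cognate in records:
        languages_per_cognate[(meaning, cognate)].add(language)

    # Total number of language-cognate items in the data set.
    total_items = len(records)

    # For each word in a language, add the number of languages sharing it.
    sums = defaultdict(int)
    for language, meaning, cognate in records:
        sums[language] += len(languages_per_cognate[(meaning, cognate)])

    return {language: total / total_items for language, total in sums.items()}


# Invented toy data: LangC has replaced its word for 'hand'.
toy_records = [
    ("LangA", "hand", "h1"), ("LangB", "hand", "h1"), ("LangC", "hand", "h2"),
    ("LangA", "water", "w1"), ("LangB", "water", "w1"), ("LangC", "water", "w1"),
]

scores = retention_scores(toy_records)
most_innovative = min(scores, key=scores.get)
```

In the toy data, LangC comes out as the most innovative language because its form for 'hand' is shared with no other language. To reproduce the map, the same minimum would be taken within each subgroup rather than over the whole data set.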

What can explain the discrepancy? It’s probably the result of migratory expansions. That is, the languages that are the most innovative are the ones at the ‘ends’ of their subgroup phylogenetic expansions: they are the ones that have undergone the most branching. Another way of thinking about this is that more innovation happens on lineages with more branching events. This echoes a result from other work by Atkinson, Pagel, and colleagues, who also found that lineage splitting speeds up change.

One might think that this result reflects language contact; that is, that languages on the periphery might be in contact with more different languages, which leads to an increase in unidentifiable vocabulary. But these languages are not the only ones which are in contact with languages from other subgroups. In fact, if we map the most conservative languages in each subgroup, they are also often to be found around the periphery.

It may be the case that the center-periphery model still holds in areas where languages have stopped expanding, and that Pama-Nyungan subgroups were (on the whole) not formed by diversification in situ.

It’s also interesting to plot the most and least conservative subgroups:


This is a bit more dodgy. For example, I strongly suspect that Thura-Yura’s place in this list is inflated by Wirangu having (as loans) a number of items that are otherwise found only in Western Pama-Nyungan languages, and by Wirangu overall showing some Pama-Nyungan retentions that are otherwise replaced in the rest of Thura-Yura. The broad trend, however, is that the further east, the less conservative. The correlation between longitude and retention is -0.49. The correlation doesn’t hold for latitude (0.05) or number of languages in the subgroup (-0.02).
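For readers who want to run this kind of check on their own data, here is a minimal sketch of a correlation between subgroup longitude and retention. It assumes a Pearson correlation, which the post does not actually specify, and the numbers are invented placeholders, not the Pama-Nyungan values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical subgroup centroids (degrees east) and retention scores.
toy_longitudes = [115.0, 125.0, 135.0, 145.0, 150.0]
toy_retention = [0.80, 0.70, 0.75, 0.55, 0.50]

r = pearson_r(toy_longitudes, toy_retention)
```

A negative `r` corresponds to the pattern described above, with retention falling as longitude increases (that is, further east).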

Pama-Nyungan language locations

As noted in a previous post, I’ve started to put some of the results of my Pama-Nyungan prehistory grant on my lab web site, at pamanyungan.net. One of the recent updates is a language map. The data are not new; this map was released in about 2011 (though with updates since). It is released through a WordPress plugin on the PamaNyungan.net site, which allows easy embedding of maps into sites. I highly recommend it for its ease of use, except for the fact that it doesn’t seem to render in Chrome on a Mac (at least, not on my Mac).

Comments on language locations, names, etc, on the map are very welcome. Please use the comment form on the map’s page.

Phylogenetics of kinship

[Update: materials are now available at pamanyungan.sites.yale.edu/kinship]

I am presenting work at the upcoming LSA meeting with a former undergraduate student and a postdoc (Amalia Skilton and Hannah Haynie). We have been working on kinship structures in Australian languages, using a combination of the comparative method and phylogenetic trait analysis.

The basic idea is that we can use our hypotheses of family tree relationships among Australian languages to reconstruct aspects of linguistic and cultural systems. In this case, we’re using the structure of sibling systems; that is, how many distinctions speakers of different languages make when referring to siblings. English just has two basic terms: ‘brother’ and ‘sister’. Bardi, however, has three terms: oombarn for ‘older brother’, bola or babili for ‘younger brother’, and marrir for ‘sister’. (Note that the Bardi system is asymmetrical, with two terms for brothers but only one for sisters.) Yan-nhangu also has a three-term system, but it distinguishes ‘older brother’ (waawa) from ‘older sister’ (yapa), and has a single term for ‘younger sibling’ (yapayapa). There are four fairly common systems in Australian languages; two of them, a four-way system and the Yan-nhangu-type three-term system, are the most common.
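One way to make the notion of a ‘sibling system’ concrete is to treat each system as a way of grouping four sibling types under shared terms. The sketch below is purely illustrative: the four-way split into older/younger brother/sister, the labels, and the function names are my assumptions here, not the coding scheme used in the project.

```python
# Four sibling types used for this illustration (an assumed coding).
KIN_TYPES = ("older brother", "younger brother", "older sister", "younger sister")

# Terms as given in the text above; Bardi 'babili' is an alternative to 'bola'.
SIBLING_TERMS = {
    "English": {
        "older brother": "brother", "younger brother": "brother",
        "older sister": "sister", "younger sister": "sister",
    },
    "Bardi": {
        "older brother": "oombarn", "younger brother": "bola",
        "older sister": "marrir", "younger sister": "marrir",
    },
    "Yan-nhangu": {
        "older brother": "waawa", "younger brother": "yapayapa",
        "older sister": "yapa", "younger sister": "yapayapa",
    },
}


def number_of_terms(terms):
    """Count how many distinct sibling terms a system uses."""
    return len(set(terms[kin] for kin in KIN_TYPES))


def system_type(terms):
    """Return the grouping of kin types under shared terms.

    Two languages have the same system type when they group the four
    kin types in the same way, whatever the actual words are.
    """
    groups = {}
    for kin in KIN_TYPES:
        groups.setdefault(terms[kin], set()).add(kin)
    return frozenset(frozenset(group) for group in groups.values())


term_counts = {lang: number_of_terms(t) for lang, t in SIBLING_TERMS.items()}
same_type = system_type(SIBLING_TERMS["Bardi"]) == system_type(SIBLING_TERMS["Yan-nhangu"])
```

Bardi and Yan-nhangu both come out with three terms, but `same_type` is false: the two languages draw their distinctions in different places, which is why the structure of the system is a more informative trait than the raw number of terms.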

We reconstructed the sibling terms probabilistically and then compared them to reconstructions of kinship lexical items, using the comparative method. We found that where the terms could be reconstructed, there was a great deal of congruity between the probabilistic state reconstruction and the comparative method reconstruction. However,

This sort of work isn’t well motivated for all systems. For example, it would not make a lot of sense to work on phoneme inventories in this way, because the inventories do not change independently of the lexical items in which they appear. That is, just because two languages both have a phoneme /p/, it doesn’t necessarily mean that those /p/s are “cognate” (because /p/ in one language could be cognate with /w/ in another, for example).

Congratulations to the ANU!

A big congratulations to Jane Simpson, Nick Evans, Simon Greenhill, and the linguistics team at ANU on their successful ARC application for a Centre of Excellence in language change!