Introducing CHIRILA

I am very pleased to announce that the first phase of CHIRILA (Contemporary and Historical Resources for the Indigenous Languages of Australia) has been released. This release contains approximately 180,000 words from 155 different Australian languages. It is a subset of the full database (approximately 780,000 items); eventually I hope to be able to release most of the data. For now, the first phase covers the material for which we have explicit permission, or which is already in the public domain.
The material is hosted at; please see the web site for more information about the contents of the database, how to download data, what formats are available, and the like. We do not provide a web interface to the data; you download it and use Excel or a database program to read the files.
We hope the data will be useful to researchers, community members, and others with an interest in Australia’s Indigenous language heritage. also includes access to the preprint of a paper describing the database (both the online and full versions).
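
For readers who would rather script than open a spreadsheet, here is a minimal Python sketch of reading a downloaded file. It assumes a tab-delimited text file with a header row; the column names (`language`, `word`, `gloss`) and the sample rows are placeholders made up for illustration, not the actual CHIRILA layout, so check the web site for the real formats.

```python
import csv
import io
from collections import Counter


def count_entries_per_language(handle, language_column="language", delimiter="\t"):
    """Count how many rows (word entries) each language has in a delimited file.

    `handle` is any open text file. The column name is an assumption;
    change it to match the header of the file you downloaded.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    counts = Counter()
    for row in reader:
        counts[row[language_column]] += 1
    return counts


# Placeholder data standing in for a downloaded file (not real CHIRILA content).
sample_file = io.StringIO(
    "language\tword\tgloss\n"
    "LanguageA\tword1\tgloss1\n"
    "LanguageA\tword2\tgloss2\n"
    "LanguageB\tword3\tgloss3\n"
)

sample_counts = count_entries_per_language(sample_file)
for language, total in sorted(sample_counts.items()):
    print(language, total)
```

With a real download you would replace `sample_file` with something like `open("your_file.txt", encoding="utf-8", newline="")`, where the file name is whatever you saved.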

Documenting Endangered Languages outreach videos

A new set of videos has been released, providing information on how to apply for a grant to do language documentation. The series is focused on the requirements for the National Science Foundation’s DEL program, but there is much information that would be useful to anyone applying for funding for their language projects. The videos are aimed at community members as much as (if not more than) academic linguists.

I have two of the video segments: components of an application, and 6 things that tank a grant proposal. The first segment is DEL-specific; we walk through the sections of an application. The second one, however, is very general, and applies to just about all grant applications.

In brief, the six things are:

  1. A project outside the agency’s mandate (e.g. DEL funds linguistic work on endangered languages).
  2. A project that doesn’t meet the agency’s requirements (e.g. they ask for X, Y, and Z in the application; if those aren’t provided, it’ll be rejected).
  3. Unrealistic aims, budget, or time frame.
  4. Too vague.
  5. Too specific or too narrow for the scope of the budget or time, i.e. not good value for money.
  6. Inconsistency in the proposal.

You can watch the video here for further information.

Edited to add: Production of the videos was funded by NSF grant BCS#1500695, awarded to Racquel Sapien and Carlos Nash. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Pama-Nyungan language locations

As noted in a previous post, I’ve started to put some of the results of my Pama-Nyungan prehistory grant on my lab web site, at One of the recent updates is a language map. The data are not new; this map was released in about 2011 (though with updates since). It is released through a WordPress plugin on the site, which allows easy embedding of maps into sites. I highly recommend it for its ease of use, except for the fact that it doesn’t seem to render in Chrome on a Mac (at least, not on my Mac).

Comments on language locations, names, etc, on the map are very welcome. Please use the comment form on the map’s page.

Endangered Languages Catalogue is Out

Point your browsers right now to and check out your favourite language. It’s definitely a work in progress, but it’s nicely presented and I hope it will provide a realization of just how precarious so many languages are.

I prepared a FAQ for the Australian section that explains some of the choices made in including languages, as well as other things to do with Australia. It got overlooked in the original launch, so I’m copying it here. It will hopefully answer some of the questions that linguists and the general public will immediately ask when they see data like this.

Another thing I need to say, since it also didn’t make it into the launched database, is that there *is* source material in the underlying database for language names, speaker numbers, work on the language, and so forth. In fact, there’s a huge bibliography behind this project that the ElCat team compiled over the last year. Linguistic aficionados will recognize data from, as well as other sources. Hopefully that will be more clearly acknowledged before too long.

Finally, many thanks to everyone who gave me feedback earlier in the year. I did submit changes based on your comments; however, it seems that many of those changes are not reflected in the current dataset, and they are still being worked through. Same deal with all the resources I was sent: I’m sorry that so few of them appear to have made it into the launched version, but they are there and should hopefully appear soon. The map data is, however, mostly my responsibility, so if it’s wrong, do tell me directly or submit a comment on the site.


Who compiled this list?

The Australian section was a joint effort between the LinguistList and Claire Bowern, using data from many sources. Claire’s work was funded by NSF grant 844550 “Pama-Nyungan and Australian Prehistory”, though any opinions expressed in those parts of the site do not reflect opinions of the NSF.

I thought Australian languages were just dialects. Why are so many languages listed?

There are 27 different language families in Australia, and about 380 languages. (By way of comparison, Europe has about 250 languages in 4 families.) Some of the languages are quite similar to each other, while others are as different from each other as Chinese and Hebrew, or English and Japanese.

Some languages have no speakers listed – why are they “endangered”? Aren’t they “dead”?

For some languages, we haven’t been able to confirm speaker numbers. In other cases, there isn’t anyone who has grown up speaking the language, but there are still people who identify with the language, and who are working to revitalize their languages.

What does it mean to say a language is “sleeping”?

Some languages aren’t spoken daily anymore, but there are community groups who are working to bring their languages back into use. Some of those communities refer to their languages as “sleeping” rather than “dead”, since those languages are still an important part of the life and identity of the community, even if they aren’t regularly used.

Why are so many Australian languages endangered?

There are a lot of reasons, many of which date back to the early years of European settlement. Introduced diseases killed many Aboriginal people, along with hunger from reduced access to hunting grounds. In some cases, it’s because of massacres. At the Mindiri massacre at Kooncherie Point in the mid-1880s, well over 100 people were killed, including most of the speakers of Wadikali, Pirlatapa, Yarluyandi, and Malyangapa. Later, other groups were disproportionately affected by Stolen Generations policies. Social and economic reasons have also led to many Aboriginal people shifting to English, Kriol, and other Aboriginal languages.

My language is strong! Why are you calling it “endangered”?

There are many different ways that a language can be endangered. Because the number of speakers of Aboriginal/TSI languages is small overall, it doesn’t take much for some languages to come under threat. Children find it hard to resist the pressure from the media, schools and the internet to switch to speaking English most of the time. Once children have made that switch, the language is severely endangered.

Some communities don’t realize at first that their languages are under threat. For example, they might think that the language is healthy because it’s still used in the community, but it might be only the elders who are using it – that’s a sign that the language is endangered.

We recognize that some languages in the catalogue are still strong, that children are learning them and they are actively used in the community, and we want to support that work. Let us know what you’re doing, and we’ll make sure we update the catalogue.

Where can I find out more information about Aboriginal/TSI languages?

New South Wales: [Koori Centre, University of Sydney]


Western Australia:

South Australia:

General: [a recent report on language use in Aboriginal Australia]

I want to learn an Aboriginal language: where can I find more information?

For starters:

  • for Pitjantjatjara
  • publishes books on Aboriginal languages
  • for the Kaurna language of the Adelaide Region
  • Charles Darwin University’s Yolŋu Studies unit

I want to find out more about my language – where do I go?

Try for published sources, and for the archives of the Australian Institute of Aboriginal and Torres Strait Islander Studies (they have a lot of unpublished information about language and culture).

I’m a speaker of an Aboriginal/TSI language and I’d like to work with a linguist – who should I contact?

Submit a comment on the language with your contact details and we’ll put you in touch with local people – we’d love to hear from you!

The Documenting and Revitalizing Indigenous Languages (DRIL) Program team may be able to help you:, or

Or if you live in South Australia the Mobile Language Team may be able to help:

My mum/dad/grandparents speak some Language and I’d like to record them. Do you have any advice?

Have a look at the links at, or the tutorials at

We have good ideas for helping maintain Aboriginal/TSI languages but it needs some funding and support – where can we get it from?

If you live in an area with an Indigenous language centre, ask them. The Federal Government funds some language work through the Indigenous Language Support program:

Dialect survey

I’ve posted around a bit about the dialect survey that colleagues at Auckland and I are doing (but neglected to put anything on my own blog).

We’re collecting data on varieties of English in the US. This includes geographically based dialects but also varieties of English associated with ethnicity, and differences in gender and age. So far we have about 1500 participants. Here’s a map by current location.

Participants in North American Dialects survey, December 2010 (current zip/post code)

As you can see, there’s a lot of data here (we’re already more than twice as big as the next largest audio survey), but there is still a way to go. We’re still collecting data from everyone, of course, but there are a few particular groups of people we’d like to hear from.

  • The data is highly skewed by age and ethnicity. This is a fantastic data set to study the accents of Caucasian/White people under 35, but we’d like a more representative view of North American English. If you would answer anything other than “white” to a question about your ethnicity, we’d love to include you!
  • If you now live, or went to high school, between the Rockies and the Mississippi, we’d like to hear from you! As you can see, we have a lot of data from the larger (and coastal) states, but coverage is sparser in the central and western inland areas.
  • If you’re over 35, we’d also like to hear from you! (The age skew is a relic of our original recruiting, which has been largely through Facebook and college classes.)

I will be doing more recruiting in the coming months. Feel free to help me by forwarding the link widely among your networks (family, friends, book clubs, sports clubs, etc).

More fun stuff with Google Earth

A while ago I posted a picture and some information about using Google Earth to display family tree hypotheses. I’ve been doing more things with Google Earth in the meantime.

One is a .kmz file for Australian languages. Justin Lo has been working on my Pama-Nyungan grant over the summer and he did a lot of the work of adding centroid points. We now have a version of the file that can be downloaded. Some further information is available at the Pama-Nyungan grant blog. Let me reiterate that this is a work in progress, there will be new versions, and stuff is gonna change. It’s got mistakes in it, which we will correct as we find them. These are data points (that is, they don’t show the full geographical range of the language area). You can download this file and use it in your research but please acknowledge the use of the file. Comments, corrections and notes on what you’ve used the data for can be sent to or added as a comment here or at the other blog.
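
If you want the points outside Google Earth, a .kmz file is a zip archive with a KML (XML) file inside, so it can be read with a few lines of Python. The sketch below is only an illustration under some assumptions: that each language is stored as a Placemark with a name and a Point, and that the file uses the standard KML 2.2 namespace (older files may declare a different one). The demo archive and the name "LanguageA" are made up so the example runs on its own; they are not taken from our file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace; an assumption about how the file was saved.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def read_placemarks(kmz_file):
    """Return a list of (name, longitude, latitude) for point Placemarks in a KMZ.

    `kmz_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(kmz_file) as archive:
        # Use the first .kml member found in the archive.
        kml_name = next(n for n in archive.namelist() if n.lower().endswith(".kml"))
        root = ET.fromstring(archive.read(kml_name))

    points = []
    for placemark in root.iter("{%s}Placemark" % KML_NS["kml"]):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # skip Placemarks that are not simple points
        # KML stores coordinates as "longitude,latitude[,altitude]".
        lon, lat = coords.strip().split(",")[:2]
        points.append((name.strip(), float(lon), float(lat)))
    return points


# Build a tiny in-memory KMZ with one made-up point for demonstration.
demo_kml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    "<Placemark><name>LanguageA</name>"
    "<Point><coordinates>133.5,-25.0,0</coordinates></Point></Placemark>"
    "</Document></kml>"
)
demo_kmz = io.BytesIO()
with zipfile.ZipFile(demo_kmz, "w") as archive:
    archive.writestr("doc.kml", demo_kml)
demo_kmz.seek(0)

demo_points = read_placemarks(demo_kmz)
print(demo_points)
```

To use it on a downloaded file, pass the file's path to `read_placemarks` instead of `demo_kmz`. Remember that the results are single points, not language areas.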


Database of structures

There’s a new online database of structures being developed. This from the LinguistList:

Dear LinguistList Readers,

We are pleased to announce SSWL, an open-ended database of the syntactic
structures of the world’s languages:

(alternatively, Google: sswl database)

Please feel free to go to the site and play around with it, doing searches
and browsing the languages and properties.

Ultimately, we hope to fill the database with thousands of grammatical
properties and thousands of languages all provided by members of the wider
linguistic community.

If you have any questions or comments, please send them to:

Chris Collins
Department of Linguistics

I will have more to comment on this once I’ve played with it a bit more.