New preprint (etc) archive for Australian languages!

I worry about data. It’s my job. I worry about how to analyze it, how to collect it, how to present it, and what happens to it. A particular worry for me at the moment is the very large amount of ‘grey’ literature on Australian languages: that is, language materials that are published locally, for example by language centres or smaller publishers. There are also gems in working papers collections, some of which now exist only as photocopies of photocopies. Some important work has come out of Hono(u)rs theses, but that work isn’t often widely available, and unlike PhD theses, it tends not to make its way into university repositories. I have a large collection of such materials, both in print and in scanned format, and I presume that others do too, particularly the “older generation” of Australianists who did most of their work before putting stuff on the web was what one did as a matter of course.

Another accessibility issue in work on Australian languages is paywalled (or subscription-only) publications. There is increasing attention to Open Access, but for various reasons, much work is either print-only or electronic but behind a paywall. In many cases, though, authors are allowed to upload freely available preprints.

It’s important to make our work available to the many groups who are interested in language: to our linguist colleagues, to the wider scientific community, to the general public, and in particular to members of the Aboriginal community.

So, I’ve started a ‘community’ on Zenodo for Australian languages. Zenodo is an archive platform for sharing research. In a nutshell, you upload your paper, handout, or other item, enter some metadata about the work, and publish it. You choose a license to share your work under (you can even make an item closed, that is, archived without making the files public), upload the file(s), and presto!
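Zenodo also has a REST API for depositing items, which could help with batch-uploading scanned handouts. As a minimal sketch, assuming the field names of Zenodo’s legacy deposit API (check them against the current developer documentation before relying on them), the function below only builds the metadata payload for the steps described above (metadata, access level, license, community); it makes no network calls. The function name, the default license identifier `cc-by-4.0`, and the community identifier in the usage note are assumptions for illustration. Restricted access would also need access conditions, which this sketch leaves out.

```python
# Values of access_right accepted by Zenodo's deposit metadata (legacy REST API).
ACCESS_RIGHTS = {"open", "embargoed", "restricted", "closed"}


def zenodo_metadata(title, creators, description, access_right="open",
                    license_id="cc-by-4.0", embargo_date=None, community=None,
                    upload_type="publication", publication_type="preprint"):
    """Build the JSON body for setting a Zenodo deposition's metadata.

    creators: list of (name, affiliation) pairs, names written "Family, Given".
    license_id: assumed license identifier; check it against Zenodo's list.
    community: identifier of a Zenodo community (hypothetical in the example).
    """
    if access_right not in ACCESS_RIGHTS:
        raise ValueError(f"access_right must be one of {sorted(ACCESS_RIGHTS)}")
    if access_right == "embargoed" and embargo_date is None:
        raise ValueError("embargoed records need an embargo_date (YYYY-MM-DD)")
    meta = {
        "title": title,
        "upload_type": upload_type,
        "description": description,
        "creators": [{"name": n, "affiliation": a} for n, a in creators],
        "access_right": access_right,
    }
    if upload_type == "publication":
        meta["publication_type"] = publication_type
    # A license is only required for open or embargoed records;
    # a closed record is archived without public access to its files.
    if access_right in {"open", "embargoed"}:
        meta["license"] = license_id
    if access_right == "embargoed":
        meta["embargo_date"] = embargo_date
    if community:
        meta["communities"] = [{"identifier": community}]
    return {"metadata": meta}
```

For example, `zenodo_metadata("Example handout", [("Doe, Jane", "Example University")], "A scanned handout.", community="australian-languages")` returns a dictionary that would be sent as JSON when updating a deposition; the file upload and publish steps are separate API calls not shown here.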

Zenodo is somewhat similar to academia.edu and ResearchGate, in that it takes work and makes it available. However, there are also a couple of big differences. Both academia.edu and ResearchGate are for-profit, while Zenodo is not-for-profit and is funded by CERN and by programs in the EU’s Open Science Initiative. Zenodo uploads are publications, while the others’ aren’t. Zenodo assigns DOIs, allowing you to reference specific versions of a publication (which makes it great for databases, dictionaries, or other work that might have multiple editions or versions). It also lets you upload collections of files as a single item (which the others don’t), and it works well with code repositories like GitHub, so you can publish the paper, supporting documentation, and code at the same time. If you have sound files, you can include them with the paper under the same DOI (which you can’t do on academia.edu, for example).

Another issue is findability. In theory, everything on the web is ‘findable’ if you know what to search for. Search engines, however, rank results, weighting some sources more heavily than others. I know from the experience of finding papers for ozpapers that it can be hard to find work on Australian languages, even with regular alerts set up. For example, not all university thesis repositories show up in Google Alerts (you have to know what to look for).

To contribute to Zenodo, go to

You’ll see a list of current contributions and a button to upload.

If you have old handouts or other useful material on Australian languages that you would like to contribute, but don’t have the time or inclination to upload them yourself, send me the scans (or even paper copies) and my students and I will upload them for you.


New bootcamp under way!

The 2017 grammar boot camp starts tomorrow. Three students (with bios below) will be working with me on materials for Noongar. We’re very lucky to be working with Denise Smith-Ali, Noongar linguist, and Sue Hanson from the Goldfields Language Centre. Our main focus for the month is to put together a phonological description of Noongar, with sound files to illustrate what we are describing. In some ways, this is pretty straightforward (it’s the sort of thing linguists do, the scope is known, etc.), but in other ways, it’ll be a challenge! For example, we want to make something that is easy to access, and easy to edit and update. We’ll be posting more about this as we make decisions.

Akshay Aitha: Akshay is a rising senior at UC Berkeley working on a double major in Linguistics and Applied Mathematics (with a concentration in Logic). His main research interest at the moment is the functional structure of nominals, especially in his heritage language, Telugu. He also has a strong enthusiasm for linguistic fieldwork. Outside of his coursework, he has been a research assistant on various phonetics and fieldwork projects run by graduate students in the Berkeley Linguistics department, and he is also an officer of the department’s undergraduate club, SLUgS.

Lydia Ding: Lydia is a recent graduate of Carleton College, where she majored in Linguistics and completed a senior thesis for distinction on wh-questions in Nukuoro [nkr] (Polynesian). Her primary interests lie in language documentation, syntax, morphology, and computational linguistics.

Sarah Mihuc: Sarah is a recent graduate of McGill University with a BA Honours in Linguistics & Computer Science. She works on anti-agreement and on word order in Kabyle Berber. She also has experience in experimental and computational linguistics, and fieldwork on two Mayan languages.

Teaching statement

I’ve finally figured out what I want to put in a teaching statement:

I am a linguist and I teach about linguistics, particularly language change and language documentation. My teaching is research-centered in that I want my classes, from freshman classes to graduate seminars, to be places where my students learn how to ‘figure stuff out’: how to step outside their starting assumptions to figure out what language tells us about how our world works, how to find out what they don’t know, even when they think they know it, and how to be constructive critics of their own and others’ work. I want them to be excited about learning and not to see the syllabus as simply a set of hoops to jump through to earn a grade. In short, I teach students how to think, not what to think.

If language were spoken in a vacuum, my teaching statement could probably end there, vague though it is. But language is spoken by humans and researched by humans, and humans are complex. Views about language, from the appropriateness of teaching spelling, to when to introduce a second language, to who should be bilingual, to who speaks better than others, pervade our lives. They affect the type of data that linguists can use, and more concretely, they directly affect the lived experience of a large fraction of the population, for better or for worse.

Linguists can, and should, have a lot to say about this. Our commitment to the ‘scientific’ study of language has implications both for how we study social dynamics and for how we understand the ways in which language is used to reinforce or deny power. Our work as academics gives us tools to critically examine social constructs, to separate the content of claims about the world from the language used to deliver those claims, and to see the implications of such arguments.

My practical focus in my lab is on a combination of educational outreach and training, and the commitments that this entails. Quite simply, students need to be able to do the best work they can in my classes and research group, and if they can’t because they are systematically disadvantaged, that’s not just their problem, it’s my problem too.

How does this translate into concrete activities? For me, this means a twin focus on the broader impacts of training current and future researchers, and of making our methods, results, and approaches more available to others.

Within the lab and classroom, it means fostering an atmosphere of excellence and respect, where everyone’s contributions are acknowledged and valued. It means acknowledging the realities of implicit bias and how it can affect both our work and our perceptions of excellence. It means acknowledging history and leaving time to explore it in the classroom.

For training, it means working from a broad definition of ‘excellence’ that factors in opportunity and potential as well as results achieved to date. It means recognizing that ‘pipeline’ questions won’t solve themselves without effort.

For activities, it means a genuine commitment to outreach. This includes making sure language materials are accessible to the people who need them, that we preferentially publish in open access journals, that we provide plain-English summaries of our work, that the results of our work are integrated into general outlets such as Wikipedia, and that we help people who want to learn about linguistics and don’t have the resources to do so. It means taking on not just an informational role but an advocacy role on topics where our research is relevant, such as language endangerment.

Routledge Historical Linguistics

I am the editor of a new book series in historical linguistics, to be published by Routledge. We are now accepting publication proposals. The series blurb is below. As you can see, we are interested in historical linguistics broadly defined. I would welcome proposals for monographs or edited volumes. Books based on dissertations are also welcome.

Historical linguistics is one of the oldest subfields of linguistics. It is the glue of much other foundational work on the nature of language and provides crucial insights into how humans have migrated and interacted with one another over the last five thousand years. It is both a highly theoretical and profoundly empirical field, rooted in the traditions of language documentation. Routledge Studies in Historical Linguistics reflects the diversity of work that studies language from a diachronic perspective. The scope of the series includes: 1) diachronic issues within specific subfields of linguistics, including (but not limited to) issues related to the theory of change in phonology, morphology, phonetics, syntax, semantics, pragmatics, language contact, and typology; 2) within historical linguistics, work dealing with language classification, grammaticalisation, contact‐induced change, reconstruction, as well as theoretical perspectives on language evolution; 3) comparative/historical grammars of specific languages and language families, including detailed comparative reconstruction in phonology, morphology, and syntax; 4) interdisciplinary studies which combine language change with insights from (but not limited to) archaeology, anthropology, history, or geography.

For more information about the series or to submit a proposal, please contact me:

To view more of our recently published linguistics research monographs and all of our Routledge Research linguistics series:

Grammar Boot Camp, 2017

I will again be holding a ‘grammar boot camp’ at Yale this summer. It will run from Wednesday, June 28 to July 26, 2017. (Note that these dates overlap with the LSA Institute at the University of Kentucky.) The idea is to have up to four advanced undergraduate students work intensively on existing high-quality archival field notes and recordings with the aim of producing a publishable sketch grammar. Students will receive a stipend and travel expenses to come to Yale.
This project is funded by the National Science Foundation’s Research Experiences for Undergraduates program; as such, applicants are limited to US citizens or permanent residents. Students who will have graduated in Spring 2017 are eligible to apply. The targeted cohort is undergraduates who will have just finished either their junior or senior year.
The materials to be worked on will be from an Australian Aboriginal language from Western Australia and will include both print materials and audio files. It is probable that the ‘print’ materials will already be digitized and in Toolbox.
Students will meet once a day as a group with me to discuss analyses and writing. They will spend the rest of the time working with the materials in the Linguistics department. They will receive regular detailed feedback on the analysis and writing. Familiarity with Australian languages is not required but I would expect that successful applicants would do some reading of grammars of related languages prior to the start of the boot camp.
Applications for the boot camp are now open. The deadline for applications is January 31, 2017, and applicants will be notified of the result in mid-February.
To apply, please send the following materials electronically:
- a letter of application, describing your experience in linguistics, including research experience, experience with language documentation and relevant software, your future plans, and why you’d like to join the boot camp
- a writing sample, such as a linguistics term paper
- a course transcript (this can be an unofficial transcript)
Please send materials as file attachments to, cc’ed to Applications will be acknowledged within 2 days – if you don’t get an acknowledgment, please let me know.
Please also arrange for one or two letters of recommendation/support from faculty to be sent to the same email addresses, also by January 31.
Students will need to show some evidence of prior research experience (e.g. through an RA-ship or by having a senior thesis in progress) and some familiarity with language documentation procedures (e.g. through having taken a field methods class or equivalent, such as having attended CoLang or an LSA Institute class). Applicants will need to show attention to detail and the ability to focus on a project for a sustained period. Students will need to be able to travel to New Haven for the entire period of the boot camp and should expect to work solely on this project during that time, including some evenings and weekends.
Please forward to anyone you think would be interested and feel free to contact me with any questions.

New paper on language and genetics in Aboriginal Australia

Somewhat belatedly, here is a link to new work by my colleagues and me on gene-language coevolution in Pama-Nyungan, the peopling of Sahul, and migration and admixture in the Pleistocene. It was recently published in Nature. There’s a lot in this paper: a Genomic History indeed. There has been some media attention, particularly Michael Erard’s piece on Pama-Nyungan phylogenetics and how important computational work has been to recent advances in Australian language history. There’s also a summary piece in The Conversation, focusing on the genetic side of the paper.

Google Drive is dangerous

Following yet another sync failure, I decided to migrate my lab’s digital files to Google Drive. I picked it because Yale offers unlimited storage, and because many of my students already use the web version of Google Docs to draft materials. We had also been having problems with students not syncing their files with Box regularly.

So, about a week ago I copied a lot of folders from Box to Drive, and removed the materials from Box once it was clear that Google Drive had synced. All the files are showing on the Google Drive website.

However, the local (offline) copy of Google Drive did not sync more than three folders deep, even though it recognized the directory structure: the deeper folders are simply empty. Moreover, it seems to have had a lot of trouble with files that aren’t .docx, .xlsx, .pdf or .mp4. Given that all the materials are on the web drive, there’s been no data loss. I just need to resync, or at worst re-download the folders, right?

Not so simple. There’s no way to force Google Drive to resync. Signing out and signing in again is supposed to “encourage” it to sync again, but if the local copy of the file structure is corrupt, that won’t help. So, we download the folder from the web version, right? Again, not so simple. There’s a 2 GB limit on downloads, and if the limit is reached, Google Drive produces an error but also just goes ahead with the download.
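One workaround for a download cap is to split a large folder into several downloads that each stay under the limit. Here is a minimal sketch of the planning step, assuming you already have a mapping from file names to sizes in bytes (for example, from a listing of the originals in another backup). The function name is hypothetical, and the exact cap is an assumption: it uses 2 GiB here, so substitute `2 * 10**9` if the limit is decimal. It uses first-fit decreasing, a simple bin-packing heuristic that keeps every batch under the cap but doesn’t guarantee the fewest possible batches.

```python
from typing import Dict, List, Tuple

# Assumed per-download cap: 2 GiB. Use 2 * 10**9 if the limit is decimal GB.
LIMIT_BYTES = 2 * 1024**3


def plan_batches(sizes: Dict[str, int],
                 limit: int = LIMIT_BYTES) -> Tuple[List[List[str]], List[str]]:
    """Group files into download batches whose total size stays <= limit.

    First-fit decreasing: take files largest first and put each into the
    first batch that still has room, otherwise start a new batch. Files
    larger than the limit on their own are returned separately, since no
    batch can hold them.
    """
    oversized = sorted(name for name, size in sizes.items() if size > limit)
    batches: List[List[str]] = []
    totals: List[int] = []
    # Sort by size descending, then by name so the plan is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > limit:
            continue
        for i, total in enumerate(totals):
            if total + size <= limit:
                batches[i].append(name)
                totals[i] += size
                break
        else:
            batches.append([name])
            totals.append(size)
    return batches, oversized
```

Anything in the oversized list would need to come from a different source (such as a local backup) rather than a browser download.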

I am obsessive about backing up in multiple locations as well as web backups, so I haven’t lost any data. It’s just a bit of a pain to re-integrate the recently modified files.
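To see how much of the local copy actually came through, one quick check is to list every folder in the synced directory that has nothing in it, along with how deep it sits. Here is a minimal sketch, assuming the local sync folder is an ordinary directory you can point it at; the function name and the path in the usage note are hypothetical. Note that hidden files such as `.DS_Store` make a folder count as non-empty, and a folder that is empty on purpose will also be listed.

```python
import os
from pathlib import Path


def find_empty_dirs(root, min_depth=0):
    """Return sorted (depth, path) pairs for directories under root with no entries.

    Depth counts path components below root, so root's immediate
    subfolders are at depth 1. Use min_depth to report only deep folders,
    e.g. min_depth=4 for anything more than three levels down.
    """
    root = Path(root)
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A folder is empty if it has neither subfolders nor files.
        if not dirnames and not filenames:
            depth = len(Path(dirpath).relative_to(root).parts)
            if depth >= min_depth:
                empty.append((depth, dirpath))
    return sorted(empty)
```

For example, `find_empty_dirs(os.path.expanduser("~/Google Drive/lab"), min_depth=4)` would list only the empty folders below the third level.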

And it’s safe to say that we will *not* be using Google Drive for actual backups. We will probably continue to use it for sharing work in progress and writing the articles associated with the projects, but for actual data curation, we’ll be going with Dropbox.
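For the re-integration step mentioned above (merging recently modified files back in from other backups), it helps to know exactly which files differ between two copies of a folder. Here is a minimal sketch, assuming both copies are available as local folders. It compares files by relative path and SHA-256 hash rather than by modification date, since copying between services can change timestamps. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large audio files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(a, b):
    """Compare two copies of a folder by relative path and content hash.

    Returns (only_in_a, only_in_b, differing) as sorted lists of
    relative paths written with forward slashes.
    """
    a, b = Path(a), Path(b)
    files_a = {p.relative_to(a).as_posix() for p in a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(b).as_posix() for p in b.rglob("*") if p.is_file()}
    # Only hash files present in both copies; the rest differ by presence.
    differing = sorted(
        rel for rel in files_a & files_b
        if file_digest(a / rel) != file_digest(b / rel)
    )
    return sorted(files_a - files_b), sorted(files_b - files_a), differing
```

Files that exist only in the backup are candidates for the sync failure; files that differ need a human decision about which version is current.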