I have a new paper out today in the Proceedings of the Royal Society B (Biology). It’s based on a project that I’ve been working on for some time now (most of it, though not this paper, is joint work with my student Tyler Lau). I’ve been aware of the Tasmanian wordlist data in Plomley (1976) for many years, of course, but it was only after getting more familiar with computational phylogenetics that ideas for work with the dataset came up.*

The paper has some (now) fairly standard phylogenetic analyses using tools that will be familiar to most people who know this area. NeighborNets are now a common sight in historical linguistics, and Bayesian frameworks for tree building are also increasingly well known (if not always accepted). But admixture models are less well known, so let me explain a bit about them here. One criticism commonly leveled at work that straddles evolutionary biology and linguistics is that the tools are adopted wholesale, without regard for whether they are appropriate for analyzing linguistic data; I would like to avoid that claim here.

STRUCTURE is a clustering algorithm (and associated software). It’s designed to solve the problem of how to assign individuals to groups when we don’t know what features characterize each group. For example, say we have a new wordlist of a European language. It’s easy for us to tell whether that wordlist is of a language we know already (and whether it’s French, German, English, or Russian, etc.), or whether it comes from a language we’ve never seen before. That’s because we have a lot of high-quality, independent data from those languages. In the case of Tasmanian data, however, we don’t have those independent sources. We don’t know in advance how to assign wordlists to languages.

And it gets more complicated. Not only do we not know how many languages are represented in the dataset, we also have a suspicion that at least some of the wordlists may contain words from more than one language. For example, the “Norman” vocabulary represents material recorded over a number of years from many different people. Other lists were recorded on the Flinders Island mission, where there were Tasmanians from all over the island. That means that similarities between two wordlists could be due to them belonging to the same language, or it could be that only part of the list belongs to the same language.

This is where STRUCTURE comes in. We need a way to simultaneously infer the number of meaningful groups represented in the data, and what the signals of each group are, and STRUCTURE provides that. STRUCTURE was designed to work with genetic data; it uses allele frequencies at unlinked loci to assign individuals to populations. What does that mean? Genes have variants, called alleles. Some variants are much more common in some populations than others, and so we can use the information on those frequencies to work out what the frequency signatures of different groups look like if we look at lots of different genes.
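To make the idea of a "frequency signature" concrete, here is a minimal sketch in Python. It is not STRUCTURE itself: STRUCTURE infers the group frequencies and the assignments jointly, whereas this toy assumes the frequencies for two hypothetical populations are already known and only asks which population a single individual most probably belongs to. All names and numbers are invented for illustration.

```python
import math

# Hypothetical allele frequencies (invented numbers):
# freqs[population] is a list with one {allele: frequency} dict per locus.
freqs = {
    "A": [{"a1": 0.9, "a2": 0.1}, {"b1": 0.8, "b2": 0.2}, {"c1": 0.7, "c2": 0.3}],
    "B": [{"a1": 0.2, "a2": 0.8}, {"b1": 0.3, "b2": 0.7}, {"c1": 0.4, "c2": 0.6}],
}

def assignment_posterior(genotype, freqs):
    """Posterior probability of each population for one individual.

    Assumes a flat prior over populations and independent loci, so the
    likelihood is the product of the per-locus allele frequencies.
    """
    log_lik = {}
    for pop, loci in freqs.items():
        log_lik[pop] = sum(
            math.log(loci[i][allele]) for i, allele in enumerate(genotype)
        )
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_lik.values())
    weights = {pop: math.exp(ll - top) for pop, ll in log_lik.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}

# One individual carrying allele a1 at locus 1, b1 at locus 2, c2 at locus 3.
posterior = assignment_posterior(["a1", "b1", "c2"], freqs)
```

No single locus settles the question here, but the evidence accumulates across loci, which is why looking at lots of different genes (or, in the linguistic case, lots of different words) matters.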

In the Tasmanian case, instead of looking at genes and allele frequencies, I’m looking at words and translation frequencies. The English ‘words’ (the meanings in the wordlists) are the equivalent of genes, and the different translations recorded for each meaning are the equivalent of alleles in the original model.

STRUCTURE makes some assumptions about the data it is grouping. For example, it assumes that each locus of sampling is independent (that is, that the alleles at one locus aren’t conditioned by those at another). This assumption holds for the language data too. Changing your word for ‘hand’ doesn’t also mean you need to change your word for ‘water’: they vary independently. STRUCTURE also assumes that the genetic data are in Hardy-Weinberg equilibrium. This assumes that allele frequencies are constant across generations (that is, that the loci chosen for study are not under selection pressures). This is a controversial assumption in genetics, and lexical selection pressures are not widely studied in historical linguistics in a comparable way. What this most likely means for the data analyzed here is that we should treat the inferred clusters as synchronic groups only: that is, we can’t use STRUCTURE to infer the number of languages or families in the data, but there are other tools for that. Once we’ve identified the wordlists with mixed data, we can exclude them from initial analysis and use the remaining wordlists to study the number of languages and families represented in the dataset.
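The mapping from wordlists to loci and alleles can be sketched in a few lines of Python. This is an illustration only, not the coding used in the paper: the wordlists and forms are invented placeholders, each list is assumed to record at most one form per meaning, and the missing-data code `-9` is an assumption that should be checked against whatever the clustering software is configured to expect.

```python
MISSING = -9  # assumed placeholder for "no word recorded for this meaning"

# Invented toy data: wordlist name -> {English meaning: recorded form}.
wordlists = {
    "list_1": {"hand": "formA", "water": "formX"},
    "list_2": {"hand": "formA", "water": "formY"},
    "list_3": {"hand": "formB"},  # no word for 'water' recorded
}

def encode(wordlists):
    """Turn wordlists into rows of integer allele codes.

    Each English meaning is treated as a locus; each distinct form recorded
    for that meaning gets its own integer code (1, 2, ...) within the locus.
    """
    meanings = sorted({m for wl in wordlists.values() for m in wl})
    codes = {m: {} for m in meanings}  # per-locus form -> integer code
    rows = {}
    for name, wl in wordlists.items():
        row = []
        for m in meanings:
            form = wl.get(m)
            if form is None:
                row.append(MISSING)
            else:
                # Reuse the code if this form was seen before, else assign the next one.
                row.append(codes[m].setdefault(form, len(codes[m]) + 1))
        rows[name] = row
    return meanings, rows

meanings, rows = encode(wordlists)
for name, row in rows.items():
    print(name, *row)
```

Each row is then one "individual" and each column one "locus". In real data the hard part is deciding which recorded forms count as the same variant, and that is a linguistic judgment rather than something this sketch can do.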

That’s where the tree-building and network algorithms come in. SplitsTree has an implementation of the NeighborNet algorithm, which is very useful both for inferring language clusters and for detecting conflicting signal. For building a tree, I used BEAST. The tree allows some provisional dating and gives an idea of the degree of confidence in higher-level clusters. In this case, I see little (if any) evidence for a single Tasmanian family. Remember that the tree-building programs build a tree out of all items in the analysis, so the fact that all nodes are linked in a tree doesn’t mean that all the languages are related.
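NeighborNet is a distance-based method, so it starts from a table of pairwise distances between wordlists. As a hedged sketch (not necessarily the distance measure used in the paper), the Python below computes one simple option: the proportion of meanings, among those attested in both lists, for which the two lists have different forms. The coded rows and the `-9` missing-data value are invented for illustration.

```python
from itertools import combinations

MISSING = -9  # assumed missing-data code

# Invented coded wordlists: one integer code per meaning, same column order.
rows = {
    "list_1": [1, 1, 1],
    "list_2": [1, 2, 1],
    "list_3": [2, MISSING, 1],
}

def distance(a, b, missing=MISSING):
    """Share of mismatching meanings among those attested in both lists.

    Returns None when the two lists have no meaning in common.
    """
    shared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)

# Pairwise distances for every unordered pair of wordlists.
distances = {
    (p, q): distance(rows[p], rows[q]) for p, q in combinations(rows, 2)
}

for (p, q), d in distances.items():
    print(p, q, d)
```

Skipping meanings that are missing from either list keeps gaps in the record from counting as differences, which matters a great deal with wordlists as uneven as these.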

That’s where analysis of the cognate data comes in, and where we return to the comparative method. The supplementary materials provide some discussion here. Arguments about cognacy and similarity ultimately come down to the judgments of the linguists involved. In the supplementary materials, I take all 26 (yes, 26, out of more than 3000) words which Plomley judged to belong to the same word-family and find problems with just about all of them. Most are either data errors or clear loanwords.

I hope this paper will show some of the ways in which computational tools can be useful in historical linguistics. I also hope that it encourages linguists not to give up on sketchy data. Study of Tasmanian languages has been written off because the data are too messy to work with, and indeed, it is a fragile dataset. Eighteenth-century Tasmania is pretty high on the list of places that linguists will want to visit when time machines are invented. However, I’ve shown here that we can get more information about the languages than we’ve previously assumed. Now that we have internally coherent information about the language clusters on the island, we can start to identify systematic differences between the clusters. Tyler Lau and I have projects in progress on this topic.

*Part of working on Tasmanian involved digitizing all the wordlists in Plomley (1976). I haven’t made the file generally available yet, because I am still waiting to hear back from Brian Plomley’s literary executors about whether it is ok to do so.


13 responses to “Tasmanian Languages”

  1. Professor Bowern-

    A very interesting and stimulating article, my compliments. However, there is an unspoken assumption which permeates your entire article: namely, you seem to assume that after the rise in sea levels which turned Tasmania into an island some 10 000 years ago the inhabitants of Tasmania and those of mainland Australia remained isolated from one another.

    I would maintain that this assumption is unwarranted and that for aught we know *none* of the recorded Tasmanian languages may descend from the language(s) spoken by the first Tasmanians 10 000 years ago: it is quite possible that *all* of the recorded Tasmanian languages were in fact brought to Tasmania from the mainland at a later date, indeed perhaps at a much later date. And there needn’t have been a single migration: indeed, since, as you show, there is no evidence of there having existed a “Proto-Tasmanian” it may well be that the recorded languages/language families of Tasmania were all languages/language families originally spoken on the mainland before the spread of Pama-Nyungan, and perhaps transplanted in Tasmania at different periods.

    Indeed, I will go a step further and claim that the above scenario is in fact much likelier than one which treats Tasmania and Mainland Australia as wholly isolated from one another for 10 000 years.

  2. Thanks for your comment Etienne. There are two things that make repeated colonisations by sea unlikely (though not impossible). One is the length of the journey (400+km over rough seas). That’s a formidable barrier. Not impenetrable, but it makes it pretty unlikely that there was contact of any substance. The second is that the genetic evidence points to Australian and Tasmanian populations being separate.

  3. Thank you for your prompt reply. Regarding your first point: I cannot help but note that the expansion of Austronesian over the Polynesian triangle, or from Borneo to Madagascar, involved pre-modern maritime migration over distances that make the range between Tasmania and the Mainland look unimpressive, to say the least. Regarding your second point, languages could easily have spread from the Mainland to Tasmania without this language spread having involved demographic replacement of the original inhabitants of Tasmania.

    And the problem I have with any claim that Tasmanian languages go back to the language(s) of the original Tasmanians of 10 000 years ago is the following: it leaves the similarities with Mainland Australian languages unexplained. Dixon & Crowley have shown that Tasmanian languages, phonologically, were quite faithful to the phonological template of a typical Australian Mainland language.

    Unfortunately, I am unaware of any instance where languages remain phonologically similar after a 10 000 year separation. In much less time than that Indo-European turned into such phonologically and phonotactically different languages as Gaelic, English, Italian, Russian, Armenian and Hindi.

    It seems to me that Australianists take for granted a “continuity hypothesis”: that is to say, they assume that present-day Australian languages *must* go back to the language(s) of the first human beings who settled Australia. But this is not what is observed elsewhere, even in insular settings: neither English nor the Celtic languages of the British Isles go back to the language(s) of the first human beings to settle there, for example. The same could be said of the Arawakan languages spoken in the West Indies in the fifteenth century, of Greenlandic in Greenland, of Aleutian on the Aleut islands, of Maltese in Malta, of Sinhalese and Tamil in Sri Lanka…why should Australia be different?

  4. Sure, but Austronesians had outriggers and celestial navigation; the tropics are also rather more friendly for sailing small boats than the roaring forties. It’s not a coincidence that it took the Polynesians ages to get to New Zealand in comparison to the rest of the Pacific.

    “languages could easily have spread from the Mainland to Tasmania without this language spread having involved demographic replacement of the original inhabitants of Tasmania.” – By what mechanism? People don’t give up their languages on a whim.

    “Dixon & Crowley have shown that Tasmanian languages, phonologically, were quite faithful to the phonological template of a typical Australian Mainland language.” Dixon and Crowley have shown no such thing. Their study was flawed by considering all the data at once (which is rather like trying to deduce the phoneme inventory of “European” as though it were a single language). I’ve been working on this with a student recently and it’s pretty clear that some Tasmanian languages had contrastive voicing, while others didn’t; that there were likely different vowel inventories and different phonotactics among the languages too. Remember that C&D said you couldn’t tell how many languages were spoken on the island, and they didn’t do any statistical investigation of the inventories.

    I do not assume a continuity hypothesis for Australia at all (as I think Harold Koch and I wrote in the introduction to our 2004 book). I assume (based on a combination of methods, including phylogenetic dating) that Pama-Nyungan is about 5000 years old. I have no idea why Australian languages appear to be phonologically conservative, though I suspect it has something to do with the relative stability of phonological systems without voicing or fricatives (two major sources of sound change in the languages you mention).

  5. Professor Bowern-

    I am puzzled.

    You do not assume a continuity hypothesis for Australia, yet you do assume a continuity hypothesis for Tasmania. You ask me, in response to my bringing up the possibility that (a) language(s) may have spread from Mainland Australia to Tasmania: “by what mechanism? People don’t give up their languages on a whim.”

    Here’s my answer: the same mechanisms whereby Pama-Nyungan has spread over four-fifths of Australia, whereby all of the other languages I had listed in my earlier comment have expanded and replaced earlier languages. The surface area of Tasmania is quite modest when compared to the surface area over which (for example) Pama-Nyungan has spread; the topography and oceanic conditions quite clement compared to the areas over which (for example) Inuktitut and Aleutian have expanded.

    You write that Austronesians had “outriggers and celestial navigation”. Indeed we know this. You take it for granted that the inhabitants of the southern coast of Australia, for the past 10 000 years, utterly lacked any maritime technology allowing them to cross the Bass Strait. We do NOT know this. Considering the fact that sea levels have steadily risen over the past 10 000 years, any archeological traces of an earlier maritime culture linking Tasmania and coastal Victoria must be underwater by now.

    In short: absence of evidence is definitely not evidence of absence.

    I hope your future work will clarify the phonologies (and other aspects) of the various languages which were recorded. At the same time I hope you will keep your eyes open for any evidence of links between Tasmanian and Australian languages. Here’s a thought: while the known languages of Victoria State are Pama-Nyungan, might there be traces left (toponymy, borrowed vocabulary, special registers) of whatever pre-Proto-Nyungan languages were once spoken there? If so, comparing these data to the Tasmanian data might yield a surprise or two.

    • In the case of Pama-Nyungan there were probably substantial population size asymmetries between the groups (if one buys arguments that relate the spread of Pama-Nyungan to intensification). Also, for a fair part of the continent, Pama-Nyungan speakers probably weren’t directly replacing non-Pama-Nyungan speakers, but were re-colonising sites that had been abandoned due to climate change.

      Sea levels have not “steadily” risen; they rose rapidly following the end of the LGM and then leveled off in that area, due to the topography of the continental shelf.

      I didn’t say that it was impossible that Tasmania was repeatedly settled from the mainland, only that it is very unlikely (and in the paper my conclusions were in favour of archaic diversity but I made no claims about how that diversity might have arisen). Another point that speaks against multiple waves of immigration and influxes of new technology into Tasmania is the demographic collapse which follows the flooding of Bass Strait, where tools like fish hooks and needles disappear from the archaeological record. It’s hard to reconcile that with your claim that there was sufficient migration to lead to language shift.

      Various people have claimed evidence for substrate effects in Victorian languages, but given that we don’t know anything about those languages, it’s easy to come up with a story about some feature being related to pre-Proto-Pama-Nyungan languages. Unfettered by data, it’s possible for the imagination to run wild.

  6. Minor correction: my last comment, third to last line: “pre-Proto-Nyungan” should be “pre-Pama-Nyungan”, of course.

  7. Professor Bowern–

    “Unfettered by data, it’s possible for the imagination to run wild”. Indeed it is. However, re-reading your comments I find that unchallenged assumptions are no less dangerous. In your January 29 comment you end by presenting your suspicions as to why Australian languages are so phonologically “conservative”. I would use the word “uniform” instead, and would argue that this is due to diffusion (much in the same fashion that the Modern Indo-European languages of Western Europe typically exhibit a voicing opposition for stops and fricatives, despite Proto-Indo-European having only a single fricative, /s/, and quite possibly no voicing opposition either, if the “glottalic theory” is to be believed…in which case your “guess” that lack of fricatives and of voicing yields diachronic phonological stability would appear to be in need of revision).

    You point out that Dixon and Crowley’s work is flawed. I accept this. Yet the data salvaged by Crowley is so oddly “Mainland Australia”-like that an explanation appears to be called for. And if you assume that no migration across the Bass Strait can have taken place, then indeed you must assume extreme phonological conservatism to account for this Tasmanian-Continental Australian similarity. But this strikes me as a needless OBSCURUM PER OBSCURIUS-type explanation: I am unaware of any instance of such phonological conservatism anywhere. A migration scenario would account for the data just as well, and would not require postulating linguistically exceptional circumstances.

    And if the archeological data do not appear to be compatible with such a theory…well, so much the worse for archeology. We’re linguists, not archeologists, and the blunt fact of the matter is that there are plenty of instances of language spread which have left no trace in the archeological record.

    One final point: you write that the Bass Strait is 400 kilometers wide. According to the Wikipedia article on the Bass Strait, at its narrowest the Bass Strait is 240 kilometers wide, and that does not include various islands.

    • I didn’t say that Bass Strait is 400 km wide. The island-hopping route (which one would most likely take if in a small craft) involves about 400km of paddling.

      Crowley may have seen mainland features in the data because he didn’t realise that one of the Ben Lomond wordlists is actually Kaurna (see Amery 1996 for details).

  8. Professor Bowern: thank you for the reference. However, I expressed myself poorly: when referring to the data Crowley salvaged I referred to the little Tasmanian data (individual words, songs) he obtained through field work.

    • Crowley’s stuff is all at my office, and I am not, but from what I remember, the song data had heavily Anglicised phonology and the speakers did not know where the word divisions were.

  9. Professor Bowern: I just checked a library copy of Dixon and Crowley’s article, and while the data elicited by Crowley himself is somewhat (but not wholly) assimilated to English phonology, he does quote data collected by an earlier fieldworker whose meaning is known, and whose phonotactics and phonology (notably a seeming distinction between a lamino-dental and a plain dental stop) both appear surprisingly Mainland Australian-like.

    It isn’t much, but I cannot help but note that a single sentence of a typical language of (for example) the American Pacific Northwest, even the contact pidgin Chinook Jargon, would look nothing like a typical Australian language. And yet a separation of 10 000 years would be more than long enough to make the languages of Tasmania wholly un-Mainland Australia-like. Somehow this is not what the little data available points to. So while my hunch about overseas migrations from the mainland to Tasmania may be wrong, I believe it deserves to be considered seriously. Anyway, I have made my case, I shan’t pursue it further.

    A request for clarification, however, if I may: in your February 2 comment you mention that we don’t know anything about the languages once spoken in the State of Victoria. Yet every linguistic map of Aboriginal Australia I have seen shows the State of Victoria as being solidly Pama-Nyungan-speaking. So I am wondering: did you mean that we know nothing of the pre-Pama-Nyungan languages once spoken there, or did you mean that we know little of the aboriginal languages spoken there at the time of European colonisation? If the latter, do we even have grounds for assuming that (some or all) said languages were indeed Pama-Nyungan? I have noticed that, unlike some linguistic maps of indigenous languages of the Americas, linguistic maps of Aboriginal Australia do not show “gaps”, i.e. areas where no language family is shown because nothing is known of the language(s) once spoken there. Are there likewise parts of “Pama-Nyungan”-speaking Australia that are simply indicated as Pama-Nyungan-speaking because this is assumed on the basis of the (known) surrounding languages being Pama-Nyungan?

    • Ah, ok, the Westlake and Tindale samples. The lamino-dental and palatal distinction is probably fairly recent in Pama-Nyungan as a phonemic distinction. It’s marginal in some of the languages that have it (though it’s there allophonically in others).
      For Victoria, sorry that was unclear. I was indeed referring to the Victorian languages spoken there before Pama-Nyungan. The currently attested Victorian languages are all solidly Pama-Nyungan: see Bowern and Atkinson (2012) for details.

