Category Archives: Technology and Software

iPads for research

I’m taking part in a trial of iPads for the field methods class this semester. I’m not yet totally convinced that it’s going to work, since I’m a bit suspicious of the recording capabilities and of how seamless it will be to get items on and off the devices. We will certainly be making backup recordings using my field equipment for at least the first few weeks.

However, one of the side effects of this is that I’ve been spending a lot more time working on an iPad recently, trying out apps. I’m not even taking my laptop to the LSA (I’m writing this post on an iPad on the plane to Minneapolis).

Couple of observations:

The iPad I’m trialling came with a ‘Zagg’ keyboard case. The keyboard itself is quite good: it’s comfortable to use and very responsive. The cover, however, is rather clunky and heavy, and charging the keyboard is irritating (it has to be partly removed from the cover to charge). On the plus side, it’s fairly straightforward to pair the keyboard with multiple iPads.

I have an iPad mini and while that size of tablet is mostly great, it is very helpful to have the larger size when working on LaTeXed documents. My iPad mini is also heavily child-proofed, which makes it almost impossible to use with a stylus. I have yet to find a decent handwriting app that might be useful for field methods. Let me know if anyone knows of one (the stumbling block is the need to be able to use handwriting recognition with accented characters).

We are using Auria for the recording app, dictapad for transcription, and we will be loading the class data into LingSync (which has an online version for minimal data entry). We are syncing files through Dropbox and Box. TeX Writer is great (a LaTeX app allowing full compilation on the iPad), and we are using Zotero for reference management.

So far the biggest issues have been a) the usual problem of syncing between multiple devices and making sure they are all up to date (forgot to do that before leaving…) and b) only having one window at a time. On the other hand, only having one window does make email much less of a distraction.

I will continue to provide updates as the semester progresses and we use the iPads.

Domain name update

The domain is now defunct. (I registered it in 2005 through Yahoo “small business” as part of my Houston home phone account, and now can’t recover the account information. It’s somewhere in internet limbo.) My blog is still available from, though. At some point I may even have time to write some more contentful posts.

Great digital tools

Nick Thieberger has a great post on new digital tools in the humanities (bleeding over into linguistics). It’s been a while since I’ve done any trawling for new programs and it looks like there are plenty of new things available for lots of different types of projects. Some are a bit enigmatic for my liking. NewRadial, for example, is ‘data analysis for the humanities,’ but what that entails isn’t exactly clear. Catma looks kind of useful though. I can imagine using it to tag texts for interesting grammatical features, for example. Text Analysis Markup System is another program in the same vein.

I couldn’t quite see the point of voyant-tools, though it does produce pretty word graphics. Nodex looks like it might be a handy network mapping tool (e.g. for mapping loanword data). It’s Windows-only though, I see. OpenHeatMap is a simpler version of Google’s Fusion Tables. Lots of bibliographical software here, including some nice plugins for Zotero. And here’s a list of transcription tools.



One of the great things about co-teaching is all the stuff you learn from your co-instructor. Arienne gave a nice demo today of TextStat, a flexible concordance program from the Dutch studies dept at the Freie Universität Berlin. It’s free, and available for Windows, Mac, and Linux.

Its major advantage is that it will read Word and OpenOffice files. That is, you don’t need to convert the input text to any special format before it’s imported into the program. It will also retrieve web pages.

As programs go, it’s pretty simple. It does wordlist generation and concordancing, and you can view citations in context or in list format. But that’s already pretty useful. It’s very memory-light and doesn’t take up much space on the hard drive. Installation is easy (on Windows, just unzip the archive). If you want high-powered concordance software, NLP tools are for you, but if you want an easy way to see what’s in your data, this is definitely the way to go.
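For readers who want to see what wordlist generation and concordancing actually involve, here is a minimal Python sketch. It is not how TextStat is implemented; the function names `word_list` and `concordance`, the simple regular-expression tokeniser, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative tokeniser (an assumption): runs of word characters and apostrophes.
TOKEN = re.compile(r"[\w']+")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return [t.lower() for t in TOKEN.findall(text)]

def word_list(text):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()

def concordance(text, keyword, width=2):
    """Return keyword-in-context lines with `width` tokens either side of each hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample text.
sample = "The dog saw the cat. The cat saw the bird."
print(word_list(sample)[:3])
for line in concordance(sample, "cat"):
    print(line)
```

A dedicated program adds file import, sorting, and a proper interface on top of this, but the core of a concordance is little more than the loop above.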

Australian Language Polygons and new Centroid files

I’ve finished a *draft* Google Earth (.kmz) file with locations of Australian languages, organised by family and subgroup.

Some things to note:

  • You may use these files for education and research purposes only.
  • NO commercial use under any circumstances without my written permission.
  • NO republication under any circumstances without my written permission.
  • You may quote from these files. Please use the following citation: Bowern, C. (2011). Centroid Coordinates for Australian Languages v2.0. Google Earth .kmz file, available from
  • These files represent my compilation of many available sources, but are known to be deficient in a number of areas. Some sources are irreconcilable. This work is unsuitable for use as evidence in Native Title (land) claims.
  • Please do not repost or circulate these files. Send interested people to this page. I will be updating the files from time to time.
  • Please let me know of errors! The easiest way to do this is to change the polygon or centroid point for the language(s) you are correcting, and send me that item as a kml file.
  • If you use derivatives of this file (e.g. you calculate language areas from it, convert it to ArcGIS, etc.), that’s fine, but please send me a copy of the derivative file.
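
As an illustration of the kind of derivative mentioned in the last point, the following Python sketch reads the first polygon ring from a KML string and computes a planar area and centroid with the shoelace formula. The KML fragment, the placemark name, and the function names are invented for the example and are not taken from the files described here. Treating longitude and latitude as flat coordinates is a simplification, so a real area calculation should project the coordinates first.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical KML fragment: a one-degree square, not real language data.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Example language</name>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>130,-20,0 131,-20,0 131,-19,0 130,-19,0 130,-20,0</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</kml>"""

def read_ring(kml_text):
    """Return the first polygon's outer ring as a list of (lon, lat) tuples."""
    root = ET.fromstring(kml_text)
    path = ".//kml:Polygon/kml:outerBoundaryIs/kml:LinearRing/kml:coordinates"
    node = root.find(path, KML_NS)
    ring = []
    for triple in node.text.split():
        lon, lat = triple.split(",")[:2]  # drop the optional altitude
        ring.append((float(lon), float(lat)))
    return ring

def area_and_centroid(ring):
    """Shoelace formula on a closed ring; returns (area, (x, y)) in degree units."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = twice_area / 2.0
    return abs(area), (cx / (6.0 * area), cy / (6.0 * area))

area, centroid = area_and_centroid(read_ring(SAMPLE_KML))
print(area, centroid)
```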

Python script to convert backslash codes to tabbed text

Sophia Gilman is a Yale student who’s been working on my NSF Pama-Nyungan project this year. One of the things she’s been working on is a script to convert irregularly ordered backslash codes to a tabbed text file (for further import into database programs). The script takes the backslash file, detects the headword code, asks you a bunch of questions about it, and sets up a file with each backslash code as a column in a table.

The script was developed specifically for our project and its needs, but it’s flexible enough that it might be useful for others too. We’re making the script available for free, but it’s Sophia’s work, and she (and the NSF project BCS-844550) should be acknowledged if you use the script in work that results in publications.

A couple of notes on the script:

  • It’s a Python script. You need to have Python installed on your computer to run it. If you don’t have Python and you have backslash coded files for Australian languages that you would like to convert, we can help. If you’re working on another area, though, I’m afraid we can’t provide any support for script use.
  • SIL’s MDF (Toolbox) standard codes are hard-coded into the program.
  • Some features are specific to the needs of my NSF project and may be irritating to others:
    • Subentries are converted to main entries. The program makes some effort to treat material appropriate to the entry as a whole as belonging to each newly created record.
    • Multiple glosses are converted to multiple records.
    • Examples are not split into multiple table columns; they are grouped into a single column.

The script is available here. If you modify it for your own use, we’d appreciate a copy.
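
To illustrate the general idea (this is not Sophia’s script), here is a minimal Python sketch of turning backslash-coded records into tab-separated rows. It assumes that `\lx` is the headword code, as in SIL’s MDF standard, and that each `\lx` line starts a new record; the function names and the placeholder entries are invented for the example. Unlike the real script, it asks no questions and does nothing special with subentries or multiple glosses: repeated codes within a record are simply joined with a semicolon.

```python
import csv
import io

def parse_records(text, headword="lx"):
    """Split backslash-coded text into records (dicts of code -> list of values)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        code, _, value = line[1:].partition(" ")
        if code == headword:
            current = {}  # the headword code opens a new record
            records.append(current)
        if current is not None:
            current.setdefault(code, []).append(value.strip())
    return records

def to_tabbed(records):
    """Return tab-separated text with one column per code, in first-seen order."""
    columns = []
    for rec in records:
        for code in rec:
            if code not in columns:
                columns.append(code)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        writer.writerow(["; ".join(rec.get(code, [])) for code in columns])
    return out.getvalue()

# Placeholder data with the codes in irregular order.
SAMPLE = r"""
\lx word1
\ps n
\ge gloss one
\lx word2
\ge gloss two
\ge gloss three
\ps n
"""

print(to_tabbed(parse_records(SAMPLE)))
```

Each code becomes a column regardless of the order in which it appears within a record, which is the property that matters for importing into a database program.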

Call for discussion

As readers may know, I’ve been compiling a database of lexical items for Australian languages (funded by the NSF). It started with Pama-Nyungan but I’m now expanding it to the rest of the families. It’s reached 600,000 items now so it’s quite a nifty research tool for looking at language relationships.

I have more than 1000 sources, including data from many people in the Australianist community, for which I’m very grateful. This data is subject to a whole bunch of different requirements, permissions, and restrictions, ranging from “sure, here’s the file, feel free to pass it on” to “I can give you this but you can’t tell anyone you have a copy.” This makes long-term planning for the materials rather complicated.

Currently the only people with access to this database are me, my research assistants, the post-doc working on the project, and 3 people I’ve given downloads to. As word is getting out about the database, I’m now receiving requests to look things up. That’s great: it took a heap of work by many people to put it together and it’s right that it shouldn’t be my private play-database; there are many lifetimes of work that could be done on this.

This now raises the question of how (and whether) to make the materials more available in a way that’s useful, and that protects the original copyright and IP rights of contributors. For example, I have e-copies (in some cases, re-typed copies) of material that’s still in print. I would not want the original authors to lose sales.

Here is something I’m starting to think about in terms of db development, and I’d welcome readers’ feedback. I want to stress that this will be a very long process, well beyond the life of the current grant (which will be up in 2012).

  • Users would need to register to get any access to data. In order to register, they would have to sign an agreement to respect the rights of the depositors and the database owners. This would at minimum include strictly non-commercial use and would be subject to fair use agreements.
  • There would be different user types which would provide access to different data levels. Queries might originally be limited to a single language or a certain number of word lookups, for example.
  • User types/roles might include the following (these roles are, of course, not mutually exclusive):
    • community member for a language or group (total access to sources on that language);
    • data contributor (provider of wordlists or reconstructions; in one tongue-in-cheek, game-theory-oriented version, I am thinking of an access level such as ‘if you allow your data to be viewed by others, you can get full access to the data from other people who have made the same agreement’*);
    • student;
    • general researcher.
  • Every single data point would be referenced with its original source (that is how the db currently stores words; it is not like some of these online dbs that have the original source only on the dictionary ‘front page’; the LEGO lexicon project, for example, does that). Citations would need to be to the original data source (as well as some acknowledgement of this db).
  • I am also considering how reconstructions might be shown; for example, would all supporting data be shown, or just a subset of it?
  • There are many other issues to consider; please let me know your thoughts. Anyone is welcome to comment, and I’d particularly like to hear from potential users and Aboriginal community members.
    *In another, even more tongue-in-cheek access view, I am thinking that the number of search returns you get could be scaled by the number of data points you have contributed; in that view, Luise Hercus and Patrick McConvell, for example, would get essentially unlimited searching, while the linguist who shall remain nameless but whose response was “over my dead body” would be restricted to, say, 10 hits.
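
To make the tiered-access idea a little more concrete, here is a rough Python sketch. Nothing here describes an existing system: the role names follow the list above, but the numeric limits, the `User` class, and the `search_limit` function are placeholders invented for illustration. It treats roles as non-exclusive and gives a user the most generous limit among their roles; the footnote’s contribution-scaled variant is included as an option.

```python
from dataclasses import dataclass, field

UNLIMITED = float("inf")

# Placeholder per-query hit limits for each role; the numbers are invented.
ROLE_LIMITS = {
    "general researcher": 50,
    "community member": 50,
    "student": 100,
    "data contributor": 1000,
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    community_languages: set = field(default_factory=set)
    contributed_points: int = 0

def search_limit(user, language, scale_by_contribution=False):
    """Return the maximum number of hits `user` may see for `language`."""
    if not user.roles:
        return 0  # unregistered users get no access
    if language in user.community_languages:
        return UNLIMITED  # community members: total access to their language
    if scale_by_contribution:
        # Footnote variant: at least 10 hits, growing with contributed data points.
        return max(10, user.contributed_points)
    # Roles are not mutually exclusive, so take the most generous limit.
    return max(ROLE_LIMITS.get(role, 0) for role in user.roles)

student = User("example student", {"student"})
print(search_limit(student, "Language A"))
```

The point of keeping the limits in one table is that the access policy can be argued over and changed without touching the lookup logic.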