Category Archives: Technology and Software

Google Drive is dangerous

Following yet another box.com sync failure, I decided to migrate my lab’s digital files to google drive. I picked it because Yale offers unlimited storage, and because many of my students use the web version of google docs to draft materials. We had also been having problems with students not syncing their files with box regularly.

So, about a week ago I copied a lot of folders from box to drive, and removed the materials from box once it was clear that google drive had synced. All the files are showing on the google drive web site.

However, google drive offline did not sync more than 3 folders deep, even though it recognized the directory structure. The folders are simply empty. Moreover, it seems to have had a lot of trouble with files which aren’t .docx, .xlsx, .pdf or .mp4. Given that all the materials are on the web drive, there’s been no data loss. I just need to resync or at worst re-download the folders, right?

Not so simple. There’s no way to force google drive to resync. Signing out and signing in again is supposed to “encourage” it to sync again, but if the local copy of the file structure is corrupt, that won’t help. So, we download the folder off the web version, right? Again, not so simple. There’s a 2gig file limit on downloading, and if the limit is reached, google drive produces an error, but it also just goes ahead with the download.

I am obsessive about backing up in multiple locations as well as web backups, so I haven’t lost any data. It’s just a bit of a pain to re-integrate the recently modified files.
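For anyone wanting to check a local copy for the same symptom (folders that exist but are empty), here is a minimal Python sketch. It is not something I ran at the time, and the function name find_empty_dirs is my own invention; it simply walks a local folder and reports each empty directory along with how deep it sits below the starting point.

```python
import os


def find_empty_dirs(root):
    """Return sorted (depth, relative_path) pairs for every directory
    under root that contains neither files nor subfolders."""
    root = os.path.abspath(root)
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            rel = os.path.relpath(dirpath, root)
            # Depth 1 means a folder directly inside root.
            depth = 0 if rel == "." else rel.count(os.sep) + 1
            empty.append((depth, rel))
    return sorted(empty)


# Hypothetical usage; the path is a placeholder for your own sync folder:
# for depth, path in find_empty_dirs("/path/to/local/sync/folder"):
#     print(depth, path)
```

An empty folder is not always a sync failure, of course, so the output is only a list of places to compare against the web version by hand.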

And it’s safe to say that we will *not* be using google drive for actual backing up. We will probably continue to use it for sharing work in progress and writing the articles associated with the projects, but for actual data curatorship, we’ll be going with dropbox.

Introducing CHIRILA

I am very pleased to announce that the first phase of CHIRILA (Contemporary and Historical Resources for the Indigenous Languages of Australia) has been released. This represents approximately 180,000 words from 155 different Australian languages. It is a subset of the full database (of approx 780,000 items); eventually I hope to be able to release most of the data. For now, the first phase covers the material for which we have explicit permission, or which is already in the public domain.
The material is hosted at pamanyungan.net/chirila; please see the web site for more information about the contents of the database, how to download data, what formats are available, and the like. We do not provide a web interface to the data; you download it and use excel or a database program to read the files.
We hope the data will be useful to researchers, community members, and others with an interest in Australia’s Indigenous language heritage.
pamanyungan.net/chirila also includes access to the preprint of a paper describing the database (both the online and full versions).

Pama-Nyungan language locations

As noted in a previous post, I’ve started to put some of the results of my Pama-Nyungan prehistory grant on my lab web site, at pamanyungan.net. One of the recent updates is a language map. The data are not new; this map was released in about 2011 (though with updates since). It is released through a wordpress plugin on the PamaNyungan.net site, which allows easy embedding of maps into sites. I highly recommend it for its ease of use, except for the fact that it doesn’t seem to render in Chrome on a Mac (at least, not on my mac).

Comments on language locations, names, etc, on the map are very welcome. Please use the comment form on the map’s page.

Ipads for research

I’m taking part in a trial of ipads for the field methods class this semester. I’m not totally convinced that it’s going to work yet, since I’m a bit suspicious of the recording capabilities and of how seamless it will be to get items on and off the devices. We will certainly be making backup recordings using my field equipment for at least the first few weeks.

However, one of the side effects of this is that I’ve been spending a lot more time working on an ipad recently, trying out apps. I’m not even taking my laptop to the LSA (I’m writing this post on an ipad on the plane to Minneapolis).

Couple of observations:

The ipad I’m trialling came with a ‘Zagg’ keyboard case. The keyboard itself is quite good. It’s comfortable to use and very responsive. The cover itself is rather clunky and heavy, and the keyboard’s charging point is in an irritating position (the keyboard has to be partly removed from the cover to charge it). It’s also fairly straightforward to pair the keyboard with multiple ipads.

I have an ipad mini and while that size of tablet is mostly great, it is very helpful to have the larger size when working on latexed documents. My ipad mini is also heavily child-proofed, which makes it almost impossible to use with a stylus. I have yet to find a decent handwriting app that might be useful for field methods. Let me know if anyone knows of one (the stumbling block is the need to be able to use handwriting recognition with accented characters).

We are using Auria for the recording app, dictapad for transcription, and we will be loading the class data into LingSync (which has an online version for minimal data entry). We are syncing files through Dropbox and Box. TeX Writer is great (a LaTeX app allowing full compilation on the ipad), and we are using Zotero for reference management.

So far the biggest issues have been a) the usual problem of syncing between multiple devices and making sure they are all up to date (forgot to do that before leaving…) and b) only having one window at a time. On the other hand, only having one window does make email much less of a distraction.

I will continue to provide updates as the semester progresses and we use the ipads.

domain name update

The domain http://www.anggarrgoon.org is now defunct. (I registered it in 2005 through yahoo “small business” as part of my Houston home phone account, and now can’t recover the account information. It’s somewhere in internet limbo.) My blog is still available from anggarrgoon.wordpress.com, though. At some point I may even have time to write some more contentful posts.

Great digital tools

Nick Thieberger has a great post on new digital tools in the humanities (bleeding over into linguistics). It’s been a while since I’ve done any trawling for new programs and it looks like there are plenty of new things available for lots of different types of projects. Some are a bit enigmatic for my liking. NewRadial, for example, is ‘data analysis for the humanities,’ but what that entails isn’t exactly clear. Catma looks kind of useful though. I can imagine using it to tag texts for interesting grammatical features, for example. Text Analysis Markup System is another program in the same vein.

I couldn’t quite see the point of voyant-tools, though it does produce pretty word graphics. Nodex looks like it might be a handy network mapping tool (e.g. for mapping loanword data). It’s windows-only though, I see. OpenHeatMap is a simpler version of google’s fusion tables. Lots of bibliographical software here, including some nice plugins for Zotero. And here’s a list of transcription tools.

Enjoy!

TextStat

One of the great things about co-teaching is all the stuff you learn from your co-instructor. Arienne gave a nice demo today of TextStat, a flexible concordance program from the Dutch studies dept at the Freie Universitaet Berlin. It’s free, and available for Windows, Mac, and Linux.

Its major advantage is that it will read Word and OpenOffice files. That is, you don’t need to format the input text in any special format before it’s imported into the program. It will also retrieve web pages.

As programs go, it’s pretty simple. It does wordlist generation and concordancing, and you can view citations in context or in list format. But that’s already pretty useful. It’s very memory-light and doesn’t take up much space on the hard drive. Installation is easy (just unzip the archive on windows). If you want high-powered concordance software, NLP tools are for you, but if you want an easy way to see what’s in your data, this is definitely the way to go.
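To give a sense of what wordlist generation and concordancing involve, here is a minimal Python sketch. It is not TextStat’s own code or method; the function names (tokenize, word_list, concordance) and the crude lowercase-and-split tokenization are my own assumptions, chosen only to illustrate the two operations on a plain string.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude assumption)."""
    return re.findall(r"\w+", text.lower())


def word_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for each hit,
    with up to `width` tokens of context on either side."""
    tokens = tokenize(text)
    key = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


sample = "The dog saw the cat. The cat ran."
for left, word, right in concordance(sample, "cat"):
    print(f"{left:>10} [{word}] {right}")
```

A real concordancer also has to read different file formats and handle orthographies with punctuation-like letters, which is exactly the work a program like TextStat saves you.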