As readers may know, I’ve been compiling a database of lexical items for Australian languages (funded by the NSF). It started with Pama-Nyungan, but I’m now expanding it to the other families. It now contains 600,000 items, so it’s quite a nifty research tool for looking at language relationships.
I have more than 1000 sources, including data from many people in the Australianist community, for which I’m very grateful. This data is subject to a whole bunch of different requirements, permissions, and restrictions, ranging from “sure, here’s the file, feel free to pass it on” to “I can give you this but you can’t tell anyone you have a copy.” This makes long-term planning for the materials rather complicated.
Currently the only people with access to this database are me, my research assistants, the post-doc working on the project, and 3 people I’ve given downloads to. As word gets out about the database, I’m now receiving requests to look things up. That’s great: it took a heap of work by many people to put it together, and it’s right that it shouldn’t be my private play-database; there are many lifetimes of work that could be done on this.
This raises the question of how (and whether) to make the materials more widely available in a way that’s useful and that protects the original copyright and IP rights of contributors. For example, I have e-copies (in some cases, re-typed copies) of material that’s still in print, and I would not want the original authors to lose sales.
Here are some ideas I’m starting to think about for developing the db, and I’d welcome readers’ feedback. I want to stress that this will be a very long process, well beyond the life of the current grant (which ends in 2012).
- Users would need to register to get any access to data. In order to register, they would have to sign an agreement to respect the rights of the depositors and the database owners. This would at minimum include strictly non-commercial use and would be subject to fair use agreements.
- There would be different user types, which would provide access to different data levels. Queries might initially be limited to a single language or a certain number of word lookups, for example.
- User types/roles might include the following (these roles are, of course, not mutually exclusive):
  - community member for a language or group (total access to sources on that language);
  - data contributor (provider of wordlists or reconstructions; in one tongue-in-cheek, game-theory-oriented version, I am thinking of an access level such as ‘if you allow your data to be viewed by others, you can get full access to the data from other people who have made the same agreement’*);
  - general researcher.
Every single data point would be referenced with its original source. That is how the db currently stores words; it is not like some online dbs that give the original source only on the dictionary ‘front page’ (the LEGO lexicon project, for example, does that). Citations would need to be to the original data source (as well as some acknowledgement of this db).
I am also considering how reconstructions might be shown; for example, would all supporting data be shown, or just a subset of it?
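As a rough illustration of the per-item source referencing described above (not the db’s actual schema), here is a minimal Python sketch in which each lexical item carries its own source, so a citation to the original source plus an acknowledgement of the db can be generated for any single data point. The class names, fields, and the acknowledgement wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical acknowledgement wording; the real db would choose its own.
DB_CREDIT = "via the Australian languages lexical database"


@dataclass(frozen=True)
class Source:
    author: str
    year: int
    title: str


@dataclass(frozen=True)
class LexicalItem:
    language: str
    form: str
    gloss: str
    source: Source               # each item carries its own source, not just the dictionary "front page"
    page: Optional[str] = None   # optional page/location within the source


def cite(item: LexicalItem) -> str:
    """Cite the original source of one data point, then acknowledge the db."""
    loc = f": {item.page}" if item.page else ""
    s = item.source
    return f"{s.author} ({s.year}{loc}), {s.title}; {DB_CREDIT}"
```

With placeholder data, `cite(LexicalItem("Language X", "form1", "gloss1", Source("Example, A.", 1999, "A Grammar of Language X"), "12"))` gives `Example, A. (1999: 12), A Grammar of Language X; via the Australian languages lexical database`.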
There are many other issues to consider; please let me know your thoughts. Anyone is welcome to comment, and I’d particularly like to hear from potential users and Aboriginal community members.
*In another, even more tongue-in-cheek version, the number of search returns you get would scale with the number of data points you have contributed; in that view, Luise Hercus and Patrick McConvell, for example, would get essentially unlimited searching, while the linguist who shall remain nameless but whose response was “over my dead body” would be restricted to, say, 10 hits.
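To make the access ideas above a little more concrete, here is a minimal Python sketch of how the tiers might combine. It assumes, hypothetically, that registration is a simple yes/no flag; that a community member sees everything on their own languages; that contributors who share their data see data from other sharers; that everyone else sees only data its depositors have released openly; and that the number of hits scales with contributed data points, as in the footnote. All names, and the base limit of 10, are illustrative rather than settled design decisions.

```python
from dataclasses import dataclass, field

# Hypothetical role labels mirroring the roles listed in the post.
COMMUNITY = "community member"
CONTRIBUTOR = "data contributor"

BASE_HITS = 10  # hypothetical number of hits allowed with no contributed data


@dataclass
class User:
    name: str
    registered: bool                                    # signed the rights/non-commercial agreement
    roles: set = field(default_factory=set)             # roles are not mutually exclusive
    community_languages: set = field(default_factory=set)
    shares_data: bool = False                           # opted in to reciprocal sharing
    contributed_points: int = 0


@dataclass
class Record:
    language: str
    form: str
    contributor_shares: bool   # depositor opted in to reciprocal sharing
    public: bool = False       # depositor allows general viewing (assumption)


def can_view(user: User, rec: Record) -> bool:
    if not user.registered:
        return False           # no access at all without registering
    if COMMUNITY in user.roles and rec.language in user.community_languages:
        return True            # total access to sources on one's own language
    if CONTRIBUTOR in user.roles and user.shares_data and rec.contributor_shares:
        return True            # reciprocal: sharers see other sharers' data
    return rec.public          # everyone else: only openly released data


def hit_limit(user: User) -> int:
    # Footnote-style scaling: more contributed data points, more search returns.
    return BASE_HITS + user.contributed_points


def search(user: User, records: list, language: str) -> list:
    # One language per query, then truncate to the user's hit limit.
    hits = [r for r in records if r.language == language and can_view(user, r)]
    return hits[: hit_limit(user)]
```

Because roles are a set rather than a single value, one person can be both a community member and a data contributor, and gets whichever access is broader.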