Tag Archives: lexicography

Word of the year

I was pretty uninspired by this year’s word of the year candidates, so I’m going to be proactive about next year’s word. I have a candidate, one that I made up a while ago. It will be of use to anyone in the NE US at the moment. It’s

snow dags

and it refers to the cruddy bits of snow that accumulate on the mud flaps and hang down behind the tyres on cars when you drive in snow. Use it, spread it, and nominate it next year.


TshwaneLex again

Given that I’ve now been using TshwaneLex for a few weeks, I thought I’d post an update on how it’s going. In general, it’s going pretty well.

  • If you’re going to change the DTD (and you’ll probably need to, if you have custom information or a trilingual dictionary), it’s important to plan in advance and, ideally, document what you’ve done. It’s easy to introduce inconsistencies, and they are a pain to fix later on (it’s possible, but it’s easier not to have to). For example, I ended up with the scientific name field in two places. I reimported the data to the correct field instance, checked it was ok, then deleted the unwanted field.
  • The field structure enforcement is worth the time it took to import the data. Generally, sorting stuff out has gone pretty well.
  • I’ve set up a few versions of the dictionary, including a wordlist, a full version, and a Toolbox export (which puts the backslash codes back before the items). This has been easier than modifying CCT exports by hand for Toolbox, although the difference isn’t all that big.
  • Editing is taking longer. The navigation is a little more time-consuming, since adding fields is a hierarchical process, but compared to the amount of time it’s taking to edit the damn dictionary because of inconsistent entry in the first place, it’s probably worth the tradeoff.
  • I really like being able to work on the English – Yan-nhangu section of the dictionary at the same time. This takes some doing in Toolbox (well, it requires the creation of two lexica…). I wish I could work on the Dhuwal section too, but since I know that language much less well than English or Yan-nhangu, the processing and addition of information won’t be as intensive. I suspect that, since there’s a high degree of syntactic and semantic isomorphism between Yan-nhangu and Dhuwal, I’ll be able to reverse the Dhuwal and Yan-nhangu glosses and change the audience and layout, but keep most of the structure. That’s not true for the English part of the dictionary.
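
For anyone who hasn’t seen what a Toolbox export with the backslash codes put back looks like, here is a rough Python sketch of the idea. It is not what TshwaneLex does internally, and everything in it is an assumption for illustration: the marker names (\lx, \ps, \ge, \sc) follow common Toolbox/MDF usage rather than my actual field set, and the entry is a placeholder, not real dictionary data.

```python
# Minimal sketch: write dictionary entries as Toolbox-style records,
# one "\marker value" line per field, with a blank line between records.
# FIELD_ORDER and the marker names are assumptions (common MDF-style
# markers), not the actual field set of any particular dictionary.
FIELD_ORDER = ["lx", "ps", "ge", "sc"]


def to_toolbox(entry):
    """Return one entry as backslash-coded lines, skipping empty fields."""
    lines = []
    for marker in FIELD_ORDER:
        value = entry.get(marker)
        if value:
            lines.append(f"\\{marker} {value}")
    return "\n".join(lines)


def export(entries):
    """Join all records, separated by blank lines, ending with a newline."""
    return "\n\n".join(to_toolbox(e) for e in entries) + "\n"


if __name__ == "__main__":
    # Placeholder entry; the values are made up for the example.
    sample = [{"lx": "headword", "ps": "n", "ge": "sample gloss"}]
    print(export(sample), end="")
```

Keeping the field order in one list is the point of the sketch: if each field lives in exactly one place, the kind of duplication I had with the scientific name field is harder to introduce.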