In 2019, Translation Commons published “Indigenous Languages: Zero to Digital”, a guide to creating digital infrastructure for indigenous communities. Using flowcharts and clear instructions, it explains how to create every level of the technology stack required to make a language usable online. This easy-to-understand and ground-breaking resource was co-authored with several partners in language and technology, and in coordination with the United Nations’ International Year of Indigenous Languages.
Translation Commons is a nonprofit, volunteer-run resource-sharing platform for language professionals.
Words for animals often have interesting histories. Some, like English mouse, have remained almost unchanged for centuries (millennia, if you go back to Indo-European). Others, like English dog, can be traced only so far before the trail goes cold. The word for bear was altered in many Indo-European languages through a process called taboo deformation.
This post brings together some English names for small animals with interesting histories, including some bonus notes on other languages.
The immediate source of English squirrel is Anglo-French esquirel, in turn derived from Old French escurueil (compare Modern French écureuil). This word derives from Vulgar Latin *scuriolus (the asterisk means the form is reconstructed—inferred from evidence rather than directly attested), which is a diminutive of *scurius. The reconstructed *scurius is a metathesized variant of attested Latin sciurus ‘squirrel’. We cannot say for certain why sciurus was metathesized into *scurius, but a likely contributing factor is that *scurius better fits typical Latin word patterns; Latin has many nouns ending with -ius and few words beginning with sciu-.
Statue of Saxon leader Widukind in Herford, Germany. (Image by M. Kunz)
Every November 5, the United Kingdom celebrates Guy Fawkes Night. Guy Fawkes was an Englishman who attempted to blow up the Houses of Parliament in 1605. The story is fairly well known, but why was this guy named Guy? What kind of a name is that, anyway? As it turns out, it’s kind of a long story!
Proto-Germanic, the reconstructed ancestor language of Germanic languages such as English and German, had a word *widuz ‘wood’—this, in fact, is the source of the English word wood. This root was used in names such as Old Saxon Widukind, literally ‘child of the wood’. These names could be shortened to Wido. The short form was borrowed into Old French as the name Guy and into Italian as Guido. The initial g-sound was added to fit the sound pattern of these languages; neither allowed w at the beginning of a word, and borrowed words originally beginning with w were pronounced with g. (The same process is evident in French guerre and Italian guerra ‘war’, which derive from a Frankish word similar to English war.)
On October 25, 02019, PanLex was honored to present the first keynote speech at WikidataCon in Berlin, Germany. As our representative, I was excited to share with the staff, volunteers, and users of Wikidata PanLex’s ideas about the importance of linguistic diversity and the role of lexical data in helping to preserve that diversity.
The Wikidata audience was wonderfully receptive to PanLex’s mission and work. A significant portion of the talks and workshops at the conference were on how Wikidata can help underserved, minority, and indigenous language communities, so the ground was ripe for discussions of how our respective missions aligned.
Sugi Lanus (left), the author, and other contributors to the lontar project at a cafe in Denpasar.
In the previous two updates, we described the Balinese lontar digitization project that PanLex is managing for Internet Archive. The goal is to continue the digitization of the Balinese Digital Library’s scanned lontar (palm-leaf manuscripts) by transcribing them into Unicode text, using the keyboards discussed in the last update. This work has now gotten underway in earnest, with over 2,000 lontar leaves transcribed and available at Palmleaf.org, comprising more than 60 complete works! Our current goal is to transcribe 3,000 leaves by the end of October.
The transcribed lontar are mostly in Kawi (Old Javanese), Balinese, or a mixture of the two, all written in Balinese script. The works cover a wide range of fascinating topics. There are chronicles (babad), medicinal texts (usada), mantras, several genres of poems ranging from high style (kakawin) to colloquial (geguritan), village regulations (awig-awig), horoscopes, classifications of things (carcan), and more. One entertaining example is Carcan Kucing, a “classification of cats” that serves as a guide for choosing a cat. Another is Pangayam-ayam, a cockfighting horoscope; it is a bettor’s guide organized by calendar date, suggesting which cocks are likely to win on each day.