As we reported in our March newsletter, we were honored to contribute the entire PanLex Database to the Arch Mission Foundation’s Lunar Library™, a 30-million-page archive of civilization contained in a long-duration time-capsule that traveled to the Moon last month aboard the SpaceIL Beresheet lunar lander.
In 2011, the Internet Archive photographed nearly the entire collection of Balinese palm-leaf manuscripts (130,000 leaves in all) as part of an effort to bring out of the shadows the lesser-known literatures of the world and to inspire others to do the same.
These traditional Balinese texts were inscribed with a special triangular iron stylus on treated leaves of the lontar palm (Borassus flabellifer). Subjects span a variety of aspects of life, including religious ceremonies, guidelines, and magic; medical, astrological, and astronomical knowledge; epic stories, histories, and genealogies; and the performing arts and illustrations. Many of the texts are centuries old and have been re-copied many times over the years, as the leaves themselves break down over time.
Halmahera is a spider-shaped island in Indonesia’s Maluku Islands. It was these islands, the so-called “spice islands”, that several European nations sought in the 15th and 16th centuries as the source of cloves, nutmeg, and mace. Along the east coast of Halmahera, the closely related Austronesian languages Patani and Sawai (among others) are spoken. Patani has 10,600 speakers and Sawai has 12,000. I did a few days of fieldwork on Patani (in 2015) and Sawai (in 2018) and uncovered some interesting things.
An interesting feature of Patani and Sawai is that when expressing possession — expressing who owns or is associated with something, as in English “my house” or “their friends” — it is necessary to distinguish between edible and inedible items. (Many Oceanic languages, which are related to Patani and Sawai, do this as well.) But what exactly does it mean for something to be edible? The answer isn’t as obvious as it might seem.
The PanLex Database contains a large diversity of languages and dialects. This diversity allows us to explore interesting language facts, illuminated by casting PanLex’s wide net across the languages of the world.
One question, originally suggested by our founder and director emeritus Dr. Jonathan Pool, was:
What’s the most common word in the PanLex Database?
To answer this question, we surveyed each word in the PanLex Database and tallied the number of languages it occurs in, regardless of differences in meaning across languages. Showing up in a grand total of 1,166 languages and dialects is: ma
This is actually quite expected: ma (or a similar-sounding word) is an extremely common word for “mother” in many languages around the world because ma is often the first syllable babies are able to make. (See this Wikipedia article for more information.)
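For readers curious how such a tally might work, here is a minimal sketch in Python. It is not PanLex's actual code or schema: the `(language_code, word)` pair format, the function name `most_widespread_word`, and the toy sample with placeholder language codes are all assumptions made for illustration. The idea is simply to collect the set of languages each word appears in, ignoring meaning, and pick the word with the largest set.

```python
from collections import defaultdict


def most_widespread_word(entries):
    """Return (word, language_count) for the word found in the most languages.

    `entries` is an iterable of (language_code, word) pairs. This input shape
    is a hypothetical stand-in, not the real PanLex data model.
    """
    # Map each word to the set of languages it occurs in. Using a set means
    # a word listed twice for the same language is only counted once.
    languages_by_word = defaultdict(set)
    for language_code, word in entries:
        languages_by_word[word].add(language_code)

    # Pick the word whose language set is largest.
    word = max(languages_by_word, key=lambda w: len(languages_by_word[w]))
    return word, len(languages_by_word[word])


# Toy data with made-up language codes, purely for illustration.
sample = [
    ("lang_a", "ma"),
    ("lang_b", "ma"),
    ("lang_c", "ma"),
    ("lang_a", "ma"),   # duplicate within one language, counted once
    ("lang_a", "house"),
    ("lang_b", "house"),
]

print(most_widespread_word(sample))
```

Because the count is per language rather than per entry, a word with many senses in a single language gets no advantage over a word with one sense in each of many languages.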
In a recent article, linguist Gretchen McCulloch of WIRED magazine echoes PanLex’s series on radically inclusive machine translation. She notes that only a small number of languages have well-supported machine translation. Most of the world’s 7,000 languages have little or no machine translation support, including some with tens of millions of speakers. However, that could all change with the progress made by researchers using monolingual social media posts to begin constructing translation data sets. How are they doing it? …read McCulloch’s article here