Edible Possession in Halmahera

April 9, 2019

Halmahera is a spider-shaped island in Indonesia’s Maluku Islands. It was these islands, the so-called “spice islands”, that several European nations sought in the 15th and 16th centuries as the source of cloves, nutmeg, and mace. Along the east coast of Halmahera, the closely related Austronesian languages Patani and Sawai (among others) are spoken. Patani has 10,600 speakers and Sawai has 12,000. I did a few days of fieldwork on Patani (in 2015) and Sawai (in 2018) and uncovered some interesting things.

An interesting feature of Patani and Sawai is that when expressing possession — expressing who owns or is associated with something, as in English “my house” or “their friends” — it is necessary to distinguish between edible and inedible items. (Many Oceanic languages, which are related to Patani and Sawai, do this as well.) But what exactly does it mean for something to be edible? The answer isn’t as obvious as it might seem.

beach with ocean and sky

View from the beach in Lelilef, a Sawai village. (Photo by author.)

Read More…

The Most Common Words in PanLex

March 7, 2019

The PanLex Database contains a large diversity of languages and dialects. This diversity allows us to explore interesting language facts, illuminated by casting PanLex’s wide net across the languages of the world.

One question, originally suggested by our founder and director emeritus, Dr. Jonathan Pool, was:

What’s the most common word in the PanLex Database?

To answer this question, we surveyed each word in the PanLex Database and tallied the number of languages it occurs in, regardless of differences in meaning across languages. Showing up in a grand total of 1,166 languages and dialects is:

ma

This is to be expected: ma (or a similar-sounding word) is an extremely common word for “mother” in languages around the world, because ma is often the first syllable babies are able to make. (See this Wikipedia article for more information.)
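The tally described above can be sketched in a few lines of Python. This is only an illustration of the counting idea, not PanLex's actual query: the function name, the (word, language variety) pair format, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def count_varieties_per_word(entries):
    """Map each word form to the number of distinct language varieties it occurs in.

    `entries` is an iterable of (word, variety) pairs. Meaning is ignored,
    so the same spelling counts once per variety whatever it means there.
    """
    varieties_by_word = defaultdict(set)
    for word, variety in entries:
        # A set ignores repeats of the same word within one variety.
        varieties_by_word[word].add(variety)
    return {word: len(varieties) for word, varieties in varieties_by_word.items()}


# Hypothetical sample data: the variety labels are placeholders, not real codes.
entries = [
    ("ma", "variety_1"),
    ("ma", "variety_2"),
    ("ma", "variety_3"),
    ("ma", "variety_3"),  # second sense in the same variety, counted once
    ("pa", "variety_1"),
    ("pa", "variety_2"),
    ("casa", "variety_4"),
]

counts = count_varieties_per_word(entries)
most_common = max(counts, key=counts.get)
print(most_common, counts[most_common])
```

Collecting varieties in a set, rather than incrementing a counter, is what makes a word with several meanings in one language count only once for that language.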

Painting of mother standing and holding baby in her arms, foreheads touching.

Mother and baby. (Image by KaMenezes.)

Read More…

Radically inclusive machine translation gets mainstream press

March 7, 2019

In a recent article, linguist Gretchen McCulloch of WIRED magazine echoes PanLex’s series on radically inclusive machine translation. She notes that only a small number of languages have well-supported machine translation. Most of the world’s 7,000 languages have little or no machine translation support, including some with tens of millions of speakers. However, that could all change with the progress made by researchers who are using monolingual social media posts to begin constructing translation data sets. How are they doing it? Read McCulloch’s article here.

PanLex on the Moon

February 21, 2019

PanLex has long envisioned having a global impact for the good of humanity. Now PanLex is going beyond Earth, to the Moon!

Full moon as seen from space with dark sky in background.

Full moon. (Image by Wikimedia Commons.)

Press release:

The Arch Mission Foundation today announced the upcoming launch of the first installment of their Lunar Library™, a 30 million page archive of civilization, created as a backup to planet Earth. The library will be delivered to the Moon as part of SpaceIL’s lunar mission, scheduled for launch on Thursday, February 21st, starting 8:45 PM EST.

Israel’s SpaceIL Beresheet lunar lander launched from Cape Canaveral on a used SpaceX Falcon 9 rocket. It is traveling with Nusantara Satu, an Indonesian communications satellite, and a U.S. Air Force satellite, and is scheduled to land on the Moon in April 02019.

Read More…

Fake Words Are Based On Real Words

January 14, 2019

Each month, PanLex generates and publishes new “fake words” such as “unequalitis” and “adjustache” to entertain our newsletter readers in the Fake Word of the Month challenge. But how, exactly, are these fake words generated? We use an emergent property of the linguistic information contained in the PanLex Database, together with a simple probabilistic algorithm.

Translation quality

If you have used the PanLex Translator, you may have noticed that beside each translated word is a small red bar of varying length. This bar represents PanLex’s translation quality score, a measure of the level of confidence we have in that particular translation of the original word into the target language. The translation quality score is based on the number of PanLex sources the translation is found in and the quality of those sources. (PanLex can also infer translations that are not directly attested in any single source. We will leave discussion of inferred translation quality scores to a future post.)
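To make the idea concrete, here is a minimal sketch of one way a score could depend on both the number of sources and their quality: it adds up a quality rating for every source that attests the translation. The summing rule, the function names, and the ratings below are assumptions for illustration only, not a description of PanLex's actual formula or data.

```python
def translation_quality(source_qualities):
    """Score a translation by summing the quality ratings of the sources attesting it.

    Assumed rule for illustration: more sources and better sources both raise the score.
    """
    return sum(source_qualities)


def rank_translations(attestations):
    """Return (translation, score) pairs sorted from highest to lowest score."""
    scores = {
        translation: translation_quality(qualities)
        for translation, qualities in attestations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings for sources attesting French translations of English "house".
attestations = {
    "maison": [7, 5, 8],  # found in three sources
    "demeure": [5],       # found in one source
    "logis": [4, 3],      # found in two lower-rated sources
}

ranked = rank_translations(attestations)
for translation, score in ranked:
    print(translation, score)
```

Under this assumed rule, a translation found in several highly rated sources gets the longest bar, while one found in a single source gets a short one.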

PanLex Translator App with red bars indicating relative quality of translations of English “house” into French.

Read More…