Each month, PanLex generates and publishes new “fake words” such as “unequalitis” and “adjustache” to entertain our newsletter readers in the Fake Word of the Month challenge. But how, exactly, are these fake words generated? We use an emergent property of the linguistic information contained in the PanLex Database, and a simple probabilistic algorithm.
If you have used the PanLex Translator, you may have noticed that beside each translated word is a small red bar of varying length. This bar represents PanLex’s translation quality score, a measure of how confident we are in that particular translation of the original word into the target language. The translation quality score is based on the number of PanLex sources in which the translation is found, and on the quality of those sources. (PanLex can also infer translations that are not directly attested in any single source. We will leave discussion of inferred translation quality scores to a future post.)
PanLex Translator App with red bars indicating relative quality of translations of English “house” into French.
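To make the idea of a source-based score concrete, here is a minimal sketch in Python. It assumes that each source carries a numeric quality rating and that a translation's score is simply the sum of the ratings of the sources attesting it. That formula, the `Source` class, the function names, the source names, and the ratings are all hypothetical illustrations, not PanLex's actual data or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A dictionary or word list; `quality` is a hypothetical editorial rating."""
    name: str
    quality: int


def translation_quality(attesting_sources):
    """Assumed scoring rule: add up the quality of every attesting source.

    More sources and better sources both raise the score.
    """
    return sum(source.quality for source in attesting_sources)


def bar_lengths(scores, max_width=20):
    """Scale each score relative to the best one, as the red bars do visually."""
    top = max(scores.values(), default=0)
    if top == 0:
        return {word: 0 for word in scores}
    return {word: round(max_width * score / top) for word, score in scores.items()}


# Invented sources attesting French translations of English "house".
dict_a = Source("Dictionary A", 7)
dict_b = Source("Dictionary B", 5)
wordlist_c = Source("Word list C", 3)

scores = {
    "maison": translation_quality([dict_a, dict_b, wordlist_c]),
    "demeure": translation_quality([dict_b]),
}
bars = bar_lengths(scores)

for word, width in bars.items():
    print(f"{word:10s} {'#' * width}")
```

In this sketch a translation found in three sources receives a longer bar than one found in a single mid-rated source, which mirrors the relative ranking shown in the Translator.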
Hidden in plain sight
In 2010, I traveled to Los Angeles to meet thought leaders and funders and to raise awareness about plastic pollution – at the time a largely unknown problem, almost completely absent from public discussion and the mainstream media. At a fundraiser, I happened to meet filmmaker Michael Nash, who had just finished a documentary entitled Climate Refugees. We instantly found common ground in our seemingly quixotic quests: we were both passionate individuals trying to shine a light on global issues with massive impact that were then largely ignored by the mainstream. I owe to this meeting, and to watching his still very relevant film, my early concern with one of the biggest humanitarian and security crises our world faces today.
Nine years later, still not enough people know that climate change has been the main source of displaced people in our world for over a decade. In fact, since 2008, weather-related events triggered by climate change have displaced an average of 21.5 million people every year.
Proper nouns (names of unique things in the world, such as Berkeley and James) can be translated in the same way as common nouns (names of classes of things, such as city and person). For example, the same city in Ukraine is known as Lviv in English, Львів (L’viv) in Ukrainian, Львов (L’vov) in Russian, Lwów in Polish, and Lemberg in German. Traditional translation dictionaries often exclude proper nouns, but the PanLex Database has many of them; it can translate Львів from Ukrainian into English. Proper noun translations often provide a fascinating window into political and cultural history. The Slavic root contained in the name Lviv is a great example, as it appears in the names of two famous Russians: the novelist Leo Tolstoy and the revolutionary Leon Trotsky. This post explores how their first names were translated into English and why they do not match.
PanLex’s coverage of languages in Indonesia. (Image by Benjamin Yang, licensed under CC BY-SA)
The PanLex Database currently contains lexical translation data from 549 languages spoken in Indonesia. Each dot on this map represents one of those languages, scaled to show the number of words in that language that PanLex has collected.
With the help of our supporters, PanLex will be able to increase our coverage of languages for which we have few words, and eventually we intend to have the ability to translate to and from all 714 Indonesian languages.
On November 25, the PanLex team began a month-long stay in Yogyakarta, a city on the island of Java in Indonesia. Mataram, the historical region in which Yogyakarta is located, was controlled by several medieval and early modern kingdoms, and then for two centuries was part of the Dutch East Indies. The region is home to two famous ancient temples, Borobudur and Prambanan.
The PanLex team is in Indonesia to investigate ways to support local under-served languages. We chose Indonesia for several reasons. First, it has many under-served languages with large numbers of speakers, such as Javanese (84M speakers), Sundanese (34M), Batak languages (7M), Buginese (5M), and Acehnese (3.5M). Second, our team already has extensive experience in Indonesia, and two of us speak Indonesian. Finally, it is a fascinating and beautiful place to spend a month!