Will climate change kill linguistic diversity? Or save it?

Hidden in plain sight

In 2010, I traveled to Los Angeles to meet thought leaders and funders, and raise awareness about the issue of plastic pollution – which at the time was a largely unknown problem, and one almost completely absent from the societal discussion and the mainstream media. At a fundraiser event, I happened to meet filmmaker Michael Nash, who had just finished a documentary movie entitled Climate Refugees. We instantly found commonality in our seemingly quixotic quests: we were both passionate individuals trying to shine light on global issues with massive impact that were, at the time, largely ignored by the mainstream. I owe to this meeting, and to watching his still very relevant film, an early concern with one of the biggest humanitarian and security crises our world is facing today.

Refugee shelters in the Dadaab camp, northern Kenya, July 2011. (Image by Pete Lewis, Department for International Development.)

Nine years later, still not enough people know that climate change has been the main source of displaced people in our world for over a decade. In fact, since 2008, weather-related events triggered by climate change have displaced an average of 21.5 million people every year.

How Lev became Leo and Leon

Proper nouns (names of unique things in the world, such as Berkeley and James) can be translated in the same way as common nouns (names of classes of things, such as city and person). For example, the same city in Ukraine is known as Lviv in English, Львів (L'viv) in Ukrainian, Львов (L'vov) in Russian, Lwów in Polish, and Lemberg in German. Traditional translation dictionaries often exclude proper nouns, but the PanLex Database has many of them; it can translate Львів from Ukrainian into English. Proper noun translations often provide a fascinating window into political and cultural history. The Slavic root contained in the name Lviv is a great example, as it appears in the names of two famous Russians: the novelist Leo Tolstoy and the revolutionary Leon Trotsky. This post explores how their first names were translated into English and why they do not match.

Indonesian languages in PanLex

Red dots on map of Indonesia show location of Indonesian languages for which PanLex has data

PanLex’s coverage of languages in Indonesia. (Image by Benjamin Yang, licensed under CC BY-SA)

The PanLex Database currently contains lexical translation data from 549 languages spoken in Indonesia. Each dot on this map represents one of those languages, scaled to show the number of words in that language that PanLex has collected.

With the help of our supporters, we will be able to increase our coverage of languages for which we have few words, and eventually we intend to be able to translate to and from all 714 Indonesian languages.

PanLex in Yogyakarta

Borobudur Temple. (Image by Valery Bocman)

On November 25, the PanLex team began a month-long stay in Yogyakarta, a city on the island of Java in Indonesia. Mataram, the historical region in which Yogyakarta is located, was controlled by several medieval and early modern kingdoms, and then for two centuries was part of the Dutch East Indies. The region is home to two famous ancient temples, Borobudur and Prambanan.

The PanLex team is in Indonesia to investigate ways to support local under-served languages. We chose Indonesia for several reasons. First, it has many under-served languages with large numbers of speakers, such as Javanese (84M speakers), Sundanese (34M), Batak languages (7M), Buginese (5M), and Acehnese (3.5M). Second, our team already has extensive experience in Indonesia, and two of us speak Indonesian. Finally, it is a fascinating and beautiful place to spend a month!

Enabling Radically Inclusive Machine Translation (part 3)

In the first two posts in this series, we elaborated our belief that all people should be able to use their native language to exercise human rights and have access to opportunity. We showed that machine translation technology currently falls far short of this goal, but that there are realistic ways to make progress. In this third and final installment, we will describe in more detail our work at PanLex and how we are uniquely positioned to improve translation support in under-served languages.

We consider under-served languages to be those lacking institutional support from governments or support from major technologies such as Google Translate, Android, or Microsoft Windows. Of the world’s 7,500 languages, 6,900-7,400 are under-served. More than 2 billion people speak under-served languages, including large languages such as Western Punjabi (90M speakers), Javanese (84M), Wu Chinese (80M), Egyptian Arabic (62M), and Uyghur (10M).

Uyghur boys. (Image by OMF)

