In 2019, Translation Commons published “Indigenous Languages: Zero to Digital”, a guide to creating digital infrastructure for indigenous communities. Using flowcharts and clear instructions, it explains how to create every level of the technology stack required to make a language usable online. This easy-to-understand and ground-breaking resource was co-authored with several partners in language and technology, in coordination with the United Nations’ International Year of Indigenous Languages.
Translation Commons is a nonprofit, volunteer-run resource-sharing platform for language professionals.
Technology underpinning digital text
Many of us who use a well-supported language online may not realize how many layers of technology underpin the implementation of language on our devices. When we buy a new phone or laptop, we barely notice these layers, because our language usually displays and functions seamlessly on the device without a hiccup. That is not the experience for speakers of under-resourced languages, including some with millions of speakers. For example, Lahnda (Pakistan), with 93M speakers, and Wu Chinese, with 81M, are not well supported.
Here are some of the foundational layers that must be implemented behind the scenes for a language to be digitally supported.
- A writing system (such as Latin, Devanagari, or a newly created system) and orthography (spelling rules) must be identified, chosen, or developed for the language.
- Each letter or character of the writing system (like capital B, lowercase c, or !) must be encoded in Unicode, which assigns each one a standard numeric code point.
- A font that displays all letters and characters in the writing system must be developed.
- Using typography design software, font designers identify and create all necessary glyphs, or graphic shapes that represent the letters and characters of the writing system.
- The font designers write rules to handle any required ligatures and other complex cases in order to combine, stack, or connect the glyphs properly, in accordance with the writing system’s conventions. (See graphic below.) This step can be labor intensive for complex writing systems.
- The font must be available on a device at the moment the user needs it.
- Keyboards must be created so that text can be input in a language. These can be either on-screen keyboards or mappings from common physical keyboard layouts such as QWERTY.
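To make the Unicode and font layers above a little more concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe_code_points` is our own, chosen for illustration. It lists the code points behind a short Devanagari word and shows why font shaping rules matter: the stored text is a flat sequence of code points, and it is the font that must join them into the correct written forms.

```python
import unicodedata


def describe_code_points(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode character name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# The Hindi greeting "namaste", written with escape sequences so the
# underlying code points are visible in the source.
namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"

for code, name in describe_code_points(namaste):
    print(code, name)

# Six code points are stored, but the font's rules must join
# SA + VIRAMA + TA into a conjunct and attach the vowel sign E,
# rather than drawing each piece separately.
print(len(namaste))  # 6

# One visible letter can also be encoded in more than one way; Unicode
# normalization (NFC) maps the two-code-point form to the precomposed one.
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")  # True
```

A keyboard for the language only has to produce the right code-point sequence; whether that sequence looks correct on screen depends on the font's glyphs and combining rules described above.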
For many languages, some or all of these steps are still outstanding. Beyond these foundational layers, further work is required for full digital readiness. The guide covers the most important of these: conventions for a language’s locale (date, time, currency, numbers, etc.), segmentation and word breaking, line breaking, language identification codes, language detectors, word lists for predictive text and spelling correction, and optical character recognition for converting print images to digital text.
A guide for language advocates
“Indigenous Languages: Zero to Digital” is a resource that guides language advocates and technologists through the necessary steps in building the infrastructure to use their language on digital platforms. While the process is time-consuming, the results open a world of possibilities in digital communication, including education, commerce, entertainment, health care, and interaction on the global stage. For language revitalization efforts, making it possible for grandkids to text their grandparents in their native language will, we hope, bring a welcome new flow of energy in both directions.
Congratulations to Jeannette Stewart of Translation Commons, and to all our friends who contributed to this ground-breaking project. Here’s to many years of success for the communities who wish to bring their languages online.