PanLex: Become a summer intern


The PanLex Project of The Long Now Foundation is building a database to document all known translations among all the words of all languages in the world.

Supplementing its year-round local and remote volunteer opportunities, the project is now offering internships in San Francisco during the summer of 02013. Interns will receive training and practice in lexical data processing, as we add millions of translations to the database.


As a PanLex summer intern, you will learn to validate legacy lexical data in a variety of formats, languages, and scripts and to convert those data into a unified structure. You will become an active participant in the enrichment of the (currently 18-million-word) PanLex database. The internship will require a half-time commitment from late June to early August, leaving time for part-time employment and/or enjoyment of Bay Area attractions.



The internship will take place near the offices of the PanLex and Rosetta Projects at Fort Mason Center in San Francisco. The work schedule will be 20 hours per week, Mondays through Thursdays, from 10 a.m. to 12:30 p.m. and 1:30 p.m. to 4 p.m., for 7 weeks, from 24 June to 8 August. Each intern will also have the option of doing additional supervised work, which may involve designing and implementing a subproject based on the intern’s particular skills and academic objectives. Interns selecting this option will work 10 more hours per week and/or a 20-to-30-hour supplementary week.

Our program will begin with 4 days of intensive training during week 1. From week 2 onward, you will add knowledge to the PanLex database in a mentored group environment, while also receiving supplemental training one day each week. During the final week, some time will be available for you to document your methods and experience, including any reporting that your home institution may expect.

Your training will cover aspects of lexical database organization, language identification, character encoding, character normalization, Unicode compliance, lemmatic standardization, lexical classification, standards for lexical data and documentation, and lexical resource parsing with regular expressions.
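To give a flavor of two of these topics, character normalization and parsing with regular expressions, here is a minimal Python sketch. It is an illustration only, not the project's actual tooling: the tab-separated line format, the semicolon delimiter, and the names `ENTRY` and `parse_entry` are all assumptions made for this example.

```python
import re
import unicodedata

# Hypothetical source format (an assumption for this example):
# "headword<TAB>translation1; translation2"
ENTRY = re.compile(r"^(?P<headword>[^\t]+)\t(?P<translations>.+)$")


def parse_entry(line):
    """Normalize a line to Unicode NFC, then split it into a headword
    and a list of translations. Return None if the line does not match."""
    # NFC composes sequences such as "e" + combining acute accent into "é",
    # so that visually identical strings compare as equal.
    normalized = unicodedata.normalize("NFC", line.strip())
    match = ENTRY.match(normalized)
    if match is None:
        return None
    translations = [
        t.strip() for t in match.group("translations").split(";") if t.strip()
    ]
    return match.group("headword").strip(), translations
```

For instance, `parse_entry("cafe\u0301\tcoffee; coffee shop")` yields the headword in its composed form together with two translations, while a line that lacks a tab yields `None`. Real legacy sources are rarely this regular, which is why much of the work lies in validation.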

The PanLex Project will provide your training, mentoring, working space, Wi-Fi access, and training-day lunches. The project will not provide monetary compensation to you as an intern. Our personnel will cooperate with your efforts to obtain a fellowship, credit, and/or certification of accomplishment. In addition, your contributions of knowledge to the PanLex database will have your name on them.

Details are subject to change.

For this internship, we think you need:

All the better if you also have:

To apply

Please send an email message describing your interests and qualifications to jobs@longnow.org with a subject line that reads “PanLex Project - Summer Internship”. Applications received by 1 March 02013 will receive priority.
