IMUG: Building Scalable i18n and L10n tools for 300 Languages at Wikipedia

Alolita Sharma, Director of Engineering at Wikipedia, gave a talk tonight at IMUG on “Building Scalable i18n and L10n tools for 300 Languages at Wikipedia”.


She is driving Wikipedia's initiative to build open source tools and technologies that support hundreds of languages. An engineering manager and software engineer, she has been working with open source software and has promoted open source adoption for more than a decade.

Some Wikipedia statistics:

– 24.6 million articles
– 286 languages
– 500 million unique users monthly
– 22 billion page views per month
– 310 incubator languages (596 total languages)
– 792 wikis for projects plus MediaWiki

Some notes on her talk:

– WMF Language Engineering Team (10 staff)
– equality for all languages
– great user experience for all languages
– nobody wants tofu blocks (the empty rectangular glyphs shown when a font is missing a character) as their language user experience, hence jQuery.webfonts
– 50+ high-quality fonts for 20+ non-Latin scripts in the font formats TTF, WOFF, EOT, and SVG
– Language Coverage Matrix for Wikimedia websites
– interesting demo in Lohit Devanagari, other Indian languages, phonetic keymaps
– UIs are easy compared to computationally-hard problems like content translation and selection
– crowdsourced translations
– audience question about showing context for string translations
– you might get 70% accuracy in machine translation of English, but only 10% in other languages
– big difference between rich web and new mobile platform support
– crowdsourced translation works when there is a common interest (Twitter, Facebook, WMF)
– crowdsourced translation quality is not an issue for WMF because its users act as guinea pigs and provide feedback rapidly
– use CLDR, but there need to be bulk import tools to really give back
– listen in on the Unicode Consortium and provide comments, but too small to drive it
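
To make the tofu-block note a bit more concrete, here is a rough sketch of how a page could work out which webfonts a piece of text might need. This is not how jQuery.webfonts works and was not shown in the talk. It uses Python's standard unicodedata module and treats the first word of each character's Unicode name as a stand-in for its script, which is only an approximation. The script-to-font table and the function names are hypothetical, chosen just for illustration.

```python
import unicodedata

# Hypothetical script-to-font table. The font choices are illustrative
# assumptions, not WMF's actual configuration.
WEBFONT_FOR_SCRIPT = {
    "DEVANAGARI": "Lohit Devanagari",
    "TAMIL": "Lohit Tamil",
}


def scripts_in(text):
    """Return the rough script names used by the letters in text.

    The first word of a character's Unicode name (for example
    "DEVANAGARI LETTER HA") is used as an approximate script label.
    Non-letters such as spaces, digits, and combining marks are skipped.
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def fonts_needed(text):
    """Return a sorted list of webfonts from the table that text may need."""
    found = scripts_in(text)
    return sorted(
        WEBFONT_FOR_SCRIPT[script]
        for script in found
        if script in WEBFONT_FOR_SCRIPT
    )


if __name__ == "__main__":
    print(fonts_needed("Wikipedia हिन्दी தமிழ்"))
```

A real implementation would more likely key off the page's language tag and the fonts already installed on the reader's machine, but the sketch shows the basic idea of serving a font only when the content calls for it.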


Alolita is on the board of the Open Source Initiative, an adviser to the Software Freedom Law Center, and a passionate advocate of open source and the open Web. She holds Bachelor's and Master's degrees in Computer Science and speaks internationally on language technologies, i18n, L10n, open web standards, open source trends and technologies, and building successful developer communities.

Joe Katz did quite an IMUG introduction, strolling down memory lane back to 1987.

Thanks again to Adobe for hosting the event tonight.

IMUG Meetup: Home Page, This Event’s Comments
