Mark Sanford gave a comprehensive talk on i18n at Twitter.
He came from the Summize team that Twitter bought, so he originally worked on search and then ended up being the i18n guy.
Twitter had a lot of things stacked against it i18n-wise:
- unilingual source tree
- no budget to do any real engineering initially
- Japanese tree translated by outside volunteer partner, Digital Garage
- primitive Unicode support in MySQL and Ruby
- it is now a legacy system full of data, so metadata like lang/locale is hard to add at this point.
Japanese cell phone support was challenging because:
- Shift_JIS, not Unicode
- each of 3 carriers uses a different image format and different emoji codepoints, some overlapping
- lack of in-house l10n resources for Japanese-flavor design, which means being cute and dense and also carrying an ad to demonstrate business seriousness
- Japan uses cellular email rather than SMS, unlike other places
- mobile browsers don’t support cookies, so sessions have to be carried in the URL, unlike in the regular Ruby web app
- short messages in Japanese are hard to tokenize
- need for QR (Quick Response) code support (2D barcode).
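
Three of the items above (Shift_JIS handsets, cookie-less browsers, and tokenizing text without spaces) can be illustrated with a small Python sketch. None of this is Twitter's code, which was Ruby; the function names, the `sid` query parameter and the example URL are all made up for illustration, and character bigrams are just one common fallback for unsegmented text, not necessarily what was used.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def to_shift_jis(text):
    """Encode text for a Shift_JIS-only handset.

    Characters with no Shift_JIS mapping (for example Unicode emoji)
    are replaced with '?' instead of raising an error.
    """
    return text.encode("shift_jis", errors="replace")


def add_session_id(url, session_id, param="sid"):
    """Carry the session in the query string for browsers without cookies.

    'sid' is a hypothetical parameter name. Any existing value for it
    is dropped so the URL never carries two session ids.
    """
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))


def char_bigrams(text):
    """Split text into overlapping two-character tokens.

    A crude stand-in for real word segmentation when the text
    has no spaces between words.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print(to_shift_jis("日本語"))
    print(add_session_id("http://example.com/home?page=2", "abc123"))
    print(char_bigrams("東京都"))
```

The bigram output for 東京都 contains both 東京 (Tokyo) and 京都 (Kyoto), which is a small taste of why short Japanese messages are hard to tokenize. Every link the app renders would also have to go through something like `add_session_id`, which is extra work compared with a cookie-based session.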

Remarkably, Twitter was invited by a number of carriers to support their phones.
Translations were crowdsourced using Google Groups, interns, and an app integrated with the Twitter site. The effort is now up to 3,700 strings and 2,600 translators. Informal terms like tweet and follower are hard to translate, though.
There was a very good turnout, with 60 attendees live and 6 online.
Thanks to Adobe for hosting the venue. (Though I don’t understand why there is a 3-year NDA for attending a public meeting.)
techcrunch.com: Twitter Has Basically Doubled In Staff In The Past 6 Months (June, 2010)
Official Twitter Blog


