Archive for the ‘Tech’ Category

IMUG: How Google Built a Strong & Robust I18N Organization in Four Years

Friday, August 20th, 2010

At IMUG tonight Manish Bhargava from Google reprised his talk on “How Google Built a Strong & Robust I18N Organization in Four Years”, previously presented at the WorldWare Conference. Manish is the product manager for Google’s 40 language initiative.

This was a fairly non-technical general talk on Google’s efforts to realize their mission statement.

What was most notable about the talk is that no mention was made of where their i18n staff came from. Google largely gained their deep Internet i18n knowledge from hiring former Netscape and IBM ICU staff. Currently the group hires based on referrals of experienced people.

Google chose the 40 most widely used natural languages because they represented 99.7% of web traffic. (To get to 100% would require 120 more languages.) Google search itself is in 113 languages, and Gmail in 54, soon to be 58. Eric Schmidt, Google’s CEO, is a strong supporter of this effort and quality of user experience is considered more important than cost of translation.

Lux-IQ: a program to get feedback on international user experience and localization quality of various Google products from a network of in-market evaluators.

Example findings:

Issue type

Language/translation, interaction design, feature missing, feature bugs, visual design, data quality, other. Total.

Google translation toolkit used for ads. Machine translation. Some ad customers request translation into 50 languages for example.

Language Findits. 3 hour testing party for language-related products. Very successful.

Language console would help with finding already translated strings.

Globalization continuum

I18n prd, intl 1-stop, i18n checklist, country planning, legal, content, l10n checklist, translation, review and qa

I18n, planning, deployment

All is global, weekly pushes, 0.25 seconds for search query response

Quality is more important than cost.

High level advice alone – not effective
Deliver concrete solutions, hands-on
Adapt to product needs, constraints and priorities
Earn credibility
Success breeds success
Be persistent

Metrics: intl revenue, top10 problems
Graph of i18n api adoption

Challenges

Unicode redesign
Bidi in webapps
Broad range of environments
I18n technologies
Deep dives: android, chrome, gmail, youtube: to help critical area, new areas

40 language initiative

Take aways

I18n by design
Educate, evangelize, communicate
Design globally, implement locally
Build credibility. Success breeds success.
Retrofitting happens. C’est la vie. Learn from it.
“Make it easy to do right, and hard to do wrong.”

3 engineers for 7 months to fix gmail

Thanks to Google for hosting.

Twitter: IMUG

# XIHA Connects Facebook and Twitter Friends with New Multilingual Translation http://tinyurl.com/2ankev2 #L10n about 3 hours ago via web

# Zynga Launches First Localized Game In China: Texas Poker. http://tinyurl.com/2fxp9tm #L10n about 3 hours ago via web

# @localization sorry didn’t see your Q (no hashtag) but their i18n team seems fairly centralized but serves all projects and offices. about 4 hours ago via web in reply to localization

# And the event is over! Thank you Manish, and thank you Andrew Swerdow & the Google i18n intergrouplet for hosting! #imug408 about 6 hours ago via web

# Question: is there a process for self-localization of smaller language? Yes, for example Search recently translated into Hawaiian. #imug408 about 6 hours ago via web

# Question: why 40 languages? Those 40 can reach 99.7% of all internet users. Actually 42 now. ~100 more needed for remaining 0.03%. #imug408 about 6 hours ago via web

# Many questions from audience. One was how well-integrated is bug-management system? Manish summarized end-to-end process for that. #imug408 about 6 hours ago via web

# @renatobeninatto approximately 60 attended tonight’s Google #i18n event. #imug408 about 6 hours ago via web in reply to renatobeninatto

# Manish now summing up: #i18n by Design and other take-aways for any organization. #imug408 about 6 hours ago via web

# Google #i18n API adoption has grown 173% since start of the 40 language initiative. #imug408 about 6 hours ago via web

# Google has a team dedicated to #Unicode “Redesign”. #imug408 about 6 hours ago via web

# First two points on successful #i18n: not high-level advice alone; deliver concrete solutions or even hands-on help. #imug408 about 6 hours ago via web

# How can an #i18n team make an impact on projects from the outside? Manish offers 6 points. #imug408 about 6 hours ago via web

# Amazing how much the Web challenges but also offers opportunities in #i18n. #imug408 about 6 hours ago via web

# Interesting timeline now of Google’s Globalization process from i18n thru Planning to L10n. But I’m not going to give it away. :-) #imug408 about 6 hours ago via web

# Apparently this internal Language FindIt program results in far less mischief than other firms’ community translation efforts. :-) #imug408 about 6 hours ago via web

# Manish now onto Language FindIts: Googlers identifying translation & #L10n issues to improve Google products in own languages. #imug408 about 6 hours ago via web

# Interesting discussion going on with audience about MT vs. transcreation. And rule-based vs. statistical translation. #imug408 about 6 hours ago via web

# @renatobeninatto Yes they do use Google Translator Toolkit internally for example in automated ad translation. User can then edit. #imug408 about 6 hours ago via web in reply to renatobeninatto

# @renatobeninatto I found Petra, she says Hi, but she is still making me ask your question. :-) #imug408 about 6 hours ago via web in reply to renatobeninatto

# @ken_lunde yes similar to Wordware but the entrance fee was far less! :-) Very good presentation, great interaction with audience. #imug408 about 6 hours ago via web in reply to ken_lunde

# Google has a program called Lux-IQ to get feedback from local market-savvy non-technical users in all 40 language markets. #imug408 about 6 hours ago via web

# #i18n quality issues include not only basic encoding and locale issues, but also missing features important locally. #imug408 about 7 hours ago via web

# The presentation is now turning to quality issues in #i18n. #imug408 about 7 hours ago via web

# Manish is also presenting case studies, such as their experience with Google Video and Unicode (pre-YouTube). #imug408 about 7 hours ago via web

# And by the way Google is hiring #i18n engineers! #imug408 #jobs about 7 hours ago via web

# Manish has given us perspective on Google’s incredible global growth, and the start of Google’s 40-language initiative in 2007 #imug408 about 7 hours ago via web

# Manish Bhargava, Google #i18n Product Manager, is presenting. #imug408 about 7 hours ago via web

# Tonight’s IMUG event at Google kicked off 1/2 hour late, big crowd not enough badges. :-) #imug408 about 7 hours ago via web

# Going to tonight’s IMUG event @ Google? Maps, directions and more: http://www.imug.org/google/ #imug408 about 10 hours ago via web

# Shirley_Rogers Twitter Unicode Hashtags – http://bit.ly/9ciNuu about 12 hours ago via web Retweeted by i18n_mug

# 57 yes, 5 maybe RSVPs: 8 left for tonight’s 70 chairs. Will it be SRO? Google’s Manish Bhargava is an #i18n star! #imug408 about 12 hours ago via web

# Localization Project Manager, Net-Translators, Sunnyvale, CA. Just posted to IMUG Jobs: http://www.imug.org/jobs/ #L10n #jobs about 13 hours ago via web

# Maps and directions to tonight’s 7 PM Google #i18n event in Mountain View, CA: http://www.imug.org/google/ #imug408 about 14 hours ago via web

# IMUG cannot do webcasts from Google yet. Hope to see you all there tonight! http://tinyurl.com/2f9esas Hashtag will be #imug408 about 14 hours ago via web

# RT @ken_lunde Two font- and CJKV-related Tech Notes now live. http://tinyurl.com/23ffulg & http://tinyurl.com/yzd3hjj <–Kazuraki font! about 16 hours ago via web

# Kazuraki: Adobe’s Groundbreaking New Japanese Typeface http://tinyurl.com/2by46nn Next month’s IMUG event, @ Adobe #imug408 about 16 hours ago via web

# TONIGHT 7 PM: The Google i18n Story http://tinyurl.com/2f9esas Hashtag for this IMUG event @ Google will be #imug408 about 16 hours ago via web

# cathywissink RT @TalkStandards Nascent Web Open Font Format is getting boost thanks to W3C’s new initiatives http://bit.ly/biE85M #typography about 16 hours ago via web Retweeted by i18n_mug

Java and the Software Patent Minefield

Friday, August 13th, 2010

I was always skeptical of Sun’s possessive and schizophrenic licensing of Java … originally CDDL (Open Source, but not quite Free), then licensed under GPL2 in 2006 but with numerous patents filed.

Some versions had “classpath exceptions”, like Standard Edition (SE), and some didn’t, like Micro Edition (ME).

So I stuck with C/C++ and Unix scripting languages like Perl, which don’t rely on any one company.

Oracle has clarified what those Java patents mean, with a lawsuit against Google for using Java, over 7 software patents originally granted to Sun. They even tossed in some copyright violation complaints.

(Oracle/Sun also has numerous restrictions on their downloadable Java binaries, including right of agreement termination at any time.)

The US Patent Office created a software and business method process minefield when it allowed patents on the most trivial of ideas reduced to practice.

One of the patents being litigated even involves the JAR format.

This is just the latest example of why software patents are of no benefit, except to monopolists who want to impede progress and openness.

allthingsd.com: Love, Larry: Here is the Oracle Statement and Final Complaint Versus Google
cnet.com: Sun settles Kodak’s Java suit for $92 million (2004)
cnet.com: Sun picks GPL license for Java code (2006)
cnet.com: Why Oracle, not Sun, sued Google over Java

Aviation Incidents: Ted Stevens Alaska Crash, JetBlue Flight Attendant Escape

Wednesday, August 11th, 2010

There have been some aviation incidents this week …

Ted Stevens Crash

The small airplane crash in Alaska that killed Ted Stevens and others generated a lot of inaccurate quotes in the press.

Based on looking at the aerial photograph of the crash scene, it’s obvious from the relatively long tree damage path, resulting in gradual deceleration, and largely intact fuselage, that the accident was highly survivable. That’s contrary to the aerial observers’ quote.

Also, the press harped on the lack of a filed flight plan, which is not required for VFR flights. However, some kind of flight plan should be announced to either the FAA or family and friends when cross-country flights are involved so that somebody will notice you’re overdue. Doubly so when VIPs are involved – narrowing SAR down can save millions of dollars.

It looks like in this case a specific landing time was not relayed to the lodge, and they only realized the flight was overdue when making dinner reservations for their expected guests.

Certainly this crash is going to spotlight what an ex-senator, ex-NASA employees, lobbyists and GCI were doing out there.

cnn.com: Untamed Alaska challenges pilots

JetBlue FA Escape

Regarding the JetBlue flight attendant losing it and activating the emergency slide with a beer in hand … although entertaining to read about, this incident indicates a lack of training for dealing with unpleasant situations. The flight attendants are required crew members who are primarily there to maintain the safety and security of the cabin during flights – they can’t “just lose it.”

Perhaps flight attendants involved in an altercation with a passenger should call another FA and switch stations to depersonalize the incident.

Example: if a pax drops a bag on a FA’s head, the FA should take a second to ensure everybody’s ok and then automatically call another FA to switch stations.

I imagine this incident will result in much greater scrutiny of flight attendants and their actions, making the job even more difficult than it already is.

And tampering with an aircraft is not something professionals want to make light of. Activating the emergency slide temporarily disabled that aircraft for flight use, resulting in costs to repack the slide and possibly a missed revenue trip, as well as endangering people on the ground.

It also plants a bad idea in the minds of passengers who suddenly want off the plane and might try to emulate him.

NBC.com: slide activation video

25th Anniversary of Japanese 747 Crash

Japan had its worst aviation crash on August 12, 1985. A 747 with 524 people aboard crashed into a mountain, killing 520. The relatives still climb the mountain each year to remember the victims.

FAI AFSS – Planning A Flight to Alaska
avweb.com: A Jet Blue FA Loses It
wikipedia: Flight Attendant
avweb.com: Ted Stevens Crash: A Nasty Reminder (of Alaska Bush Syndrome)

Three Weeks to Create New Twitter Account

Monday, August 9th, 2010

I’ve been trying periodically since OSCON on July 19 to create a Twitter account for @ActionMessage, but kept getting an error page with “Internal Server Error” from twitter.com.

After 3 weeks signup finally worked … yay!

However, the first account confirmation email never arrived (verified by looking at my MTA log), so I had to request it again.

Twitter.com engineers, here are 2 tips for reliably sending email programmatically:

  1. Have your program inject the message to an MTA relay that is located inside your data center (www.twitter.com and mx006.twitter.com seem to be on same network segment, so that looks ok)
  2. Do program error checking and retry email message injection if it fails, and log the application error so ops can figure out why. (The resend_confirmation_email link could be instrumented with query-string parameters to help diagnose problems.)
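
The two tips above could be sketched roughly as follows in Python. This is only an illustration, not how Twitter sends mail: the relay host `localhost`, the sender address, and the function names `smtp_inject`, `send_with_retry` and `build_confirmation` are all assumptions made up for this example.

```python
import logging
import smtplib
import time
from email.message import EmailMessage

log = logging.getLogger("mailer")


def smtp_inject(msg, relay_host="localhost", relay_port=25):
    """Hand the message to a nearby MTA relay (host and port are placeholders)."""
    with smtplib.SMTP(relay_host, relay_port, timeout=10) as smtp:
        smtp.send_message(msg)


def send_with_retry(msg, inject=smtp_inject, attempts=3, delay=2.0, sleep=time.sleep):
    """Try to inject msg; log each failure and retry with a growing delay.

    Returns True once the relay accepts the message, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            inject(msg)
            return True
        except (smtplib.SMTPException, OSError) as exc:
            # Log enough context for ops to work out why injection failed.
            log.error("inject failed (attempt %d/%d) for %s: %r",
                      attempt, attempts, msg["To"], exc)
            if attempt < attempts:
                sleep(delay * attempt)
    return False


def build_confirmation(to_addr):
    """Build a hypothetical account-confirmation message."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = to_addr
    msg["Subject"] = "Confirm your account"
    msg.set_content("Follow the link in this message to confirm your account.")
    return msg
```

Calling `send_with_retry(build_confirmation("user@example.com"))` would attempt delivery through the local relay. A False return is the signal to queue the message for a later retry instead of silently dropping it.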

I guess part of the charm of Twitter is its unreliability, though that needs to change as it targets paying business clients.

@ActionMessage
pingdom: twitter/home

YouTube: Subsidizing Internet Video for the World

Monday, August 9th, 2010

Here are some links related to YouTube subsidizing Internet video for the entire world. Thanks, Google!

(youtube.com domain name registered Feb. 15, 2005.)

blog.forret.com: Youtube bandwidth: terabytes per day (2006)
slate.com: Do You Think Bandwidth Grows on Trees? (2009)
Arbor Networks, the University of Michigan and Merit Network To Present Two-Year Study of Global Internet Traffic At NANOG47 (2009)
YouTube myth busting (2009)
mashable.com: Viacom Loses $1 Billion Lawsuit Against YouTube (2010)
socialtimes.com: Google CFO Reveals Viacom’s Lawsuit Cost YouTube $100 Million
youtube-global.blogspot.com: YouTube wins case against Viacom (2010)
wired.com: YouTube’s Bandwidth Bill Is Zero. Welcome to the New Net (2009)
Cringely: A Net Game for Google? (2010)
slashdot.org: What Are Google and Verizon Up To? (2010)

SVLUG meeting: Next-generation Samba with John Terpstra

Wednesday, August 4th, 2010

At the Silicon Valley Linux Users’ Group (SVLUG), John Terpstra lectured on the development history and status of Samba, a high-performance storage project he worked on, and ClearOS.

John is a technology manager and co-author of The Official Samba-3 HOWTO and Reference Guide (Bruce Perens’ Open Source Series).

He has previously worked as a VP at TurboLinux and Caldera on Linux clustering products. (I vaguely remember those products from way back around 2000.)

Some of the Samba tips he gave were:

  • trim your samba configuration file down to essential settings
  • Samba’s ActiveDirectory capabilities enable large networks to scale beyond Microsoft’s implementation
  • network bandwidth consumption can be reduced by proper configuration of WINS and broadcast vs. anycast

John also mentioned that Microsoft is contributing to Samba through their effort to make various protocols available to all POSIX operating systems and also interop testing meetings.

He gave an interesting overview of a document discovery project that required an elaborate storage system. He was able to set up a working test environment with RHEL, LVM, GFS2 and DRBD and various filesystems before switching to GlusterFS on top of Solaris ZFS for more efficient handling of directory metadata with deep directory paths containing 800,000 files per directory. (There were approx. 3 volumes containing 14 TB each.)

Thanks to Symantec for hosting the meeting once again.

Axceleon acquires Turbolinux’s EnFuzion Clustering Solution (2002)

Defcon 18, Las Vegas

Sunday, August 1st, 2010

DEF CON 18 was held once again in Las Vegas at the Riviera Convention Center.

There were a handful of talks on the subjects of DNS and IPv6.

The hacker Jeopardy session was a lot of fun. I think the audience got more correct answers than the panel. I was impressed with the software somebody wrote to show the game categories – very convincing. Afterward, the EFF had an interesting fundraiser (your photo beside a “model”.)

The weather was hot but clear. The McDonald’s across the street is open 24 hours and has free WiFi.

I walked over to the Fashion Show Mall (about 1 mile.) It has a variety of restaurants on different levels, including a Maggiano’s, the Capital Grille, and a gourmet burger stand.

theregister.co.uk: Defcon speaker calls IPv6 a ’security nightmare’

O’Reilly Open Source Conference 2010, Portland

Friday, July 23rd, 2010

Once again, the O’Reilly Open Source Conference (OSCON) was held in Portland, Oregon.

It was a good conference, and we had beautiful weather all week long.

Executive Summary

The themes promoted by the conference organizers were Cloud Computing, NoSQL, Emerging Languages (Scala, Erlang, Parrot, Go) and Android phone development.

The @oscon twitter channel was heavily used to coordinate amongst organizers and attendees. I used the TwiXtreme twitter client program on my BlackBerry.

Plug Computers were very popular in the Expo area. They are 5 watt ARM-based computers running Debian Linux that fit into a power brick-sized case and cost $99 to $129 depending on features. The Marvell booth had a few models on display, from GlobalScale (GuruPlug) and Ionics. High-end models have dual gigabit NICs, multiple USB ports, a WiFi access point and other expansion ports.

There was also continuing buzz regarding Facebook’s Flashcache SSD module (GPL v2) for Linux, and also ZFS snapshots.

Tutorials

I went to the Gearman Cookbook tutorial, the first half of the Chef tutorial and some of the Cloud Summit talks.

The Gearman Cookbook tutorial was excellent. After a detailed overview of the Gearman architecture and implementations in Perl and C, a number of use cases were explored in detail, including before and after code samples. The talk was both easy to listen to as an overall survey, as well as providing immediately useful info for those wanting to deploy it.

The Chef tutorial was very detailed – too much so perhaps. I went to the first half only, since I am not planning to implement Chef soon (I use PXE and anaconda/kickstart with CentOS), and did not need that level of detail at this time. cfengine, puppet and chef are ops tools for configuring servers. Chef uses Ruby data structures for its configuration files, and has include files and other useful syntax. Basically, users can “code” server configuration, as if they were traditional apps.
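
To give a feel for what “coding” server configuration means, here is a minimal sketch in Python. It is not Chef and does not use Chef’s Ruby syntax or API: the `FileResource` class and `converge` function are hypothetical names invented for this illustration, and it writes to a scratch directory rather than a real /etc.

```python
import os
import tempfile


class FileResource:
    """Hypothetical declarative resource: 'this file should have this content'."""

    def __init__(self, path, content):
        self.path = path
        self.content = content

    def apply(self):
        """Converge the file to the declared state; return True if anything changed."""
        current = None
        if os.path.exists(self.path):
            with open(self.path) as fh:
                current = fh.read()
        if current == self.content:
            return False  # already in the desired state, nothing to do
        parent = os.path.dirname(self.path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(self.path, "w") as fh:
            fh.write(self.content)
        return True


def converge(resources):
    """Apply every resource in order and report which paths were changed."""
    return [r.path for r in resources if r.apply()]


# Usage against a scratch directory rather than a real server.
root = tempfile.mkdtemp()
recipe = [
    FileResource(os.path.join(root, "etc", "motd"), "Welcome\n"),
    FileResource(os.path.join(root, "etc", "ntp.conf"), "server pool.ntp.org\n"),
]
first_run = converge(recipe)   # both files are created
second_run = converge(recipe)  # nothing left to change
```

The point of the sketch is that the configuration describes the desired end state, so running it a second time changes nothing.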

I went to some of the Cloud Summit talks and BOFs, but found that anybody who has done a simple project using EC2 knew as much or more than the speakers, some of whom I would call blowhards.

Marten Mickos, president of Eucalyptus, is refreshing in that he is always clear about being in it for the money, while also promoting Open Source.

Sessions

Some of the most memorable sessions to me were:

Introduction to MongoDB, Kristina Chodorow (MongoDB)

Kristina is the maintainer of the Perl and PHP drivers for MongoDB. She gave an overview of MongoDB, a NoSQL document store, and its command-line interface, which uses JavaScript.

Some day she will release a sharding tool for MongoDB.

Scaling SourceForge with MongoDB, Nosh Petigara (10gen), Rick Copeland (SourceForge.net / GeekNet)

Nosh and Rick gave an excellent review of incorporating MongoDB into the SourceForge site.

- SF query load is mostly read-only
- ops team benchmarked a few NoSQL candidates, and MongoDB won on performance
- original MySQL servers had 64 GB RAM. After migration to MongoDB, same server machines but only 8 GB RAM
- backup dumps are verified to be bitwise the same as masters
- have to be careful not to dump all documents in your database to the network or it will max out switches
- SF relies on first-class data centers and replication slaves, less worried about MongoDB mmap (not crash-safe)
- I personally looked at their performance numbers and site graphs (on an iPad), and the end result was impressive.

Perl Lightning Talks

As always, the Perl Lightning Talks are a high point of the conference.

The “cartoon” of Vincent Pit’s remarkable CPAN module (VPIT) contributions was both informative and hilarious. Vincent is a French Ph.D. candidate in advanced geometry.

Cloud BOF (3 Hours)

The Cloud BOF was disorganized, starting 30 minutes late and for some reason was subdivided into 4 audience groups. Startups and vendors trying to make a cloud sales push led the BOF, including cloud and DNS service providers.

The Health Regulations subgroup came up with a couple of ways to make the Cloud palatable to regulators by using encryption on all data due to the multi-tenancy issues with sharing public VMs.

I was in the NoSQL group, which discussed general issues and particular successes. Memcached was the clearest winner, while some people also had success with MongoDB and Redis.

My neighbor was an engineer at Postrank.com. He said that they were happy with HAProxy, but much less happy with the unpredictable IO available when running MySQL on EC2. He also said to carefully look at storage volumes available to your instance, as one is a useful tmpfs. They use AuthSMTP to get around EC2 being generally blacklisted for outbound email.

Database BOFs

MySQL BOF

The MySQL AB engineering staff has left Oracle. Monty Program AB (21 staff) has the core developers, and Percona Inc. (32 staff) has the consultants. Oracle still has some of the InnoDB programmers.

The business plan for Monty Program AB is 60% commercially-sponsored MySQL development, and 40% community-request development. Monty would like commercial users of MySQL to sponsor patches that would benefit them.

Mark mentioned that using Nehalem instructions for CRC was much faster, and that Facebook was using partitions for truncating tables instead of doing multi-record deletes. (See his blog for more details.)

One person mentioned using a commercial backup tool, R1Soft, that inserts a Linux kernel module to allow filesystem snapshots. He said to carefully test backup and restore in your environment, especially for filesystems greater than 1 TB which may exceed certain block counter limits. Peter said that some of his clients had used it with varying success.

It worked for him in his environment, and the file browser allows selective file restore (he uses it to restore by priority where a system runs multiple applications.) It starts at $299 for the Standard Edition, and also has MySQL Add-on and Enterprise Editions.

PostgreSQL BOF

The PostgreSQL BOF talked about 30 or so changes that went into version 9.

One of the most exciting new features is a native replication feature, called streaming replication (block-based.) The advantage over Slony-I replication is that Slony-I is trigger-based, so it has a variety of issues, including the inability to replicate DDL commands.
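
Why a trigger-based scheme misses DDL can be shown with a toy analogue. The sketch below uses Python’s built-in sqlite3 module, not PostgreSQL or Slony-I; the `change_log` table and the `users_ins` trigger are hypothetical stand-ins for whatever a real replication system would record and ship to replicas.

```python
import sqlite3

primary = sqlite3.connect(":memory:")
primary.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical change log that a replication daemon would ship to replicas.
    CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, stmt TEXT);
    CREATE TRIGGER users_ins AFTER INSERT ON users
    BEGIN
        INSERT INTO change_log (stmt)
        VALUES ('INSERT users ' || NEW.id || ' ' || NEW.name);
    END;
""")

# A row change (DML) fires the trigger and lands in the change log.
primary.execute("INSERT INTO users (name) VALUES ('alice')")
# A schema change (DDL) fires no row trigger, so nothing is logged for it.
primary.execute("ALTER TABLE users ADD COLUMN email TEXT")
# Later rows are logged, but the trigger knows nothing about the new column.
primary.execute("INSERT INTO users (name, email) VALUES ('bob', 'bob@example.com')")

captured = [row[0] for row in
            primary.execute("SELECT stmt FROM change_log ORDER BY seq")]
print(captured)
```

The log ends up holding only the two inserts. A replica fed from it would never learn about the `email` column, which is the kind of gap that replication working below the SQL layer avoids.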

Some of the developers mimed replication events, which was rather amusing to watch. Yes, it was taped.

PostgreSQL is released under the PostgreSQL Licence, which is BSDish.

Peter Zaitsev, co-founder of Percona, organized 3 BOFs, including XtraDB, XtraBackup, Maatkit, Percona Server, Sphinx Search and Running Databases on Flash Storage.

Sphinx Search BOF

Andrew Aksyonoff, the original programmer of Sphinx Search (GPL v2), couldn’t make it to OSCON (the good excuse was that he was busy coding), so Richard Kelm (Sphinx sales/customer support honcho) and Peter filled in (Percona is a business partner with Sphinx, and many of Percona’s clients use it.)

Some of the attendees were existing users, like myself, and some from HP and other companies were looking for a large-scale search solution or alternative to Lucene.

Monty mentioned that the latest MySQL 5.1 should be used, as there have been a number of performance and reliability improvements. Full-text search is supposed to be 10x faster than 5.0, and replication is nearly bug-free by now.

Sphinx Search now has real-time index updates in version 1.1.0 beta. Another very nice feature is SQL+FS indexing.

Here is the full Sphinx 1.1.0 changelog.

Running Databases on Flash Storage BOF

The Running Databases on Flash Storage BOF had a combination of MySQL and Postgres users who have tested or used most of the SSD products: FusionIO, Violin, Intel, OCZ, etc. Everybody was happy with SSD IOPS performance, but less so with cost and metadata RAM requirements with the add-in boards (FusionIO may require 4 GB RAM for metadata.)

Peter said that 20% to 30% of his clients are already using SSD – across the spectrum of vendors and models. Some are also trying “massive RAM” solutions, like Cisco servers with 384 GB RAM.

Some users had 1+ TB Postgres databases with very thorny backup and mgmt. issues. One solution was to start a snapshot, but not do the copy operation.

Expo Notes

I had an enjoyable talk with Austin Hook, who has operated the OpenBSD Store for many years. He lives near Calgary, the center of OpenBSD/OpenSSH/PF development. He mentioned that some perennial financial contributors had stopped because of the recession, so here’s the donations link.

I also talked to some reps from a Brazilian outsourcing firm, ActMinds. They currently have 400 employees across Brazil and a sales office in Philadelphia. Brazil is only 2 hours ahead of EST. They said the minimum project size is 2 developers and developer turnover a low 5%/annum. Their pricing is $35 to $45/hour.

And I had fun handling the plug computers on display at the Marvell booth. The Ionics boards are amazingly densely populated.

Discussions

I had the opportunity to talk to a long-time Portland resident who works as a computer consultant. He said that the Portland economy is not doing great, and really hasn’t done well since old-growth logging was stopped after 90% of the forests were cleared. And although hundreds of miles of fiber optic cable have been laid downtown, it’s not available for residential use. However, the Beaverton area does have ubiquitous FTTH.

I also talked to somebody who attended the Emerging Languages talks. He’s working on his M.Sc. in Computer Science, so found those talks fascinating.

Twitter Humor

There were some humorous tweets:

- “my MongoDB and CouchDB mugs are fighting each other.”
- “I got one MongoDB mug, but need two to safely store coffee.”

Notes

Note to self: skip the nightly parties unless you have a date. The bars are too loud to talk to anybody.

Note to the O’Reilly conference organizers: use meetup.com for the BOFs like ApacheCon does. The average audience was about 10 people, and with meetup it would be 4x that.

OSCON 2010 Slides
Tim Bray: Desperate Perl Hacker
Youtube: OSCON 2010 videos
blip.tv: OSCON2010 videos
wikipedia: Plug Computer
Jeremy Zawodny: MongoDB Early Impressions