Google Translate API Programming Notes

This weekend I decided to try out the Google Translate API (v1), part of the Google Language API family.

The Google Translate API supports various housekeeping functions plus two main features:

  1. language detection
  2. language translation.

Although machine translation does not generally produce a high-quality, polished result for arbitrary input, it can still be useful for more limited requirements.

For this project, I had a web site with 11 existing translations (Google Translate supports 56 languages) that needed a small incremental update of 65 short strings. That meant I could compare the Google Translate results with a human-translated corpus for most of the languages.
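A comparison like this can be roughed out in a few lines. The following is only a sketch of one possible approach, not what I actually ran: the function names and the 0.8 threshold are made up for illustration, and a character-level similarity ratio is a crude stand-in for a real translation-quality check.

```python
from difflib import SequenceMatcher


def similarity(machine: str, human: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two translations."""
    # Ignore case and surrounding whitespace so trivial differences do not count.
    a = machine.strip().casefold()
    b = human.strip().casefold()
    return SequenceMatcher(None, a, b).ratio()


def flag_for_review(pairs, threshold=0.8):
    """Return the keys whose machine translation differs noticeably from the human one.

    `pairs` maps a string key to a (machine, human) tuple.
    The threshold is an arbitrary illustrative value.
    """
    return [
        key
        for key, (machine, human) in pairs.items()
        if similarity(machine, human) < threshold
    ]
```

Anything flagged still needs a human to look at it; a low ratio may just mean two equally valid wordings.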

Conveniently, the new strings included day and month names, which are easy for machine translation to get right. :)

The most important thing is to read the Google Translate TOS first. There are several limitations and requirements:

  • you should register for an API key, provide a referer URL for the project, and provide the IP address of the requesting host
  • “powered by Google” must be displayed in any human-readable UI that relies on Google Translate
  • the maximum allowable input is 5,000 characters
  • automated requests are prohibited; all requests must be made as a result of an end-user action
  • all websites or apps that use Google APIs must be free of charge
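Some of these limits are easy to enforce in code before a request is ever sent. Here is a minimal sketch, assuming the 5,000-character limit counts characters rather than bytes (the TOS wording should be checked on that point). The names validate_input, MAX_INPUT_CHARS and ATTRIBUTION are my own, not part of any Google library.

```python
MAX_INPUT_CHARS = 5000             # limit quoted in the list above
ATTRIBUTION = "powered by Google"  # text the UI must display next to any results


def validate_input(text, limit=MAX_INPUT_CHARS):
    """Return text unchanged if it is safe to submit, else raise ValueError."""
    if not text.strip():
        raise ValueError("nothing to translate")
    # len() counts Unicode code points, which is an assumption about how
    # the service measures input length.
    if len(text) > limit:
        raise ValueError(
            f"input is {len(text)} characters; the limit is {limit}"
        )
    return text
```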

As a practical matter, you should detect a TOS error and stop submitting API requests as soon as you see one, i.e. when responseDetails is ‘Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors’.
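A rough sketch of that check follows. Only responseDetails and its message come from what I saw. The other field names (responseStatus, responseData, translatedText) are my recollection of the v1 JSON reply and should be treated as assumptions, and the exception and function names are invented for this example.

```python
import json


class TermsOfServiceError(Exception):
    """Raised when the service reports suspected TOS abuse; stop sending requests."""


def extract_translation(body: str) -> str:
    """Parse one JSON reply and return the translated text."""
    reply = json.loads(body)
    details = reply.get("responseDetails") or ""
    # Check for the TOS message first so the caller can abort the whole run.
    if "Terms of Service" in details:
        raise TermsOfServiceError(details)
    # Assumed field: a status of 200 signals success.
    if reply.get("responseStatus") != 200:
        raise RuntimeError(f"translation failed: {details}")
    return reply["responseData"]["translatedText"]
```

The caller should let TermsOfServiceError propagate out of its request loop rather than catching it and moving on to the next string.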

JSON and REST are supported, so any programming language can be used. Google provides code samples in JavaScript, Flash, Java, PHP, Python and Perl. UTF-8 is the character set used.
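To give an idea of what a REST request looks like, here is a small sketch that only builds the URL and sends nothing. The endpoint and the parameter names (v, q, langpair, key) are written from memory of the v1 interface, so treat them as assumptions and check them against Google's documentation. build_translate_url is a name I made up.

```python
from urllib.parse import urlencode

# Assumed v1 endpoint; verify against the official documentation.
ENDPOINT = "https://ajax.googleapis.com/ajax/services/language/translate"


def build_translate_url(text, source, target, api_key=None):
    """Return a GET URL asking for text to be translated from source to target."""
    params = {
        "v": "1.0",                        # API version
        "q": text,                         # the string to translate
        "langpair": f"{source}|{target}",  # e.g. "en|ja"
    }
    if api_key:
        params["key"] = api_key
    # urlencode percent-encodes str values as UTF-8 by default,
    # which matches the character set the API uses.
    return ENDPOINT + "?" + urlencode(params)
```

For example, build_translate_url("Monday", "en", "ja") produces a query string containing q=Monday and langpair=en%7Cja.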

I used the Perl sample code, fixed the string concatenation bug (!) in the first line, and enhanced it to comply with the TOS.

I found that no API key is needed if translation requests are throttled to one every 10 seconds.
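The throttling itself is simple. My script was Perl, so the Python below is just an illustrative sketch of the idea; the Throttle class and its parameters are hypothetical. The clock and sleep functions are injectable so the logic can be tested without really waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() immediately before each request. The first call returns at once, and later calls sleep only for whatever part of the 10 seconds is still outstanding.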

Also, you may submit input embedded in HTML, but the output translation can reorder the HTML tags, in some cases changing the final appearance. I noticed that anchor and strong elements were re-ordered in my results.
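One way to catch this automatically is to compare the order of opening tags before and after translation. This is a sketch using Python's standard html.parser module; tag_sequence and tags_reordered are names I invented, and the check only looks at tag names, not attributes or nesting.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Record the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(fragment):
    """Return the list of opening tag names found in an HTML fragment."""
    collector = _TagCollector()
    collector.feed(fragment)
    collector.close()
    return collector.tags


def tags_reordered(source_html, translated_html):
    """True if both fragments contain the same tags but in a different order."""
    before = tag_sequence(source_html)
    after = tag_sequence(translated_html)
    return before != after and sorted(before) == sorted(after)
```

A reordering is not always wrong, since word order legitimately differs between languages, so a True result is a cue to inspect the rendered output rather than a hard failure.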

Wikipedia: Machine translation
CLDR – Unicode Common Locale Data Repository
GeoNames.de – Languages of the World
Perl CPAN Module DateTime::Format::CLDR
Apertium Machine Translator API
Google to close Translation API service

