Archive for March, 2009

User Group: BayPIGgies (Python)

Thursday, March 26th, 2009

A handful of speakers gave very interesting presentations at BayPIGgies (Silicon Valley-San Francisco Bay Area Python Interest Group).

It was like a set of lightning talks – only longer.

There was a quick talk on Big O notation and how to characterize the performance of python arrays and sets. I am suspicious of performance talks without benchmarks, though.
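Here is a small benchmark sketch using the standard `timeit` module. A membership test (`in`) on a python list scans elements one by one, which is O(n). A set uses hashing, which gives average O(1) lookups. The sizes and repeat counts below are arbitrary assumptions, and absolute timings will vary by machine.

```python
import timeit

def membership_timings(n, number=200):
    """Time `x in container` for a list and a set holding the same n integers.

    Looks up the last element, which is the worst case for a list's linear scan.
    Returns (list_seconds, set_seconds) for `number` lookups each.
    """
    data_list = list(range(n))
    data_set = set(data_list)
    target = n - 1
    list_time = timeit.timeit(lambda: target in data_list, number=number)
    set_time = timeit.timeit(lambda: target in data_set, number=number)
    return list_time, set_time

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        list_t, set_t = membership_timings(n)
        print(f"n={n:>7}: list {list_t:.6f}s  set {set_t:.6f}s")
```

The list time should grow roughly in step with n, while the set time stays nearly flat.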

Sandrine Ribeau gave a talk on pylint, which is a lint/coding standards checker for python source code.

Although PyChecker still does a better job of general python lint checking, pylint can use a site-specific checker module to enforce local rules such as function naming conventions.

pylint can be configured to ignore specified warnings or errors, and also overridden from the command line.
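This sketch does not use pylint's own plugin interface. It is a rough, standard-library-only illustration of what a site-specific naming check does. It walks a module's syntax tree with `ast` and flags function names that aren't lower_case_with_underscores. A set of message codes to ignore loosely mirrors how pylint messages can be disabled. The `N001` code and the naming rule are hypothetical.

```python
import ast
import re

# Hypothetical site rule: function names must be lower_case_with_underscores.
FUNCTION_NAME_RE = re.compile(r"^[a-z_][a-z0-9_]*$")

def check_function_names(source, ignore=()):
    """Return (line, code, message) tuples for functions violating the naming rule.

    Codes listed in `ignore` are suppressed, similar in spirit to disabling
    a pylint message in a config file or on the command line.
    """
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not FUNCTION_NAME_RE.match(node.name):
                problems.append((node.lineno, "N001",
                                 f"function name {node.name!r} is not lower_case"))
    return [p for p in problems if p[1] not in ignore]
```

A real pylint plugin would instead register a checker class with pylint, as described in pylint's documentation.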

Somebody gave a talk on doing log and log-like processing with Unix pipes, similar to how Yahoo does it. Such pipelines are generally faster and have more predictable resource requirements than, for example, MySQL.

One interesting technique is to use the sort -T option to put sort’s temporary files on a different drive from your input data or output file.
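Here is a minimal python sketch of the pipe approach. It assumes a Unix system with `sort` and `uniq` on the PATH. It counts requests per host, roughly `cut -d' ' -f1 access.log | sort -T DIR | uniq -c`, with the field extraction done in python. The `-T` option makes sort write its temporary files into a directory you choose, ideally on a different drive. The log format (host as the first field) and the function name are assumptions.

```python
import subprocess
import tempfile

def count_hosts(log_lines, tmp_dir):
    """Count requests per host by piping hosts through `sort -T tmp_dir | uniq -c`.

    Assumes the host is the first space-separated field of each log line.
    """
    hosts = "".join(line.split(" ", 1)[0] + "\n" for line in log_lines)
    sort_proc = subprocess.Popen(["sort", "-T", tmp_dir],
                                 stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    uniq_proc = subprocess.Popen(["uniq", "-c"],
                                 stdin=sort_proc.stdout, stdout=subprocess.PIPE, text=True)
    sort_proc.stdout.close()  # uniq now owns the read end of the pipe
    sort_proc.stdin.write(hosts)
    sort_proc.stdin.close()   # EOF lets sort emit its sorted output
    output, _ = uniq_proc.communicate()
    sort_proc.wait()
    counts = {}
    for line in output.splitlines():
        count, host = line.split(None, 1)  # uniq -c prints "   COUNT LINE"
        counts[host] = int(count)
    return counts

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:  # stand-in for a dir on another drive
        print(count_hosts(["a.example GET /", "b.example GET /x", "a.example GET /y"], scratch))
```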

Drew Perttula gave a talk and demo about kcachegrind, as well as supporting tools for measuring the performance of python code, including a module call graph display tool that creates PNG graphs using the dot program.

It was an impressive demo, and though kcachegrind was not written with python in mind, the display still made sense.
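If you want to try this, here is a sketch that uses the standard library's `cProfile` and `pstats` modules to collect the kind of call data these tools visualize. To view it in kcachegrind, you would first need to convert the saved stats to callgrind format with a separate tool, for example the third-party pyprof2calltree. That conversion step is not shown, and neither is the call graph tool from the talk. The `fib` function is just a stand-in workload.

```python
import cProfile
import io
import pstats

def fib(n):
    """Deliberately slow recursive Fibonacci, used only as a profiling workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def profile_call(func, *args, stats_path=None, top=5):
    """Run func(*args) under cProfile and return (result, text report).

    If stats_path is given, the raw stats are also saved there so other
    tools (for example a callgrind converter) can read them later.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    if stats_path:
        profiler.dump_stats(stats_path)
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()

if __name__ == "__main__":
    value, text = profile_call(fib, 20)
    print(value)
    print(text)
```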

Simeon Franklin talked about his environment setup to improve web development and release workflow using virtualenv (managing python development environments), pip (an easy_install replacement that works with virtualenv environments), and fabric (a python tool for scripting server deployment tasks).

It was very slick.

In a nutshell, the practical problem that web developers face nowadays is how to create sandboxes for multiple versions of python and web content management systems (CMSes), then periodically install them on remote servers.

Sandboxes are needed because by default, python modules are installed silently and globally. By installing instead to a sandbox, you can isolate which modules got installed, and where.
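virtualenv itself is a separate package, so this sketch uses the standard library's `venv` module as a rough modern stand-in (Python 3.3+, not available in 2009). It creates a sandbox and confirms that its interpreter has its own `sys.prefix`, so packages installed into it stay out of the global site-packages. pip is skipped (`with_pip=False`) to keep the example offline, and the helper names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import venv

def make_sandbox(env_dir):
    """Create an isolated environment in env_dir and return its python path."""
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)

def sandbox_prefix(python_path):
    """Ask the sandboxed interpreter for its sys.prefix (where its packages go)."""
    out = subprocess.run([python_path, "-c", "import sys; print(sys.prefix)"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as env_dir:
        python = make_sandbox(env_dir)
        print("global prefix: ", sys.prefix)
        print("sandbox prefix:", sandbox_prefix(python))
```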

fabric, the remote installation tool, is kind of like a cross between make and Expect, except written in python and focused on installation.

fabric has 15 statements for running local and remote commands, authenticating, and copying files. So you can build a local distro, log in to a remote server, upload the distro, and run installation commands, all automatically.

Commonly fabric is used to install static and program files, do database schema updates, and restart web servers.
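To make that workflow concrete, here is a dry-run sketch of a deploy as a sequence of local and remote commands. This is not fabric's API, and none of fabric's functions are reproduced here. The host, package name, schema script, and restart command are all hypothetical placeholders, and the sketch assumes ssh and scp access is already set up.

```python
import shlex
import subprocess

def deploy_commands(host, package, remote_dir="/tmp"):
    """Build the command sequence for a typical deploy (all names hypothetical)."""
    remote_pkg = f"{remote_dir}/{package}"
    return [
        ["python", "setup.py", "sdist"],                          # build the local distro
        ["scp", f"dist/{package}", f"{host}:{remote_pkg}"],       # upload it
        ["ssh", host, f"pip install {shlex.quote(remote_pkg)}"],  # install remotely
        ["ssh", host, "./update_schema.sh"],                      # hypothetical DB schema update
        ["ssh", host, "sudo /etc/init.d/apache2 restart"],        # restart the web server
    ]

def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Keeping a dry-run mode lets you review exactly what will happen on the server before running anything for real.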

Thanks to Symantec for hosting the event tonight.

While passing Moffett Airfield I happened to see a big, white zeppelin owned by Airship Ventures.

cnet.com: A 21st-century zeppelin flies to San Francisco
avweb.com: Zeppelin Startup Struggles As Economy Sinks

IMUG: Globalization and Software Test Automation

Thursday, March 19th, 2009

Dana Li, Business Development Manager at hiSoft, gave a talk on software localization and QA at IMUG.

These days, software company clients typically provide an internationalized product to hiSoft, which translates it into 8 to 25 languages and then tests the result for correct translation and functional behavior.
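This sketch illustrates one kind of automated check used in localization QA. It is not hiSoft's actual tooling, which the talk didn't detail. It compares each translated string catalog against the source-language catalog and flags missing keys and `{placeholder}` mismatches that would break formatting at runtime. The catalog layout (plain dicts keyed by message ID) is an assumption.

```python
import string

def placeholders(text):
    """Return the set of named {placeholder} fields used in a format string."""
    return {field for _, field, _, _ in string.Formatter().parse(text) if field}

def check_catalogs(source, translations):
    """Compare translated catalogs to the source catalog.

    source: {message_id: text}; translations: {lang: {message_id: text}}.
    Returns a sorted list of (lang, message_id, problem) tuples.
    """
    problems = []
    for lang, catalog in translations.items():
        for msg_id, text in source.items():
            if msg_id not in catalog:
                problems.append((lang, msg_id, "missing"))
            elif placeholders(catalog[msg_id]) != placeholders(text):
                problems.append((lang, msg_id, "placeholder mismatch"))
    return sorted(problems)
```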

hiSoft uses whatever testing framework the client uses, so frameworks range from commercial SilkTest or QTP to open source Selenium. The hiSoft folks didn’t express any strong preference among them.

AJAX applications are harder to automate tests for, since the entire page can change dynamically.
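A common way to cope with dynamic pages in automated tests is to poll for a condition instead of sleeping for a fixed time. Selenium has its own explicit-wait helpers. This framework-neutral sketch just shows the idea, with `condition` standing in for a check such as "the results element is now visible".

```python
import time

def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value; raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```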

Clients generally no longer provide source code to be internationalized, as they did in the web 1.0 days.

An interesting project they did was to QA Chinese OCR software.

But every project has its own complications.

Afterward an Arabic consultant chatted a little about how Modern Standard Arabic (MSA) has standardized Arabic writing world-wide, but there is a local spoken dialect in each region.

Thanks again to Apple for hosting IMUG.

Bioteam Sun Grid Engine Class

Wednesday, March 18th, 2009

I attended an excellent class on Sun Grid Engine (SGE) Cluster Administration at the Santa Clara Hyatt. The instructor was Chris Dagdigian, from BioTeam, and the sponsor was UnivaUD.

This was a one-day version of his two-day class, so things moved pretty fast.

Chris is very familiar with SGE use cases applied across a number of different industries, and how SGE differs from LSF.

Since a few attendees worked in EDA, Chris also provided useful information specific to EDA, such as SGE resource configuration and retrying license checkout requests in epilog scripts for FlexLM.

LSF has had a mature public programming API for a long time. SGE supports DRMAA, a standard API that is limited to job submission and basic job control, so you’re stuck writing wrappers for the command-line utilities.
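Here is a minimal sketch of such a wrapper. It assumes the standard `qsub` options `-N` (job name), `-cwd`, `-b y` (submit a command directly as a binary), `-l` (resource requests), and `-q` (queue). It parses qsub’s usual "Your job N ... has been submitted" confirmation. Resource names like `h_vmem` depend on your site’s configuration, and the function names are hypothetical. Only the command construction and the parsing can be exercised without a cluster.

```python
import re
import subprocess

JOB_ID_RE = re.compile(r"Your job(?:-array)? (\d+)")

def qsub_command(command, name, resources=None, queue=None):
    """Build a qsub argument list for running `command` (a list) as a binary job."""
    args = ["qsub", "-N", name, "-cwd", "-b", "y"]
    for key, value in (resources or {}).items():
        args += ["-l", f"{key}={value}"]
    if queue:
        args += ["-q", queue]
    return args + list(command)

def parse_job_id(qsub_output):
    """Extract the numeric job ID from qsub's confirmation message."""
    match = JOB_ID_RE.search(qsub_output)
    if not match:
        raise ValueError(f"unexpected qsub output: {qsub_output!r}")
    return int(match.group(1))

def submit(command, name, **kwargs):
    """Submit a job and return its ID (requires a working SGE installation)."""
    out = subprocess.run(qsub_command(command, name, **kwargs),
                         capture_output=True, text=True, check=True)
    return parse_job_id(out.stdout)
```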

LSF is also queue-oriented, while SGE is more policy-oriented, so an LSF configuration may have 10x as many queues as an equivalent SGE one (not better, just different).

The SGE “lab” ran on 10 Amazon EC2 Extra Large instances, one per attendee (BYON = Bring Your Own Netbook).

I found it useful to quickly try the command-line tools as Chris talked about them. In a full 2-day class, you would also install, configure and do reporting.

(He said I was one of the few qmon fans. Most people just use the qmon Motif/X11 GUI for cluster monitoring, but I also do most administration with it.)

Chris said that SGE releases now alternate between feature and performance enhancement versions.

SGE performance on large systems has been improved by tuning it for the Ranger cluster at the Texas Advanced Computing Center (TACC) at the University of Texas, which is rated at 580 teraflops and has 63,000 cores. (Ranger had a Top500 supercomputer ranking of #4.)

Although scheduling has improved as a result, the default behavior of some command-line tools has been neutered to reduce load, so you now need to add more options to get the same output as before.

It’s still too early to tell how batch computing and cloud computing (Amazon EC2, Hadoop, etc.) will coexist. With on-demand scheduling, SGE could possibly be used to farm out Internet web request jobs to Amazon EC2, but the job submission overhead would have to be measured first.

Chris is also a storage geek, so he offered some advice on cluster storage. He insists on RAID6 (double parity) for storage hardware.

He mentioned the Nexsan SATAbeast devices, which support RAID6 and use 40% less power by spinning down idle drives (a feature called AutoMAID).

The meeting room was nice, cozy enough for about 10 people, with reliable WiFi and a gourmet Mexican/American lunch.

Thanks to BioTeam and UnivaUD for organizing and hosting this event.

If you’re using SGE and need training, contact Chris and tell him what industry you’re in to get a class tailored to your requirements.

wikipedia: Platform LSF
Jonathan Schwartz’s blog entry on Ranger

Happy St. Patrick’s Day

Tuesday, March 17th, 2009

Well, once again it’s St. Patrick’s Day.

I wore my green sweater today, so I guess that makes me fashionable one day per year anyhow.

Besides drinking green beer and talking with an accent, there is something valuable to do on this day … help wikipedia update the St. Patrick’s Day page with additional citations for verification.

After all, 54 citations just doesn’t seem to be enough to confirm its existence to the wikipedia editors.

Amazon Reserved Instances Cost-Benefit Analysis Says Yes

Thursday, March 12th, 2009

Amazon just introduced “reserved instances”, likely for disaster-recovery sites, and similar uses.

You’re supposed to pay in advance to reserve instances you likely won’t use in the future.

But there’s an interesting pricing loophole …

Let’s use the Linux small instance (1.7 GB of memory, 1 EC2 Compute Unit on 1 virtual core, 160 GB of instance storage, 32-bit platform) as an example.

If you’re sure you need an instance for a year, you can pre-pay a reservation fee ($325/year), then use it at a discounted hourly rate ($0.03/hr). That ends up being about 1/3 less than regular pricing:

Regular: $0.10/hour * 24 hours * 365 days = $876.00/year ($73/month) plus bw.
Reserved: $325 + ($0.03/hour * 24 * 365) = $587.80/year, about $48.98/month (33% off!) plus bw.

For a 3-year commitment, it ends up being 51% off ($35.79/month) plus bw.
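The arithmetic is easy to rerun for other rates. This sketch assumes the 2009 small-instance prices quoted above. The 3-year upfront fee isn’t stated in the post, so the $500 used here is an inferred value, chosen because it reproduces the $35.79/month figure. Bandwidth is excluded.

```python
HOURS_PER_YEAR = 24 * 365

def total_cost(years, hourly_rate, upfront_fee=0.0):
    """Total cost of one instance running around the clock for `years` years."""
    return upfront_fee + hourly_rate * HOURS_PER_YEAR * years

def compare_reserved(years, upfront_fee, regular_rate=0.10, reserved_rate=0.03):
    """Compare on-demand vs. reserved pricing (bandwidth excluded)."""
    regular = total_cost(years, regular_rate)
    reserved = total_cost(years, reserved_rate, upfront_fee)
    return {
        "regular_total": round(regular, 2),
        "reserved_total": round(reserved, 2),
        "reserved_per_month": round(reserved / (12 * years), 2),
        "percent_off": round(100 * (1 - reserved / regular)),
    }

if __name__ == "__main__":
    print(compare_reserved(1, upfront_fee=325))  # 1-year reservation fee from the post
    print(compare_reserved(3, upfront_fee=500))  # 3-year fee is an inferred assumption
```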

Why is this really interesting? Nobody rents a useful (100 Mbps connection, 1+ GB RAM, 160 GB disk) dedicated server for $36/month.

(EC2 bandwidth is $0.10/GB in and $0.17 to $0.10/GB out, depending on volume.)

Mosso (Rackspace) Pricing
StorageMojo: The Amazon keynote at FAST ’09

Fascinating Article on Robinson Helicopters and R66

Saturday, March 7th, 2009

Fascinating article on Robinson Helicopters, its struggles during the recession, and its hopes for its first turbine-powered helicopter, the Robinson R66.

I had a chance to see the tiny Rolls-Royce RR300 engine at AOPA Expo last year, which is rated at about 300 HP.

A new R44 is about $500,000, while the R66 is expected to be about $1 million.

I wonder if their decline in sales is due to potential buyers waiting for the R66 to be available.

Robinson has been so successful in the past few years that Bell Helicopter dropped the 206 JetRanger after making it for 40 years. The R44 is about 1/3 the price of the 206, and costs 1/3 as much to operate.

S5: Simple Standards-Based Slide Show System Review

Thursday, March 5th, 2009

Recently I needed to create a presentation slide deck for a training class.

I’m not a big fan of any commercial presentation programs, so I looked around for a free HTML-based slideshow authoring system and came across S5, which is public domain, cross-platform, and cross-browser compatible.

The S5 concept was copied from the proprietary Opera Show feature.

It’s very cool that Eric Meyer and collaborators were able to create S5 using just HTML, CSS and a little JavaScript. An introductory sample slide show is available on the S5 site.

In this case the first try was the charm: S5 did everything I wanted, looked professional, and was very simple to edit using just vi.

After editing various metadata in the main file, just keep adding slides with the following tags:

<div class="slide">
<h1>slide title</h1>
<ul>
<li>bullet point 1</li>
<li>bullet point 2</li>
</ul>
</div>
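If your slides start life as an outline, a short script can write the repetitive markup for you. This hypothetical helper emits one slide block per (title, bullets) pair, using the same tag structure as above. It escapes text with the standard `html.escape`. Pasting the output into the main S5 file is left as a manual step.

```python
from html import escape

def s5_slide(title, bullets):
    """Return the S5 markup for one slide with a title and a bullet list."""
    lines = ['<div class="slide">', f"<h1>{escape(title)}</h1>", "<ul>"]
    lines += [f"<li>{escape(item)}</li>" for item in bullets]
    lines += ["</ul>", "</div>"]
    return "\n".join(lines)

def s5_slides(outline):
    """Render a list of (title, bullets) pairs as consecutive S5 slides."""
    return "\n\n".join(s5_slide(title, bullets) for title, bullets in outline)
```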

The only disadvantages or nits that I noticed with S5 were:

  • test your slide theme and resolution on the projector beforehand in case resizing causes problems; also disable the screensaver on your netbook
  • the initial slide doesn’t indicate how to navigate, so I added some instructions
  • it’s not a single-file solution, so you have to keep track of several helper files
  • the browser “Back button” doesn’t go back to the previous slide, so use the S5 built-in navigation features
  • it conflicts with older versions of the AdBlock extension for Firefox

The navigation instructions that I added to the initial slide are, “(S5 Slide navigation instructions: use arrow, Home, PgUp, PgDn, End, space keys, and left-mouse click, or mouse-over bottom-right of slides.)”

Opera shows off: Create Professional Presentations in Minutes with the New Opera Show Generator
miscoded: Opera given cold shoulder by CSS guru

Useful Pro DSLR Camera Online Databases

Wednesday, March 4th, 2009

This is a placeholder page for useful pro DSLR camera online databases. Please send me a comment with other recommended links.

Memory Cards

Rob Galbraith’s CF/SD Performance Database

Sensors

DxO Labs DxOMark

Lenses

pbase Camera Databases
Nikon Lenses
Canon EOS Lens Notes
Photozone Lens Tests