Archive for the ‘Perl’ Category

O’Reilly Open Source Conference 2010, Portland

Friday, July 23rd, 2010

Once again, the O’Reilly Open Source Conference (OSCON) was held in Portland, Oregon.

It was a good conference, and we had beautiful weather all week long.

Executive Summary

The themes promoted by the conference organizers were Cloud Computing, NoSQL, Emerging Languages (Scala, Erlang, Parrot, Go) and Android phone development.

The @oscon twitter channel was heavily used to coordinate amongst organizers and attendees. I used the TwiXtreme twitter client program on my BlackBerry.

Plug Computers were very popular in the Expo area. They are 5 watt ARM-based computers running Debian Linux that fit into a power brick-sized case and cost $99 to $129 depending on features. The Marvell booth had a few models on display, from GlobalScale (GuruPlug) and Ionics. High-end models have dual gigabit NICs, multiple USB ports, a WiFi access point and other expansion ports.

There was also continuing buzz regarding Facebook’s Flashcache SSD module (GPL v2) for linux, and also ZFS snapshots.

Tutorials

I went to the Gearman Cookbook tutorial, the first half of the Chef tutorial and some of the Cloud Summit talks.

The Gearman Cookbook tutorial was excellent. After a detailed overview of the Gearman architecture and implementations in Perl and C, a number of use cases were explored in detail, including before and after code samples. The talk was easy to follow as an overall survey, and also provided immediately useful info for those wanting to deploy it.
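For readers who have not seen the pattern, here is a minimal sketch of the client / job server / worker split that Gearman is built around. It is not the Gearman API: the in-process queue stands in for the job server, and the names `jobs`, `registered`, `submit` and `run_demo` are made up for illustration.

```python
import queue
import threading

# Hypothetical stand-in for a job server: a thread-safe queue of
# (function_name, payload, reply_queue) tuples. Not the Gearman API.
jobs = queue.Queue()

# Functions the workers have "registered", keyed by name.
registered = {
    "reverse": lambda data: data[::-1],
    "upper": lambda data: data.upper(),
}


def worker():
    """Pull jobs until a None sentinel arrives, sending each result back."""
    while True:
        job = jobs.get()
        if job is None:
            break
        name, payload, reply = job
        reply.put(registered[name](payload))


def submit(name, payload):
    """Client side: queue a job and block until a worker returns the result."""
    reply = queue.Queue(maxsize=1)
    jobs.put((name, payload, reply))
    return reply.get(timeout=5)


def run_demo():
    """Start two workers, run two jobs, then shut the workers down."""
    workers = [threading.Thread(target=worker) for _ in range(2)]
    for thread in workers:
        thread.start()
    results = [submit("reverse", "gearman"), submit("upper", "oscon")]
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for thread in workers:
        thread.join()
    return results
```

The point of the split is that the client only knows a function name and a payload, so workers can be added, moved to other machines or rewritten in another language without touching the client.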

The Chef tutorial was very detailed – too much so perhaps. I went to the first half only, since I am not planning to implement Chef soon (I use PXE and anaconda/kickstart with CentOS), and did not need that level of detail at this time. cfengine, puppet and chef are ops tools for configuring servers. Chef uses Ruby data structures for its configuration files, and has include files and other useful syntax. Basically, users can “code” server configuration, as if they were traditional apps.

I went to some of the Cloud Summit talks and BOFs, but found that anybody who has done a simple project using EC2 knew as much as or more than the speakers, some of whom I would call blowhards.

Marten Mickos, president of Eucalyptus, is refreshing in that he is always clear about being in it for the money, while also promoting Open Source.

Sessions

Some of the most memorable sessions to me were:

Introduction to MongoDB, Kristina Chodorow (MongoDB)

Kristina is the maintainer of the Perl and PHP drivers for MongoDB. She gave an overview of MongoDB, a NoSQL document store, and its command-line interface, which uses JavaScript.

Some day she will release a sharding tool for MongoDB.

Scaling SourceForge with MongoDB, Nosh Petigara (10gen), Rick Copeland (SourceForge.net / GeekNet)

Nosh and Rick gave an excellent review of incorporating MongoDB into the SourceForge site.

- SF query load is mostly read-only
- ops team benchmarked a few NoSQL candidates, and MongoDB won on performance
- original MySQL servers had 64 GB RAM. After migration to MongoDB, same server machines but only 8 GB RAM
- backup dumps are verified to be bitwise the same as masters
- have to be careful not to dump all documents in your database to the network or it will max out switches
- SF relies on first-class data centers and replication slaves, less worried about MongoDB mmap (not crash-safe)
- I personally looked at their performance numbers and site graphs (on an iPad), and the end result was impressive.

Perl Lightning Talks

As always, the Perl Lightning Talks are a highpoint of the conference.

The “cartoon” of Vincent Pit’s remarkable CPAN module contributions (CPAN ID VPIT) was both informative and hilarious. Vincent is a French Ph.D. candidate in advanced geometry.

Cloud BOF (3 Hours)

The Cloud BOF was disorganized. It started 30 minutes late and, for some reason, was subdivided into 4 audience groups. Startups and vendors trying to make a cloud sales push led the BOF, including cloud and DNS service providers.

The Health Regulations subgroup came up with a couple ways to make the Cloud palatable to regulators by using encryption on all data due to the multi-tenancy issues with sharing public VMs.

I was in the NoSQL group, which discussed general issues and particular successes. Memcached was the clearest winner, while some people also had success with MongoDB and Redis.

My neighbor was an engineer at Postrank.com. He said that they were happy with HAProxy, but much less happy with the unpredictable IO available when running MySQL on EC2. He also said to carefully look at storage volumes available to your instance, as one is a useful tmpfs. They use AuthSMTP to get around EC2 being generally blacklisted for outbound email.

Database BOFs

MySQL BOF

The MySQL AB engineering staff has left Oracle. Monty Program AB (21 staff) has the core developers, and Percona Inc. (32 staff) has the consultants. Oracle still has some of the InnoDB programmers.

The business plan for Monty Program AB is 60% commercially-sponsored MySQL development, and 40% community-request development. Monty would like commercial users of MySQL to sponsor patches that would benefit them.

Mark mentioned that using Nehalem instructions for CRC was much faster, and that Facebook was using partitions for truncating tables instead of doing multi-record deletes. (See his blog for more details.)

One person mentioned using a commercial backup tool, R1Soft, that inserts a linux kernel module to allow filesystem snapshots. He said to carefully test backup and restore in your environment, especially for filesystems greater than 1 TB which may exceed certain block counter limits. Peter said that some of his clients had used it with varying success.

It worked for him in his environment, and the file browser allows selective file restore (he uses it to restore by priority where a system runs multiple applications.) It starts at $299 for the Standard Edition, and also has MySQL Add-on and Enterprise Editions.

PostgreSQL BOF

The PostgreSQL BOF talked about 30 or so changes that went into version 9.

One of the most exciting new features is native replication, called streaming replication (block-based). The advantage over Slony-I replication is that Slony-I is trigger-based, so it has a variety of issues, including the inability to replicate DDL commands.

Some of the developers mimed replication events, which was rather amusing to watch. Yes, it was taped.

PostgreSQL is released under the PostgreSQL Licence, which is BSDish.

Peter Zaitsev, co-founder of Percona, organized 3 BOFs, including XtraDB, XtraBackup, Maatkit, Percona Server, Sphinx Search and Running Databases on Flash Storage.

Sphinx Search BOF

Andrew Aksyonoff, the original programmer of Sphinx Search (GPL v2), couldn’t make it to OSCON (the good excuse was that he was busy coding), so Richard Kelm (Sphinx sales/customer support honcho) and Peter filled in (Percona is a business partner with Sphinx, and many of Percona’s clients use it.)

Some of the attendees were existing users, like myself, and some from HP and other companies were looking for a large-scale search solution or alternative to Lucene.

Monty mentioned that the latest MySQL 5.1 should be used, as there have been a number of performance and reliability improvements. Full-text search is supposed to be 10x faster than 5.0, and replication is nearly bug-free by now.

Sphinx Search now has real-time index updates in version 1.1.0 beta. Another very nice feature is SQL+FS indexing.

Here is the full Sphinx 1.1.0 changelog.

Running Databases on Flash Storage BOF

The Running Databases on Flash Storage BOF had a combination of MySQL and Postgres users who have tested or used most of the SSD products: FusionIO, Violin, Intel, OCZ, etc. Everybody was happy with SSD IOPS performance, but less so with the cost and with the metadata RAM requirements of the add-in boards (FusionIO may require 4 GB RAM for metadata).

Peter said that 20% to 30% of his clients are already using SSD – across the spectrum of vendors and models. Some are also trying “massive RAM” solutions, like Cisco servers with 384 GB RAM.

Some users had 1+ TB Postgres databases with very thorny backup and mgmt. issues. One solution was to start a snapshot, but not do the copy operation.

Expo Notes

I had an enjoyable talk with Austin Hook, who has operated the OpenBSD Store for many years. He lives near Calgary, the center of OpenBSD/OpenSSH/PF development. He mentioned that some perennial financial contributors had stopped because of the recession, so here’s the donations link.

I also talked to some reps from a Brazilian outsourcing firm, ActMinds. They currently have 400 employees across Brazil and a sales office in Philadelphia. Brazil is only 2 hours ahead of EST. They said the minimum project size is 2 developers and developer turnover a low 5%/annum. Their pricing is $35 to $45/hour.

And I had fun handling the plug computers on display at the Marvell booth. The Ionics boards are amazingly densely populated.

Discussions

I had the opportunity to talk to a long-time Portland resident who works as a computer consultant. He said that the Portland economy is not doing great, and really hasn’t done well since old-growth logging was stopped after 90% of the forests were cleared. And although hundreds of miles of fiber optic has been laid downtown, it’s not available for residential use. However, the Beaverton area does have ubiquitous FTTH.

I also talked to somebody who attended the Emerging Languages talks. He’s working on his M.Sc. in Computer Science, so found those talks fascinating.

Twitter Humor

There were some humorous tweets:

- “my MongoDB and CouchDB mugs are fighting each other.”
- “I got one MongoDB mug, but need two to safely store coffee.”

Notes

Note to self: skip the nightly parties unless you have a date. The bars are too loud to talk to anybody.

Note to the O’Reilly conference organizers: use meetup.com for the BOFs like ApacheCon does. The average audience was about 10 people, and with meetup it would be 4x that.

OSCON 2010 Slides
Tim Bray: Desperate Perl Hacker
Youtube: OSCON 2010 videos
blip.tv: OSCON2010 videos
wikipedia: Plug Computer
Jeremy Zawodny: MongoDB Early Impressions

sf.pm.org: Hudson for Everybody Else

Tuesday, June 22nd, 2010

Joe McMahon did a nice talk tonite at the San Francisco Perl Mongers (sf.pm.org) on the Hudson continuous integration server.

Hudson is written in Java, but can be used with any programming language (or documentation generator) where Makefile or JUnit output is available.

He’s been happy with the included features so far. One of the features he’d like to try next is spawning a VM from Hudson.

As a Java application, it can be fairly memory-intensive. Hudson plus 400,000 tests requires about 4 GB RAM.

Although most Perl modules don’t require a compile-link step, CI can still be useful for Perl programs to:

  • automatically test across multiple platforms
  • automatically run test suites
  • integrate code from multiple developers
  • record build results in a common location for later analysis.
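Since Hudson reads JUnit-style XML and Perl test suites normally emit TAP, some conversion step is needed in between. Here is a minimal sketch of that conversion, assuming plain "ok" / "not ok" lines only. It is not what Joe showed, the function name `tap_to_junit` and the suite name are made up, and TAP directives such as TODO and SKIP are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Matches TAP result lines such as "ok 1 - loads module"
# or "not ok 2 - parses input".
TAP_LINE = re.compile(r"^(not ok|ok)\s+(\d+)\s*(?:-\s*)?(.*)$")


def tap_to_junit(tap_text, suite_name="perl-tests"):
    """Convert simple TAP output into a JUnit-style <testsuite> XML string."""
    suite = ET.Element("testsuite", name=suite_name)
    total = 0
    failures = 0
    for line in tap_text.splitlines():
        match = TAP_LINE.match(line.strip())
        if not match:
            continue  # skip the plan line ("1..N") and "# ..." diagnostics
        status, number, description = match.groups()
        total += 1
        case = ET.SubElement(
            suite, "testcase", name=description or f"test {number}"
        )
        if status == "not ok":
            failures += 1
            ET.SubElement(case, "failure", message="TAP reported not ok")
    suite.set("tests", str(total))
    suite.set("failures", str(failures))
    return ET.tostring(suite, encoding="unicode")
```

Feeding it the captured output of a test run and writing the returned string to a file gives the CI server something it can chart per build. A real setup would more likely use an existing TAP-to-JUnit converter than a hand-rolled one.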

Joe also talked about cleaning up the output of Devel::Cover by excluding CPAN modules.

I mentioned during the Q&A period that “make -i” can be used to force make to continue on errors.

Thanks to Mother Jones for hosting the event tonite.

Slides
CPAN ID MCMAHON

PENLUG Meeting: Linux Open-Source Virtualization Roadmap

Wednesday, May 26th, 2010

Jamie Cameron, the author of Webmin, did a talk on linux virtualization at Peninsula Linux Users Group (PENLUG) in the Bayshore Technology Park in Redwood City tonite.

He’s working on 2 new products, Virtualmin and Cloudmin, so has had to learn the ins and outs of the current state of linux virtualization with respect to hosting.

His favorite is Xen, but for some reason Redhat is providing more support for KVM (Kernel Virtual Machine), which has several disadvantages including lack of CPU limiting. Redhat acquired KVM resources in 2008.

OpenVZ is popular with budget hosting providers, and Virtuozzo with those that want to pay.

Linux-VServer is the lightest weight alternative, similar to FreeBSD jails, but also the least maintained at this point.

He gave a demo of Cloudmin, including creating a guest and logging into it.

Since Linux has no ABI standard, he prefers developing in scripting languages like Perl for maximum portability.

wikipedia: webmin
Ganeti is a “cluster virtual server management software tool built on top of existing virtualization technologies such as Xen or KVM and other Open Source software.”

Mapreduce and Hadoop Links

Sunday, May 23rd, 2010

This is a placeholder post for Mapreduce and Hadoop links.

(I operate a small 64-core cluster, and am always looking for ways to keep it busy with FOSS like Hadoop.)

hadoop.apache.org Cluster Setup
Cloudera.com
Mapreduce & Hadoop Algorithms in Academic Papers (3rd update)
Sandia: MapReduce-MPI Library
columbia.edu: Alex K’s Mapreduce Bibliography

wikipedia: SGE 6.2 supports Hadoop
wikipedia: Rocks Cluster Distribution

A grain of wisdom is worth an ounce of knowledge, which is worth a ton of data. — Neil Larson

Google App Engine and Perl

Sunday, May 16th, 2010
Placeholder blog post for Google App Engine and Perl support links.
Google App Engine homepage
code.google.com: Perl App Engine
code.google.com: App Engine Issue #34: Add Perl Support
Brad: Perl on App Engine (July 22, 2008)
Perl App Engine Status Update (July 30, 2008)

fsync Links

Saturday, May 8th, 2010

This is a placeholder post for links about fsync on linux and Perl.

tchrist: some good news && bad news on fsync
Don’t fear the fsync!
Delayed allocation and the zero-length file problem
libeatmydata
Firefox 3 & ‘fsync’ issue
Brad’s diskchecker.pl

SVLUG: IPv6 Essentials for Linux Administrators with Owen DeLong

Wednesday, May 5th, 2010

At the Silicon Valley Linux Users Group (SVLUG) talk tonite, Owen DeLong from Hurricane Electric did a good talk on “IPv6 Essentials for Linux Administrators.”

Owen is the IPv6 evangelist for Hurricane Electric, an Internet hosting and network services company with 2 data centers in Fremont, 1 in San Jose, and approximately 30 POPs world-wide.

There is urgency to improve IPv6 support and adoption as:

  • IPv4 will run out of /8 blocks available shortly (2011), resulting in scarcity
  • China and other countries are rapidly moving online and require (demand) addresses
  • yet there is a long lead-time to deploy IPv6, perhaps 5 years for a company that hasn’t started preparations.
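To put the /8 scarcity in numbers, here is a small worked example using Python's standard ipaddress module. The specific prefixes (10.0.0.0/8 and the 2001:db8::/64 documentation prefix) are just convenient examples I picked, not anything from Owen's slides.

```python
import ipaddress

# Any /8 is the same size; 10.0.0.0/8 is only used as an example.
slash8 = ipaddress.ip_network("10.0.0.0/8")
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
# A single IPv6 /64, the usual size of one subnet.
ipv6_slash64 = ipaddress.ip_network("2001:db8::/64").num_addresses


def summary():
    """Return the address-space arithmetic as a dict."""
    return {
        "addresses_per_slash8": slash8.num_addresses,
        "slash8_blocks_in_ipv4": ipv4_total // slash8.num_addresses,
        "ipv4_total": ipv4_total,
        "ipv4_internets_per_ipv6_slash64": ipv6_slash64 // ipv4_total,
    }
```

So the whole of IPv4 is only 256 blocks of about 16.7 million addresses each, while one IPv6 /64 subnet holds as many addresses as roughly 4.3 billion copies of the entire IPv4 space.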

He mentioned some interesting “tricks”, including:

  • using an ssh tunnel to bridge IPv4 and IPv6 networks

He also does a separate talk on “IPv6 Essentials for Programmers.”

Owen mentioned after the talk that some of the scripting languages have poor support for IPv6, including Perl.

Thanks once again to Symantec for providing a meeting space.

HE Tunnel Broker Service
brad’s life – IPv6

MySQL Conference 2010

Monday, April 12th, 2010

The MySQL Conference was this week in Santa Clara. It was a well-organized and educational event with everybody involved in the MySQL community showing up once again.

Executive Summary

The highlights were:

  • after 2 years of effort, the performance schema foundation is available as a 5.5.x patch. With another year of effort, it could be useful.
  • the various community forks (Percona/XtraDB, MariaDB, OurDelta) will merge in the next 3 months into a maintenance fork by Monty Program, since MP has the most original MySQL developers.
  • the various MySQL vendors are soldiering along, all releasing new, improved versions of their hw and sw products this year.
  • The largest independent MySQL-centric consulting companies are Percona with 32 staff, and Monty Program with 40 staff and a target of 50 employees.
  • the MySQL source code will have to be modified to make MySQL fast enough to keep up with Fusion IO SSD devices. Currently, better than SSD performance can be gained by installing enough RAM to fit the entire database in buffer pool.
  • Drizzle development is going nicely, but note that it’s not backward compatible with MySQL. Drizzle is a 64-bit only fork of MySQL with emphasis on community code development, increasing performance and maintainability through a plug-in architecture and strict code cleanliness.

Monday Morning Tutorial

Using Partitioning in MySQL 5.1 and 5.5 with Giuseppe Maxia (Oracle)

- available in MySQL 5.1 and later only
- TO_DAYS() and YEAR() are special-cased and recommended, as they allow partitions to be pruned from lookups.
- when using TO_DAYS() as a partitioning function, the first partition matters. As a bug workaround, use a value less than zero for the first partition to create a NULL partition, which can double performance.
- consider lock before inserting for all table types
- for performance, consider non-partitioned on masters, partitions on slaves.
- or different partition types
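The TO_DAYS() tips above can be sketched as a small generator for the partitioning DDL. This is my own illustration, not from Giuseppe's slides: the table and column names, the monthly layout and the partition names (`p_null`, `pYYYYMM`, `p_max`) are all assumptions. As I understand the workaround, the leading partition below zero only ever receives rows where TO_DAYS() yields NULL, so it stays empty for valid dates.

```python
from datetime import date


def month_starts(first, count):
    """Yield the first day of `count` consecutive months starting at `first`."""
    year, month = first.year, first.month
    for _ in range(count):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def partition_ddl(table, column, first_month, months):
    """Build an ALTER TABLE statement that range-partitions on TO_DAYS(column)."""
    # Leading partition for the NULL case, per the workaround in the tutorial.
    parts = ["PARTITION p_null VALUES LESS THAN (0)"]
    boundaries = list(month_starts(first_month, months + 1))
    for start, end in zip(boundaries, boundaries[1:]):
        # Each monthly partition holds rows dated before the next month starts.
        parts.append(
            f"PARTITION p{start:%Y%m} "
            f"VALUES LESS THAN (TO_DAYS('{end.isoformat()}'))"
        )
    # Catch-all so rows with later dates are not rejected.
    parts.append("PARTITION p_max VALUES LESS THAN MAXVALUE")
    body = ",\n  ".join(parts)
    return (
        f"ALTER TABLE {table} PARTITION BY RANGE (TO_DAYS({column})) (\n"
        f"  {body}\n)"
    )
```

For example, `partition_ddl("access_log", "created", date(2010, 11, 1), 3)` produces a NULL partition, three monthly partitions and a catch-all. The function only builds a string; test the generated statement on a scratch copy of the table before running it anywhere that matters.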

He also gave a nice tutorial on mysql sandboxes script.

Partition Limitations:

- cannot mix table types
- cannot make read-only

I talked to some advanced users, and none have found a practical use for partitions in their environment that was better than using regular table types for logging type applications.

This is due to the fact that partitions do not increase fault-tolerance, often don’t benchmark any faster, and have little in the way of administrative mgmt. support after partition creation.

Partitions can increase performance in applications where the index serves to stripe operations, but most people are just using dates for logging, with no practical benefit, as most operations fall into the current date partition.

Slides

Monday Afternoon Tutorial

Talked to Arjen Lentz and a friend at lunch.

- OpenQuery is suitable for affordable, long-term contract database admin, not firefighting
- former partition tester and bugfixer
- replication bug with TCP errors, nagios plugin should compare both replication lag seconds and log position
- need SSL or heartbeat to detect/fix
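The monitoring advice above (check the log position as well as the lag) can be sketched as the decision logic of a Nagios-style check. This is a minimal sketch under assumptions: the two dicts are expected to hold rows already fetched with SHOW MASTER STATUS and SHOW SLAVE STATUS, the function name and both thresholds are made up, and the actual database queries are left out.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_replication(master, slave, max_lag_seconds=60, max_byte_gap=1_000_000):
    """Return (exit_code, message) for one master/slave pair.

    master: dict with the File and Position columns of SHOW MASTER STATUS.
    slave:  dict with the Seconds_Behind_Master, Master_Log_File and
            Read_Master_Log_Pos columns of SHOW SLAVE STATUS.
    The thresholds are illustrative defaults, not recommendations.
    """
    lag = slave.get("Seconds_Behind_Master")
    if lag is None:
        return CRITICAL, "replication is not running (lag is NULL)"
    if lag > max_lag_seconds:
        return WARNING, f"slave is {lag}s behind the master"
    # The lag figure alone can look healthy while the I/O thread sits on a
    # dead TCP connection, so also compare what the slave has read so far
    # with the master's current binlog position.
    if slave["Master_Log_File"] != master["File"]:
        return WARNING, "slave is still reading an older binlog file"
    gap = master["Position"] - slave["Read_Master_Log_Pos"]
    if gap > max_byte_gap:
        return WARNING, f"slave has not read the last {gap} bytes of binlog"
    return OK, "replication lag and log position look fine"
```

The two status rows are never sampled at exactly the same instant, so a busy master always shows some byte gap. The threshold should allow for that.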

memcached

- set all clients to same values
- use JSON or YAML, not Storable or Pickle
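The reason for the JSON/YAML advice is that Storable (Perl) and Pickle (Python) are language-specific formats, so a value cached by one client cannot be read by a client written in another language. Here is a minimal sketch of the JSON approach. The plain dict stands in for a memcached client so the example runs on its own, and the helper names are made up.

```python
import json

cache = {}  # hypothetical stand-in for a real memcached client


def encode_value(value):
    """Serialize to compact UTF-8 JSON bytes that any language can decode."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_value(raw):
    """Inverse of encode_value."""
    return json.loads(raw.decode("utf-8"))


def cache_set(key, value):
    """Store a JSON-encoded value under key."""
    cache[key] = encode_value(value)


def cache_get(key):
    """Return the decoded value, or None on a cache miss."""
    raw = cache.get(key)
    return None if raw is None else decode_value(raw)
```

With this, `cache_set("user:42", {"name": "alice", "roles": ["admin"]})` stores plain JSON bytes that a Perl or PHP client could read back with its own JSON library.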

Tuesday

Performance Schema with Peter Gulutzan

- coded by Alff, but not GA yet
- PERFORMANCE_SCHEMA database optionally populated with events (mutex, lock, io) timing and count info
- allows simple SQL reporting of performance

EXPLAIN Demystified with Baron Schwartz (Percona Inc.)

- perennial nice EXPLAIN overview
- nice example of using mysql command prompt as pipeline for non-trivial processing

Introduction to InnoDB Monitoring System and Resource & Performance Tuning with Jimmy Yang

An Overview of Flash Storage for Databases with Vadim Tkachenko, Percona Inc.

- nice talk with useful performance graphs

Linux Performance Tuning and Stabilization Tips with Yoshinori Matsunobu, Sun Microsystems

- nice talk with detailed slide examples – he’s a hard worker
- he’s a fan of xfs, so some of the info is not always useful for ext3, i.e. the deadline scheduler may be better on xfs, but it feels the same to me as cfq on ext3.

Wednesday

More Mastering the Art of Indexing with Yoshinori Matsunobu, Sun Microsystems

- second-part continuation of his talk from last year (!) Were you there?
- his understanding of the space requirements of blobs in InnoDB is different than Peter Zaitsev’s.

Faster Than Alter – Less Downtime with Chris Schneider (Ning.com)

- Hipster presentation on doing practical DBA tasks
- likes doing dump and restore on InnoDB tables, which ran 30% faster afterwards in his case.

InnoDB Architecture and Performance Optimization with Peter Zaitsev, Percona Inc.

- perennial comprehensive overview of InnoDB
- talked about differences between Antelope and Barracuda file formats
lwn.net: A look at the MySQL forks

BOFs

O’Reilly failed to use meetup.com to promote the BOFs once again at this conference, so turnout was light to moderate as in past years.

Sphinx BOF hosted by Andrew Aksyonoff

I’ve been familiar with SphinxSearch for years and am a production user, so the general audience discussion was not interesting to me.

However, I had a chance to talk to Andrew about my take on the Apache (Search!) Conference last October and suggested a few things:

  • explain collections on the Sphinx homepage, since many users insist on this feature. The question, of course, is what does the term ‘collections’ mean to various people?
  • make it possible for a non-technical end-user (like a marketing asst.) to highlight 10 items for feature on the first page of results
  • Microsoft is EOLing FAST for linux users, so think about promotion to that segment, which is mainly considering migration to Lucene because Lucene is free and migrating to it costs the same as migrating to any other product.
  • look at the myriad “value-added features” of commercial search engines, mostly related to adserver integration, and decide what can be supported.

MariaDB BOF hosted by Monty

Not much talk about MariaDB, but lots of drinking! (See Monty’s keynote for more detailed info.)

Conference Wrapup

Overall, another good MySQL conference. The organizers restored balance to the presentations, with a fair number of independent consultants and end-users doing talks. (Though I miss the awesome Percona Performance Conference from last year.)

BOFs should be promoted on meetup.com to double participation.

There should be a room with exotic hardware to demonstrate high performance MySQL and MySQL Cluster configurations – SANs, Infiniband, failover, etc.

The lunch food was quite good on all days, as noted by several people. (Important because the suburban venue is not within walking distance to outside restaurants.)