Archive for the ‘OSCON’ Category

Some ZFS News

Friday, August 27th, 2010

Phoronix has a really well-written article on ZFS, including news on a company planning to release a CDDL-licensed linux kernel module.

ZFS is the holy grail of filesystems. Many Database Administrators have switched from Linux to Solaris because ZFS has much better snapshot support than LVM, as well as good SSD support.

phoronix.com: Native ZFS Is Coming To Linux Next Month (Aug. 27, 2010)
phoronix.com: Btrfs, EXT4 & ZFS On A Solid-State Drive (Aug. 9, 2010)
phoronix.com: Benchmarking ZFS On FreeBSD vs. EXT4 & Btrfs On Linux (July 27, 2010)
phoronix.com: Running ZFS With CAM-based ATA On FreeBSD 8.1 (July 26, 2010)
github: Native ZFS for Linux
FreeBSD Wiki: ZFS

O’Reilly Open Source Conference 2010, Portland

Friday, July 23rd, 2010

Once again, the O’Reilly Open Source Conference (OSCON) was held in Portland, Oregon.

It was a good conference, and we had beautiful weather all week long.

Executive Summary

The themes promoted by the conference organizers were Cloud Computing, NoSQL, Emerging Languages (Scala, Erlang, Parrot, Go) and Android phone development.

The @oscon twitter channel was heavily used to coordinate amongst organizers and attendees. I used the TwiXtreme twitter client program on my BlackBerry.

Plug Computers were very popular in the Expo area. They are 5 watt ARM-based computers running Debian Linux that fit into a power brick-sized case and cost $99 to $129 depending on features. The Marvell booth had a few models on display, from GlobalScale (GuruPlug) and Ionics. High-end models have dual gigabit NICs, multiple USB ports, a WiFi access point and other expansion ports.

There was also continuing buzz regarding Facebook’s Flashcache SSD module (GPL v2) for linux, and also ZFS snapshots.

Tutorials

I went to the Gearman Cookbook tutorial, the first half of the Chef tutorial and some of the Cloud Summit talks.

The Gearman Cookbook tutorial was excellent. After a detailed overview of the Gearman architecture and implementations in Perl and C, a number of use cases were explored in detail, including before and after code samples. The talk was both easy to listen to as an overall survey and immediately useful for those wanting to deploy it.

The Chef tutorial was very detailed – too much so perhaps. I went to the first half only, since I am not planning to implement Chef soon (I use PXE and anaconda/kickstart with CentOS), and did not need that level of detail at this time. cfengine, puppet and chef are ops tools for configuring servers. Chef uses Ruby data structures for its configuration files, and has include files and other useful syntax. Basically, users can “code” server configuration, as if they were traditional apps.

I went to some of the Cloud Summit talks and BOFs, but found that anybody who has done a simple project using EC2 knew as much as or more than the speakers, some of whom I would call blowhards.

Marten Mickos, president of Eucalyptus, is refreshing in that he is always clear about being in it for the money, while also promoting Open Source.

Sessions

Some of the most memorable sessions to me were:

Introduction to MongoDB, Kristina Chodorow (MongoDB)

Kristina is the maintainer of the Perl and PHP drivers for MongoDB. She gave an overview of MongoDB, a NoSQL document store, and its command-line interface, which uses JavaScript.

Some day she will release a sharding tool for MongoDB.

Scaling SourceForge with MongoDB, Nosh Petigara (10gen), Rick Copeland (SourceForge.net / GeekNet)

Nosh and Rick gave an excellent review of incorporating MongoDB into the SourceForge site.

- SF query load is mostly read-only
- ops team benchmarked a few NoSQL candidates, and MongoDB won on performance
- original MySQL servers had 64 GB RAM. After migration to MongoDB, same server machines but only 8 GB RAM
- backup dumps are verified to be bitwise the same as masters
- have to be careful not to dump all documents in your database to the network or it will max out switches
- SF relies on first-class data centers and replication slaves, less worried about MongoDB mmap (not crash-safe)
- I personally looked at their performance numbers and site graphs (on an iPad), and the end result was impressive.

Perl Lightning Talks

As always, the Perl Lightning Talks are a highpoint of the conference.

The “cartoon” of Vincent Pit’s remarkable CPAN module(VPIT) contributions was both informative and hilarious. Vincent is a French Ph.D. candidate in advanced geometry.

Cloud BOF (3 Hours)

The Cloud BOF was disorganized. It started 30 minutes late and for some reason was subdivided into 4 audience groups. Startups and vendors trying to make a cloud sales push led the BOF, including cloud and DNS service providers.

The Health Regulations subgroup came up with a couple of ways to make the Cloud palatable to regulators, mainly by encrypting all data because of the multi-tenancy issues with sharing public VMs.

I was in the NoSQL group, which discussed general issues and particular successes. Memcached was the clearest winner, while some people also had success with MongoDB and Redis.

My neighbor was an engineer at Postrank.com. He said that they were happy with HAProxy, but much less happy with the unpredictable IO available when running MySQL on EC2. He also said to carefully look at storage volumes available to your instance, as one is a useful tmpfs. They use AuthSMTP to get around EC2 being generally blacklisted for outbound email.

Database BOFs

MySQL BOF

The MySQL AB engineering staff has left Oracle. Monty Program AB (21 staff) has the core developers, and Percona Inc. (32 staff) has the consultants. Oracle still has some of the InnoDB programmers.

The business plan for Monty Program AB is 60% commercially-sponsored MySQL development, and 40% community-request development. Monty would like commercial users of MySQL to sponsor patches that would benefit them.

Mark mentioned that using Nehalem instructions for CRC was much faster, and that Facebook was using partitions for truncating tables instead of doing multi-record deletes. (See his blog for more details.)

One person mentioned using a commercial backup tool, R1Soft, that inserts a linux kernel module to allow filesystem snapshots. He said to carefully test backup and restore in your environment, especially for filesystems greater than 1 TB which may exceed certain block counter limits. Peter said that some of his clients had used it with varying success.

It worked for him in his environment, and the file browser allows selective file restore (he uses it to restore by priority where a system runs multiple applications.) It starts at $299 for the Standard Edition, and also has MySQL Add-on and Enterprise Editions.

PostgreSQL BOF

The PostgreSQL BOF talked about 30 or so changes that went into version 9.

One of the most exciting new features is native replication, called streaming replication (block-based.) The advantage over Slony-I replication is that Slony-I is trigger-based, so it has a variety of issues, including the inability to replicate DDL commands.

Some of the developers mimed replication events, which was rather amusing to watch. Yes, it was taped.

PostgreSQL is released under the PostgreSQL Licence, which is BSDish.

Peter Zaitsev, co-founder of Percona, organized 3 BOFs, including XtraDB, XtraBackup, Maatkit, Percona Server, Sphinx Search and Running Databases on Flash Storage.

Sphinx Search BOF

Andrew Aksyonoff, the original programmer of Sphinx Search (GPL v2), couldn’t make it to OSCON (the good excuse was that he was busy coding), so Richard Kelm (Sphinx sales/customer support honcho) and Peter filled in (Percona is a business partner with Sphinx, and many of Percona’s clients use it.)

Some of the attendees were existing users, like myself, and some from HP and other companies were looking for a large-scale search solution or alternative to Lucene.

Monty mentioned that the latest MySQL 5.1 should be used, as there have been a number of performance and reliability improvements. Full-text search is supposed to be 10x faster than 5.0, and replication is nearly bug-free by now.

Sphinx Search now has real-time index updates in version 1.1.0 beta. Another very nice feature is SQL+FS indexing.

Here is the full Sphinx 1.1.0 changelog.

Running Databases on Flash Storage BOF

The Running Databases on Flash Storage BOF had a combination of MySQL and Postgres users who have tested or used most of the SSD products: FusionIO, violin, Intel, OCZ, etc. Everybody was happy with SSD IOPS performance, but less so with cost and metadata RAM requirements with the add-in boards (FusionIO may require 4 GB RAM for metadata.)

Peter said that 20% to 30% of his clients are already using SSD – across the spectrum of vendors and models. Some are also trying “massive RAM” solutions, like Cisco servers with 384 GB RAM.

Some users had 1+ TB Postgres databases with very thorny backup and mgmt. issues. One solution was to start a snapshot, but not do the copy operation.

Expo Notes

I had an enjoyable talk with Austin Hook, who has operated the OpenBSD Store for many years. He lives near Calgary, the center of OpenBSD/OpenSSH/PF development. He mentioned that some perennial financial contributors had stopped because of the recession, so here’s the donations link.

I also talked to some reps from a Brazilian outsourcing firm, ActMinds. They currently have 400 employees across Brazil and a sales office in Philadelphia. Brazil is only 2 hours ahead of EST. They said the minimum project size is 2 developers and developer turnover a low 5%/annum. Their pricing is $35 to $45/hour.

And I had fun handling the plug computers on display at the Marvell booth. The Ionics boards are amazingly densely populated.

Discussions

I had the opportunity to talk to a long-time Portland resident who works as a computer consultant. He said that the Portland economy is not doing great, and really hasn’t done well since old-growth logging was stopped after 90% of the forests were cleared. And although hundreds of miles of fiber optic has been laid downtown, it’s not available for residential use. However, the Beaverton area does have ubiquitous FTTH.

I also talked to somebody who attended the Emerging Languages talks. He’s working on his M.Sc. in Computer Science, so found those talks fascinating.

Twitter Humor

There were some humorous tweets:

- “my MongoDB and CouchDB mugs are fighting each other.”
- “I got one MongoDB mug, but need two to safely store coffee.”

Notes

Note to self: skip the nightly parties unless you have a date. The bars are too loud to talk to anybody.

Note to the O’Reilly conference organizers: use meetup.com for the BOFs like ApacheCon does. The average audience was about 10 people, and with meetup it would be 4x that.

OSCON 2010 Slides
Tim Bray: Desperate Perl Hacker
Youtube: OSCON 2010 videos
blip.tv: OSCON2010 videos
wikipedia: Plug Computer
Jeremy Zawodny: MongoDB Early Impressions

OSCON 2009 – San Jose

Friday, July 24th, 2009

For the first time in a decade, the O’Reilly Perl and Open Source Conference (OSCON) was held in San Jose again for 2009.

(I have heard that the City of San Jose Business Development office is very, very accommodating towards conferences these days.)

There was great attendance, and plenty to see with about 15 simultaneous tracks, lots of BoFs, and an active exhibits area.

One of the changes this year was more OS talks, including some for linux and FreeBSD. This is a welcome change, though many kernel hackers won’t travel to the USA, for various legal issues.

My favorites were:

  • talk – YAML by Ingy. YAML is a serialization standard for all programming languages and is a superset of JSON, in that YAML supports types and references. The Perl module is YAML.pm. Although it is a “serialization standard”, best results are obtained when both sides of the exchange are controlled by the programmer (ie. different word sizes or floating point standards will likely cause issues.)
  • Perl lightning talk – Esthetic Randomness by Joseph Brenner. Joseph likes to post-process random output before display to get a more desirable appearance. He’s kind of goth-looking, so the overall subject and delivery made it an interesting 5 minutes.
  • BoF – MySQL social with Monty, Percona (now 25 employees!) and Mark C.
  • exhibit – Haiku OS (the Open Source BeOS clone) demo. 2 developers, now at Google, reimplemented BFS from the textbook. Haiku can run with 64 MB RAM. Posix compatible, so the gcc toolchain works. The ARM port is a GSoC project. The video support comes from ffmpeg, xiph, etc.

Regarding the MySQL BoFs, I think they can be summarized like this: the community is not going to wait for MySQL AB/Sun/Oracle to dick around any further.

Monty Program AB, Percona and Drizzle are going to have forks regardless of what Oracle does. Either the official MySQL documentation will be freed, or rewritten by Monty Program AB and Drizzle.

Typo3 CMS also had a community booth. They are the #1 European CMS with support for 38 languages.

I didn’t see much use for the “OSCamp” attendee-organized tracks personally. Whereas at the MySQL Conference the Percona Performance Conference was necessary to fix the broken speaker selection process that was weighted towards MySQL/Sun staff instead of productive community contributors, that wasn’t an issue this time around.

The talk on Perl and Unicode was pedantic (focusing on UTF-8 bit patterns, presumably for those needing to detect and fix corruption) but comprehensive, as Tom Christiansen was in attendance to provide up-to-the-minute comments and tips. perluniintro is very helpful.

The PHP Best Practices talk was informative, as the 2 presenters have worked as PHP programming consultants and seen how projects go wrong. They tend to use whatever PHP framework the client is using, and have nothing glowing to say about any particular one.

I’d say that the world of PHP frameworks (dozens) is even more fragmented than Perl (Catalyst, Mason, embperl, CGI::Application are the major ones), which is indeed astonishing. And ironic – since PHP is itself a templating language.

Stonehenge Consulting threw another of their famous drunkfests at a local bar for those wearing their neon yellow t-shirts. If you want to get hammered for free, this is always the spot. :)

I talked to Randy Ray a little about what can be done with svlug.pm considering that the South Bay is suburban and thus less centralized than a dense city. Stay tuned.

Other Perl lightning talks included:

  • Larry Wall’s son talking about black holes
  • Scott Smith talking about Getopt::Complete, which can do svn command-line style nested args
  • connie willis bellwether talked about Flocks and the hive mind as defined by 3 rules.
  • Don’t Blame Perl – It’s the programmer’s problem if they don’t use modules, scoping, comments, brevity, objects.
  • Cool Perl6 – hyperoperators (work on arrays) with a card game sample using extended-ASCII symbols.
  • svn is not totally useless – it pointed the world towards git.

The closing talk on linux economics seemed to be an eye-opener for the audience. Most cell carriers and OEMs are at a severe disadvantage to Apple in the apps market, so they may need linux (or Haiku) to mount any kind of response that makes financial sense.

In the conference wrap-up segment, Allison Randal and an O’Reilly rep fielded questions from the audience and answered in “Twitter mode” – single sentences less than 140 characters in length.

OSCON 2009 Speaker Presentation Files
youtube.com: oscon 2009 video clips
wikipedia: Monty Program AB
mtocker: Understanding the MySQL forks

SVLUG: The Parrot Virtual Machine, Allison Randal

Wednesday, June 3rd, 2009

Allison Randal gave an overview of the Parrot Virtual Machine, plus delved into the syntactic details of the PIR assembly language for the virtual machine. (around 1200 opcodes.)

Parrot is a virtual machine aimed at running all dynamic languages.

She’s the chief architect for the Parrot project, and is also the author of the Python port to Parrot, Pynie. Apparently the Python maintainers are happy to have help with language backend support.

Allison said that perhaps 50 dynamic languages are in some process of being ported to Parrot. Often they run up to 10x faster on Parrot than the original implementation.

One member said he knew of a commercial project that used Parrot as the language VM when the underlying chip or OS became obsolete and they needed to port to a more modern system.

PIR source is actually run through flex and yacc.

This was her third talk on Parrot in the Bay Area recently. They’re organized around her business meetings for the O’Reilly Open Source Convention, to be held in July in San Jose.

Besides working for O’Reilly Media, she is also working on her Ph.D. in computer science at Bristol University in the UK.

Thanks again to Symantec for hosting SVLUG meetings.

OSCON 2008, Portland

Friday, July 25th, 2008

I attended the O’Reilly Open Source Conference, once again in Portland, Oregon.

Overall my impression was that the talks and vibe were oriented towards Web 2.0 primarily.

I would say that the talks were not as strong as previous years, but it’s easy to compensate for that with the “hallway track” and access to the original Open Source authors.

Several attendees used the EEE sub-notebook computer, and were happy with it as an email/browser tool.

Wednesday

PHP Taint Tool: It Ain’t a Parser

- CS’y effort at PHP parser for code analysis, reminds me of early days of Perl’s B tools
- not suitable for end-users

Write Beautiful Code (in PHP), Laura Thomson, Mozilla

- good general background on good programming practices
- not a lot of specifics about PHP, but available for questions

Hypertable, Doug Judd, Zevents

- HyperTable is a clone of Google’s BigTable, from public paper
- room was packed, some turned away
- still alpha, maybe beta in August
- preferred distributed filesystem is HDFS, works with others
- I recommend reading web site and then looking at the curt slides
- plans to do benchmarks with same hardware as Google has published.

Open Source Virtualization for People Who Feel Guilty About Using VMware So Much, andy michelle, EDA

- cute talk about VirtualBox, Xen and VMware
- Xen has weird nomenclature compared to other tools
- VMware wins on tools and polish
- showed screenshots of unreleased and alpha mgmt. tools.

Barely Legal XXX Perl, Jos Boumans, RIPE

- stunning and twisted example of overloading, short-circuiting, import-faking, whatever it takes to make a loaded module do something other than intended
- illustrates great flexibility of perl, for good or ill
- could be useful for things like testing harnesses, etc.
- motivated to win bet of $100 or 1 vertical meter of beer
- said it took 3 or 4 hours to complete.

I walked around the exhibits area.

Got a demo of Atlassian’s continuous integration (CI) tool, Bamboo. They’re also the vendors of JIRA issue tracker and Confluence wiki, which I’ve used before.

One company had a public Wii game happening.

Thursday

Scaling Databases with DBIx::Router, Perrin Harkins

Ultimate Perl Code Profiling, Tim Bunce (Shopzilla)

- talk and screenshots about NYT perl profiler


The New York Times Perl Profiler

Top 10 Scalability Mistakes, John Coggeshall (Automotive Computer Services)

- good overview of writing high-performance, maintainable Internet systems
- interesting opinion that scalability is not just about increasing performance. scalability can be about scaling up or down, performance or maintainability, etc.
- recommended php.ini settings list

Perl Lightning Talks

- popular with audience, attendees seemed to like all the talks
- Mail::ESMTP looks very interesting for testing and production

Code is Easy, People are Hard: Developing Meebo’s Interview Process, Elaine Wherry (meebo)

- struggled to find time, right approach to interview new candidates in 1996, likely at behest of VCs
- external recruiters hit-and-miss, conferences and jobs email link useless
- phase where non-founder employees doing interviews wanted a founder involved in interview process
- trying to preserve culture (finger rockets, social networking, 2 female founders, etc.)
- came up with process involving reading resumes, phone screens, and office “sim” that adds a new candidate within 3-6 weeks
- “sim” has 3 versions: office manager (plan to erect a meebo office sign), front-end engineer (write a JavaScript app), and back-end engineer (write a server) in 4 hours
- current goal is to keep interview time down to 8 hours per candidate over 10 days
- now up to about 40 employees
- my feeling was that their hiring process started off clueless due to inexperienced mgmt. and is still oriented towards junior engineers. Silicon Valley is full of expert engineers and it doesn’t take 8 hours to interview them.

BOF

mysql-sandbox

Giuseppe Maxia discussed and demoed his very useful mysql-sandbox utility for managing several versions and instances of MySQL on the same machine.

He wrote it for his testing work at MySQL AB. Very well received by attendees. This is a great example of what I call “anti-virtualization” – using ports instead of resource-intensive VMs.

MySQL Conference 2008 Presentation

State of the Onion Address, Larry Wall

- talk about Perl6, random anecdotes, etc.

Friday

Open Voices, Jim Zemlin (The Linux Foundation), Keith Bergelt (Open Invention Network), Karen Sandler (Software Freedom Law Center), Phil Robb (Hewlett Packard)

- panel discussion of various free software efforts, some little-known

An Illustrated History of Failure, Paul Fenwick (Perl Training Australia)

Paul gave an interesting talk on notable Software Failures and estimated a price tag for each. I had heard news reports of many of them, but it was interesting to hear an updated analysis of what really happened behind the scenes.

Thanks to Google for sponsoring the fairly good almost-gourmet lunches. Sure beats the O’Reilly lunchbags from the dot bomb days. (Everybody I know bailed and found a subway shop back then.)

Notes

- Burgerville popular with attendees, can upgrade combos to a shake.
- Red Lion hotel has a small cardio gym with 1 universal machine, no free weights, open til 11 pm
- WiFi password changed weekly, in middle of remodel, lobby just finished.
- There is a 24-Hour Fitness that is actually open 24 hours near downtown Portland. Has basketball court and 2-lane pool. $15 for non-member visitors.

OSCON 2008 Presentations

OSCON July, 2007 – Portland

Friday, July 27th, 2007

I attended the O’Reilly Open Source convention again, making it 10 years in a row. Once again it was held at the Convention Center near downtown Portland, a convenient light rail ride from the airport.

Like many experienced developers, I spent a lot of time in the “hallway track” talking to other developers and users, as well as in one of the 15 simultaneous talks.

The general consensus was that the talks were not as strong as in previous years (not even compared to the MySQL conference this year), but it’s worthwhile to me if I can get even one juicy nugget from each talk, or gain an understanding of a developing trend in programming or system administration.

Many of the presenters griped about there not being enough time to look at source code in a 40 minute talk.

For those who want a conference summary in a nutshell:

  • OpenID is popular
  • lucene and its REST interface have more mindshare than projects like Kinosearch, language-specific bindings, etc.
  • Yahoo! released the yslow browser plug-in for front-end performance evaluation
  • Perl: no ORM appears to be gaining the upper hand, though DBIx is respected. Tim Bunce would like to see a wrapper around JDBC for each scripting language. Allison Randal is updating the Perl license.
  • PHP: no good way to do vector reporting graphics, especially since IE doesn’t support SVG and Adobe is killing the Macromedia plug-in in December. PHP4 is being EOL’ed 8/8/8 so that the PHP developers can focus on 5 and 6 only.

Google was heavily recruiting at the conference. I ran into 3 recruiters, and there were even more in the Google booth.

Pretty good food for lunch, usually chicken or fish in some kind of red sauce with steamed veggies. Better than the wilted sandwich boxes from previous years that mainly got tossed out.

Tuesday nite

I arrived at the Convention Center in time for the evening Google Open Source awards. Happened to sit next to Zak and the 20 year-old OpenID guy, David Recordon, who won $5,000 and a colored, transparent, angular plastic trophy and base that we had fun stabbing each other with.

The OpenID Foundation is offering a $5,000 bounty to the first 10 OSI-approved projects that add OpenID support. Many programmers were busy adding it, including SocialText and others. (David works at Verisign.)

I walked over to the Doug Fir Lounge with a few guys, 2 of them Austrian. I had the halibut fish and chips and lemonade for $20 including tip. It was ok. They have a log cabin motif happening with a restaurant, patio and bar upstairs, and dance club downstairs, so ID is required to enter. Open from 7 am to 2:30 am every day, 1 503 231 WOOD.

Wednesday

Nagios

- general overview of features
- Event Broker most powerful, least used

Bigger and Faster
Rasmus Lerdorf

Rasmus did his usual “PHP is as secure as any other language”, and “pick on a PHP app and make it go faster” talk.

He said he’s still not a Y! Paranoid, but his work does often touch on PHP and web security.

He used to use httpload, but now prefers siege for load testing because it has support for cookies.

http://developer.yahoo.com/yslow/
Live HTTP Headers
APC

sla.ckers.org/forum/list.php?3
php.net/filter
xdebug.org/docs/profiler
talks.php.net/show/oscon07

xdebug
jeremiah san diego xss console author
scanmus.corp.yahoo.com

PHP and Ruby Envy
- NZ programmer on SilverStripe CMS (BSD licensed)
- own object system in PHP5
- Ruby less available on web servers, less mindshare
- rolled his own PHP OO framework apparently

Exhibits

- talked to Mark Finkle of mozilla.org
- said hi to Larry. He had the whole family there.

Trac
Vivek Khera

- he uses RT for public tickets, Trac internally
- doesn’t require much resources since only a few developers
- Trac is used on many Ruby/PHP projects
- gives you wiki/tickets/etc.
- modified BSD license

Afterwards mentioned:

- uses Trac in a BSD jail
- an alternative to Trac would be basecamp (or I guess Sourceforge software). See slashdot.org threads for more ideas.
- likes pfSense firewall as an appliance
- nagios alerts too much, and no good rule builder for multiple hosts
- own web framework called Rowdy (RWDE)
- software as complicated to install as RT should be treated as an appliance
- he submitted 6 related talks on software development environment, only 1 accepted.

Steve Souders
Chief Performance Yahoo!
souders@yahoo-inc.com
Exceptional Performance Group

http://developer.yahoo.com/performance

- IBM Page Detailer Pro
- yslow (crawls the DOM, not a packet sniffer)
- firebug
- jslint – The JavaScript Verifier

80-90% of end user response time is spent on the front-end. so optimize there.

14 Rules for a Better User Experience

1. make fewer HTTP requests
2. use a CDN
3. add an Expires header
4. gzip components – even JS and CSS
5. CSS at top
6. JavaScripts to bottom
7. avoid CSS expressions
8. make JS and CSS external
9. reduce DNS lookups
10. minify Javascript
11. avoid redirects
12. remove duplicate scripts
13. configure Etags – disable in most cases if load-balanced or multiple web servers
14. make AJAX cacheable
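
Rules 3 and 4 are the easiest ones to show in code. What follows is only a rough sketch in Python using the standard library, not anything shown in the talk: the function names `far_future_headers` and `gzip_body`, the one-year lifetime and the sample stylesheet are all assumptions made up for illustration. In practice these headers are usually set in the web server configuration rather than in application code.

```python
import gzip
import time
from email.utils import formatdate


def far_future_headers(max_age_days=365):
    """Rule 3: build caching headers for a static asset.

    The one-year default is an arbitrary choice for this sketch.
    """
    max_age = max_age_days * 24 * 60 * 60
    return {
        # HTTP dates are expressed in GMT
        "Expires": formatdate(time.time() + max_age, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }


def gzip_body(body):
    """Rule 4: compress a response body and return it with its header."""
    return gzip.compress(body), {"Content-Encoding": "gzip"}


if __name__ == "__main__":
    css = b"body { margin: 0; padding: 0; }\n" * 200   # made-up stylesheet
    compressed, encoding_headers = gzip_body(css)
    print(len(css), "->", len(compressed), "bytes")
    print({**far_future_headers(), **encoding_headers})
```

Repetitive text such as CSS and JavaScript tends to compress very well, which is why the rule calls those out explicitly.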

move JS to onload
remove bottom tabs
avoid redirects
images sprites
expires

Thursday

PHP Graphics
Luke Welling, OmniTI

Luke presented an overview of raster and vector graphics modules for PHP.

He prefers vector graphics, but there aren’t many free options for doing that.

He feels that Yahoo! Finance and Google analytics sites are state of the art in presentation graphics with anti-aliasing, interactivity, esthetics, text and maps. I’d say that’s aiming a little low, but it’s a start I guess.

Perl Lightning Talks

  • Vani Raja did a talk on Yahoo! JS
  • Ask did his talk on qpsmtpd again
  • talk on Test::More 3?
  • talk on task lists for hit and run volunteers
  • Schwern did one talk on making tea for 5 minutes, and one on “Blame Schwern” – just do it instead of waiting for permission
  • Tim Potter did a talk on a messaging standards effort for his employer, saying that the ANSI process was too slow and looking for an alternative
  • Andy Lester did a talk on ack
  • a talk on SVN::Notify
  • http://angerwhale.org/
  • Tim Bunce talked about DBD::Gofer Proxy and next-gen cross-scripting language DB API based on JDBC API
  • guitar song about #perl

YouTube: Perl Lightning Talks on Handycam by Schtonk

Perl Auction

Larry’s talk on comparative languages and Perl6. Sounds like we’ll be able to do something like foreach (1..infinity).

Full Text Search BOF
Peter Zaitsev
– based in London, England but often in SV
– uses Sphinx on several servers
– http://boardreader.com/ one TB of searchable data
– genealogy is big on full-text search

- after insert, mysql full text gets slow, run optimize.
also, doing it at insert time causes index update per keyword
- GIN or GiST for PostgreSQL 8.4?
- Michael Kimsal, SOLR
- hard disk space is free (enough for whatever indexing is required)
- mostly news search involves last 5 minutes of feed
- MessageOne stores email for lawyers to mine. They like to search, archive and expire. Mostly Exchange lusers, rarely Unix admins.
- Lucene and REST interface
- Monty says MySQL AB hired a programmer to work on search, but he’s working on another project now. They need somebody with a burning desire to make progress in an area like that, but they recognize the importance of search.
- Monty poured out free Finnish chocolate rum from a Pepsi bottle that was so powerful it scared most people. He said it was banned for 2 years in Finland because it was so addictive.

Sun BOF

- audience talked to senior Sun staff about Java and Solaris a little.
- free beer, cheese and crackers.

Friday

A bunch of Postgres people went to the Portland wine tasting on the river event in the afternoon.

Call for Software Whiteboard

OSCON07 Call for Software Whiteboard
flickr.com: Jeff Kubina’s OSCON 2007 Whiteboard set of tiles

3 Rules for Writing High-Performance Code

Monday, October 30th, 2006

Several years ago I attended a talk by Chip Salzenberg, a former Perl Pumpkin (lead maintainer), at OSCON.

One of his slides had these three rules for writing high-performance code:

  1. Don’t do it.
  2. Do it later.
  3. Let somebody else do it.

Simple rules, but they get powerful results.

I especially like “Don’t do it.”

  • Don’t walk that array – use a hash data structure. (Perl and JavaScript have hash built-in, C++ has it in Boost.)
  • Don’t do locking if serialization is not needed.
  • Don’t add more columns or surrogate keys to your database if not needed.
  • Don’t do that join.
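
To make the first bullet concrete, here is a minimal sketch. It is written in Python rather than Perl, and the data and the two function names are invented purely for illustration. The first version walks the array for every lookup; the second builds a hash-based set once and then looks each item up directly.

```python
# Illustrative only: the data and function names are invented for this sketch.

def count_hits_by_walking(items, wanted):
    """Scan the list for every lookup: O(len(items)) work per lookup."""
    hits = 0
    for w in wanted:
        for item in items:          # walks the array each time
            if item == w:
                hits += 1
                break
    return hits


def count_hits_by_hash(items, wanted):
    """Build a hash-based set once; each lookup is then O(1) on average."""
    lookup = set(items)             # one pass to build the hash
    return sum(1 for w in wanted if w in lookup)


if __name__ == "__main__":
    user_ids = list(range(0, 10000, 2))    # even numbers only
    wanted = [3, 4, 9998, 10000]
    print(count_hits_by_walking(user_ids, wanted))
    print(count_hits_by_hash(user_ids, wanted))
```

Both functions return the same count; the difference only shows up in how the running time grows as the list and the number of lookups get larger.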

CPUs have gotten faster over the years (disks haven’t), but server code still needs to be tight so that you can provide a better user experience (under 300 ms page creation time) with fewer boxes (save money, space, manpower and energy.)

Smile and “Just say No.” on your project.

Your users and DBA will thank you later.

Virtues of a Perl Programmer: Laziness, Impatience, Hubris

MySQL 4.x to Oracle 10g Migration Notes

Wednesday, August 16th, 2006

About once a year I need to migrate a medium-sized web application using MySQL to Oracle.

Usually it’s to prepare a successful intranet application for a move to a formally supported production environment – and in Silicon Valley, that usually means Oracle.

I was apologizing to Monty at OSCON this year about my latest migration project away from MySQL.

His surprising response was, “It’s not a bad thing. I don’t mind hearing about conversions because successful migrations demonstrate that people don’t have to worry about database lock-in.”

Here are some notes on doing migrations.

Converting MySQL 4.x Apps to Oracle 10g

Conceptual Issues

  • Skills: Oracle has a steep learning curve for programmers unfamiliar with database transaction programming, so you should have at least one programmer who has worked with Oracle in the past and can write stored procedures. You will also need an experienced Oracle DBA for non-trivial projects.
  • Portability: decide if you want to maintain dual database support or not. It’s not difficult to do since 90% of SQL queries work the same, and it’s unlikely you’ve done anything tricky like stored procedures or views in MySQL since those are new features in 5.0. It’s nice to have dual-database support if your developers use notebook computers (MySQL is fairly light-weight), or if you want to market the software later (for example, most ISP hosting plans support MySQL only.)
  • Autocommit: decide if you want to use autocommit or not with Oracle. Usually not.
  • Performance: MySQL has a very limited query optimizer, which often results in slow table scans. Oracle, however, has a great query optimizer and can optimize queries even with multiple subselects and OR clauses.

    If you do mass updates, consider checkpointing them (loop and commit) to avoid filling Oracle’s undo (rollback) space or redo logs. An UPDATE statement that touches all records, which is harmless on a MyISAM table, can fill them up in Oracle if they are not sized correctly. For the same reason, Oracle TRUNCATE is much more efficient than doing a DELETE FROM table.

    MySQL has table types that have specialized features and performance, such as MyISAM/FullText, InnoDB/Transactions, Heap, Blackhole, etc.

    Oracle has 3 storage engines (heap/parallel query, index-organized, external) with features on top of that. Heap is the default storage engine. Index-organized is a B-tree optimized for compactness and quick access; it can be several times faster than heap for index and range queries and is often used for data warehousing.

  • MySQL supports database names, but Oracle is quite different … it uses one database but optionally supports multiple schemas based on userid
  • Schema: converting the database schema and migrating the data will likely be more difficult than changing the application source code. The same goes for regression testing. There are tools to help; for example, Spectral Core sells Full Convert. MySQL can be accommodating about blank vs. NULL vs. 0000-00-00. Oracle is not, so loading data can be touchy.
  • Timezone: MySQL-based applications often use whatever timezone the MySQL server uses. Oracle recommends for performance reasons setting the database to UTC (+0000). Beginning with MySQL 4.1.3 and Oracle 9i, both databases have similar tz features, as both are based on the Olson timezone database. To do timezone conversion, MySQL uses CONVERT_TZ and Oracle uses FROM_TZ. Note that in MySQL 5.0, NOW() is a session timestamp (computed once so replication-safe), while SYSDATE() is a real-time timestamp and would be a different value on the slave.


    mysql> SELECT @@global.time_zone, @@session.time_zone; # show MySQL tz settings
    SYSTEM SYSTEM

    mysql> select * from time_zone_name; # see if tz database is loaded yet
    Name | Time_zone_id
    Africa/Abidjan | 1
    Africa/Accra | 2
    Africa/Addis_Ababa | 3

    sqlplus> SELECT * FROM v_$timezone_names;

  • Character Set: MySQL-based applications often use whatever character set that MySQL defaults to. When moving to Oracle, you likely want to add explicit support because production Oracle instances are usually set to AL32UTF8 these days (Oracle’s “UTF8” character set is actually Unicode 2.0 from 8i days). You can see which character set a database or column is set to with:


    select * from v$nls_parameters where parameter in ('NLS_LANGUAGE',
    'NLS_TERRITORY', 'NLS_CHARACTERSET');
    select dump(mycolumn,1017) from mytable where rownum=1;

  • Case-sensitivity: MySQL generally does case-insensitive string comparisons if you don’t use the BINARY keyword, but Oracle is case-sensitive. MySQL database and table names are case-sensitive on Unix (but not on Windows or Mac OS X HFS+) because databases and tables are actually directories and files, but Oracle silently upper-cases them and appears case-insensitive.
  • Sequence Numbers: Oracle sequence numbers are not guaranteed to be sequential. Values are “lost” in a rollback, and will most likely be “lost” if cached sequence numbers are specified and there is a shutdown or panic or library age-out. Non-cached sequence numbers can cause a noticeable performance impact (a disk access), which is why the default is to cache 20 values per allocation. 1000 is commonly used for bulk loading.

    Some Oracle sequence factoids: sequences never roll back after being incremented, they can be non-numeric, and you can use multiple sequences per table in Oracle. (Oracle sequences are actually separate objects from tables.)

    To use sequences in Oracle, you can either specify NAME_OF_SEQ.NEXTVAL followed on the same $dbh with NAME_OF_SEQ.CURRVAL in 2 statements, or combine both with an INSERT … RETURNING … INTO statement.

    You can emulate MySQL’s autoincrement feature with an Oracle sequence and a trigger. This is documented in the blog posting How to Create Auto Increment Columns in Oracle.
    When migrating from MySQL to Oracle, you may want to consider dropping useless surrogate keys altogether, reducing the need for application code changes or creating sequences.

  • Trailing spaces: MySQL and Oracle handle trailing spaces in columns differently when doing string comparisons. Oracle’s NCHAR preserves trailing spaces, and NVARCHAR2 does not.
  • Performance: MySQL is a lightweight database that usually performs well with little planning. Updating a couple of rows and doing a select from a MyISAM table may take milliseconds in MySQL, but the same work can take a full second of wall-clock time in Oracle unless you plan ahead for batch inserts in a single transaction, or for batched sequence numbers.
  • NULL: In MySQL, the empty string may be inserted into a column and is not a NULL value. Oracle converts the empty string to NULL.
  • MySQL silently truncates input data when too wide for a column, but Oracle considers the column width to be a constraint and fails the insert or update.
  • It is smart to quickly port a representative sample of your code and run it against the converted Oracle schema, so that any surprise problems show up early.
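
Several of the data-loading issues above (zero dates, empty strings, over-wide values) can be caught before the data ever reaches Oracle. The following is only a sketch, in Python rather than the Perl used elsewhere in these notes. The function name clean_for_oracle and its max_width parameter are hypothetical, and loading zero dates as NULL is an assumed policy, not something Oracle or MySQL prescribes.

```python
def clean_for_oracle(value, max_width=None):
    """Normalize one MySQL column value before loading it into Oracle.

    Returns the cleaned value, or raises ValueError if it would not fit.
    """
    # MySQL's zero date has no Oracle equivalent; this sketch assumes
    # it should be loaded as NULL.
    if value in ("0000-00-00", "0000-00-00 00:00:00"):
        return None
    # Oracle stores the empty string as NULL, so make that explicit.
    if value == "":
        return None
    # MySQL silently truncates; Oracle fails the insert. Fail early instead.
    # Note: len() counts characters, while Oracle widths are often in bytes.
    if max_width is not None and isinstance(value, str) and len(value) > max_width:
        raise ValueError(
            "value is %d chars, column allows %d" % (len(value), max_width)
        )
    return value
```

Running every exported value through a filter like this makes the differences visible in your own logs rather than in a failed Oracle load.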

SQL Syntax Issues

  • the ANSI join syntax works in 10g, so that makes porting much easier than in the past. (The Oracle DBAs I have talked to said that the new features in 9i were not ready for prime-time.)
  • MySQL allows GROUP BY on any column. Oracle requires every non-aggregated column in your query result set to appear in the GROUP BY clause.
  • MySQL unix_timestamp() can be converted to an Oracle stored procedure
  • if you were using a MySQL database for scratch tables, you can do a similar thing in Oracle by declaring tables to be in a scratch tablespace, but in the same schema
  • you should be able to rely on Oracle transactions and ACID, and remove MySQL LOCK TABLE and UNLOCK TABLE statements.
  • in MySQL, a database is a combination of a hostname and database name. In Oracle it’s a SID, and is defined in the tnsnames.ora configuration file.
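  • MySQL LIMIT can be replaced with a subquery using ROWNUM in Oracle, for example SELECT * FROM (SELECT ROWNUM limit, … ORDER BY …) WHERE limit BETWEEN ? AND ?. Note that MySQL LIMIT is 0-based but Oracle ROWNUM is 1-based. Also note that Oracle assigns ROWNUM before applying ORDER BY in the same query block, so the sort has to happen in a subquery nested inside the one that selects ROWNUM.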
  • Oracle object names (column, table, sequence) are limited to 30 characters and are by default case-insensitive. In MySQL, database and table names are just files, so they are case-sensitive on case-sensitive file systems like Unix.
  • MySQL autoincrement columns will instead need a sequence in Oracle.
  • the optional MySQL AS alias statement keyword is not recognized in Oracle after a FROM clause table name… just delete it. (AS is valid after a column name.)
  • MySQL CONCAT can be rewritten as || in Oracle
  • MySQL syntax INSERT INTO table SET is not supported in Oracle.
  • MySQL syntax for batch INSERT (multiple VALUES lists) is not supported in Oracle.
  • MySQL NOW() can be replaced with Oracle SYSDATE, or CURRENT_TIMESTAMP which works in both MySQL and Oracle. CURRENT_DATE is also portable.
  • MySQL SELECT on a datetime field for display without explicit formatting can be emulated with select to_char(SYSDATE,'YYYY-MM-DD HH24:MI:SS') from dual;
  • MySQL EXPLAIN can be done 2 ways in Oracle: EXPLAIN PLAN FOR …; @$ORACLE_HOME/rdbms/admin/utlxpls.sql, or the DBA can create a plan_table so that in SQLPLUS you can type SET AUTOTRACE ON
  • MySQL CREATE TABLE … LIKE would need a stored procedure in Oracle to copy metadata. Oracle’s CREATE TABLE … AS SELECT does not copy indexes, triggers, constraints, tablespaces or sequences. Oh, and in Oracle, non-primary indexes must have unique names across tables because they are schema objects that do not actually belong to a table.
  • SELECT COUNT(*) FROM table_name is very fast with MySQL MyISAM tables, because the total row count is stored with the table. In Oracle, expect a much slower result as an index scan is required. An estimate is stored in NUM_ROWS, which is refreshed when optimizer statistics are gathered rather than on every change, so it can be wildly inaccurate.
  • To read blobs in Oracle, you will likely need to allocate memory for the result: $db->{LongReadLen}=500000; # Make sure buffer is big enough for BLOB
  • MySQL treats NULL as the lowest value (first with ASC, last with DESC), but Oracle treats it as the highest, so Oracle ORDER … DESC sorts NULLs first, unless you specify NULLS LAST
  • queries returning the “top n” results are usually implemented in MySQL with ORDER … LIMIT, but in Oracle there are a number of ways of doing that with subselects, ROWNUM and RANK keywords.
  • The MySQL client program (mysql) is a fairly usable text-mode application. Oracle’s version, sqlplus, is inadequate for programmers. Consider using Oracle SQL Developer (Java, so kind of slow sometimes but does work on MacIntel machines), Squirrel, or Quest TOAD instead.
  • MySQL’s LOAD DATA INFILE and SELECT INTO OUTFILE statements can be emulated in Oracle with SQL*Loader and BCP external programs. Perl programmers can use the CPAN module Oracle::SQLLoader, although it is simplistic and needs more testing for customized control files.
  • Perl DBI’s $sth->rows() returns the row count from a SELECT result set in MySQL, but in Oracle does not. At best it will indicate -1 for failure and 0E0 for success, so do a COUNT(*) or loop over the result set with while and fetch for a row count. In Oracle, $sth->rows() is incremented as you do the fetch, often too late for your program logic.
  • Jeremy’s MyTop for MySQL has an analogous display in Oracle’s SQL Developer Reports .. DD Reports .. DB .. Top SQL.
  • MySQL’s best-effort statement-based replication and Oracle’s replication are very different. Oracle replication is done in a transaction across master and slave, so they stay in sync.
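
The LIMIT-to-ROWNUM item above involves both an off-by-one conversion and a nesting rule, so a small sketch may help. This is Python used purely for illustration; the helper names limit_to_rownum and paginate_oracle and the bind names :first and :last are hypothetical, and the wrapper shown is one common pattern, not the only way to page results in Oracle.

```python
def limit_to_rownum(offset, count):
    """Convert a 0-based MySQL 'LIMIT offset, count' to 1-based ROWNUM bounds."""
    first = offset + 1
    last = offset + count
    return first, last


def paginate_oracle(ordered_query):
    """Wrap an already-ordered query so rows are numbered after the sort.

    The caller binds :first and :last to the values from limit_to_rownum().
    """
    return (
        "SELECT * FROM ("
        "SELECT q.*, ROWNUM rn FROM (" + ordered_query + ") q "
        "WHERE ROWNUM <= :last"
        ") WHERE rn >= :first"
    )
```

For example, MySQL’s LIMIT 20, 10 becomes rows 21 through 30, and the ORDER BY stays in the innermost query so that ROWNUM is assigned to rows that are already sorted.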

Here is a sequence hack, which I would seldom recommend, to mimic MySQL autoincrement in Oracle. Define one sequence in Oracle called APP_ID, have the app call APP_ID.NEXTVAL everywhere an autoincrement value is needed, regardless of table, and create an Oracle stored procedure called LAST_INSERT_ID that calls APP_ID.CURRVAL. That way you can minimize source code changes to your MySQL app.

The downside is that a busy app will soon be using very large numbers as IDs, perhaps needing wider columns, and making it difficult for humans to write or verbalize them, or even predict what the next value will be for a given table.
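
To make the downside concrete, here is a toy in-memory model of the hack in Python. It is not Oracle code: the ToySequence class is a hypothetical stand-in for the shared APP_ID sequence, it ignores sessions (real CURRVAL is per session), and the orders and customers lists stand in for two made-up tables.

```python
class ToySequence:
    """In-memory stand-in for one shared Oracle sequence (not real Oracle)."""

    def __init__(self):
        self._last = 0
        self._current = None  # no CURRVAL until NEXTVAL has been called

    def nextval(self):
        """Hand out the next number; it is never reused or rolled back."""
        self._last += 1
        self._current = self._last
        return self._current

    def currval(self):
        """Return the most recently issued number."""
        if self._current is None:
            raise RuntimeError("CURRVAL is not defined before NEXTVAL")
        return self._current


app_id = ToySequence()


def last_insert_id():
    """Mimic MySQL's LAST_INSERT_ID() on top of the shared sequence."""
    return app_id.currval()


# Inserts into two hypothetical tables draw from the same counter.
orders = [app_id.nextval()]
customers = [app_id.nextval()]
orders.append(app_id.nextval())
```

After these three inserts the orders table holds ids 1 and 3 while customers holds 2, which shows why the next id for a given table becomes hard to predict.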

Please post your migration tips!

Oracle Globalization Whitepaper
Oracle Documentation
MySQL Documentation
Oracle: Welcome to the “2 Day DBA”
Planet MySQL (High Quality Blog Aggregator)
Oracle-Base: EXPLAIN PLAN Usage
Oracle SQL*Loader FAQ
SOLUTIONS TO COMMON SQL*LOADER QUESTIONS
BULLETIN: CACHING ORACLE SEQUENCES
OraFAQ: BLOBs
MySQL Manual: MySQL Server Time Zone Support
MySQL Manual: MySQL Date and Time Functions
Oracle Date Functions
Oracle Resumable Transactions
Write Time Zone Aware Code in Oracle
Speeding Up Index Access with Index-Organized Tables
Contact James if you need database conversion or Perl consulting.