Archive for the ‘MySQL’ Category

AOPA Expo 2008 in San Jose

Sunday, November 9th, 2008

I found out about AOPA Expo 2008 at the last minute on avweb.com, and spent half a day there Saturday.

First of all, sorry to out-of-town attendees for the showery weather today. San Jose is normally sunny, but not in winter.

The Expo was 3 days (Thursday, Friday and Saturday) for $55/day including seminars and exhibits.

Exhibits Hall

The Exhibits Hall was twice as big as I expected and a lot of fun. I’m used to sparse IT conference exhibits, but aviation exhibits are a welcome relief with lots of hardware: aircraft, wrap-around simulators, parts, interior fabrics - you name it.

I only had time to spend an hour looking at the exhibits, but could easily spend a day going booth-to-booth and trying everything out.

The Cessna 162 Skycatcher in purple was on display, as well as the DJet (and a Williams engine static display) and Epic.

Frasca had a wrap-around simulator, and there was a very nice glass cockpit Cessna 172 simulator with 3 almost 180 degrees of displays. It’s available for rent in Hayward for $65/hour.

Rolls-Royce had 2 engines on display, the A300 used in the Robinson R66 helicopter and a prototype for the A400.

Noticeably absent … Eclipse did not have a booth.

I listened to 2 seminars from trainers that I haven’t seen live before, and really wanted to: John and Martha King and Mike Busch.

Pilot Risk Management, John and Martha King

John and Martha King talked about managing flying risks in a systematic fashion.

It took John a minute to warm up, then he sounded just like his pilot training tapes. :)

They calculate that small-time GA flying is about as risky as motorcycle operation in the US. Half the audience personally knew a pilot killed in GA.

John then went into 3 war stories: IFR letdown in a Cessna 210 with no electricity due to ignoring maintenance near St. Paul in icing, poor takeoff decision at max. weight at a high DA airport (Lone Pine, 3680′), and an unlighted flight from Big Island to Oahu over the ocean at nite (forgot to pre-flight lights for a nite flight.)

They recommend using checklist nmenonics like PAVE CARE for reducing risks.

  • Pilot
  • Aircraft
  • enVironment
  • External Pressures
  • Consequences
  • Alternatives
  • Reality
  • External Pressures

Airplane Maintenance Management, Mike Busch

Mike Busch gave an awesome talk on airplane maintenance management.

Mike is an aviation maintenance author, trainer and businessman who is famous in the GA aviation community.

His latest venture is savvymx.com, which provides professional maintenance management for owners. They represent dozens of aircraft already.

(I’ve attempted to paraphrase what Mike said below, but any errors or omissions are my fault.)

40 years ago in GA’s hayday, there was an authorized Cessna, Piper or Beech service center on every field with specialist mechanics and a building full of parts on the shelf. Now GA maintenance facilities are merely a shadow of that, unless you’re talking jets.

He recommends 5 rules/secrets for affordable maintenance:

  1. Interview your mechanic/shop like you’re hiring an employee
  2. Inspection, Discrepancies, Approval in writing
  3. Don’t fix what’s not broken
  4. Pilot needs to troubleshoot before mechanic can fix anything
  5. one other …

The aircraft owner is the manager, the mechanic takes orders, and there must be a business-like relationship. Otherwise, find another shop.

However, while giving his SavvyAviation talks, he’s noticed that some people either don’t want to make the time, effort or be assertive enough to actually do the mgmt. needed.

95% of aircraft components can and should be maintained on-condition, meaning periodically inspected and replaced as needed. Examples are tires, some actuators, etc.

The remaining 5% are things maintained on a time-based schedule, like magnetos and hoses which are difficult to inspect.

When approving aircraft repairs, terminology is very important. The terms repair, overhaul and rebuild mean very different things. Normally what one wants is a repair (fix just what’s broken directly and as cheaply as possible), and not an overhaul (blindly follow an overhaul checklist from beginning to end and change and test everything, broken or not.)

He says that TBO is a psychological limit, not a maintenance one. His P210 is currently 1100 hours past TBO, FWIW.

Mike recommends using the most direct method for monitoring and troubleshooting aircraft: engine monitors, oil and filter analysis, borescope. He calls this “21st century analysis”, while old techniques like magneto RPM-drop and cylinder compression tests are “Orville and Wilbur Wright analysis.”

He uses Blackstone Laboratories for oil analysis. Unfortunately, many shops don’t have borescopes, and even when they do, nobody with training to interpret the image, since studying borescopes is not required for A&P.

In newer airplanes, there is no separation between airframe and electronics, so you need a shop that can handle integrated maintenance, like Woodgreen in SoCal.

He used a black Asus EEE PC to show his slides.

Robinson R66: a preview by Philip Greenspun, Feb. 2008
flyingmag.com: Learning to Use an IFR Rating

OSCON 2008, Portland

Friday, July 25th, 2008

I attended the O’Reilly Open Source Conference, once again in Portland, Oregon.

Overall my impression was that the talks and vibe were oriented towards Web 2.0 primarily.

I would say that the talks were not as strong as previous years, but it’s easy to compensate for that with the “hallway track” and access to the original Open Source authors.

Several attendees used the EEE sub-notebook computer, and were happy with it as a email/browser tool.

Wednesday

PHP Taint Tool: It Ain’t a Parser

- CS’y effort at PHP parser for code analysis, reminds me of early days of Perl’s B tools
- not suitable for end-users

Write Beautiful Code (in PHP), Laura Thomson, Mozilla

- good general background on good programming practises
- not a lot of specifics about PHP, but available for questions

Hypertable, Doug Judd, Zevents

- HyperTable is a clone of Google’s BigTable, from public paper
- room was packed, some turned away
- still alpha, maybe beta in August
- preferred distributed filesystem is HDFS, works with others
- I recommend reading web site and then looking at the curt slides
- plans to do benchmarks with same hardware as Google has published.

Open Source Virtualization for People Who Feel Guilty About Using VMware So Much, andy michelle, EDA

- cute talk about VirtualBox, Xen and VMware
- Xen has weird nomenclature compared to other tools
- VMware wins on tools and polish
- showed screenshots of unreleased and alpha mgmt. tools.

Barely Legal XXX Perl, Jos Boumans, RIPE

- stunning and twisted example of overloading, short-circuiting, import-faking, whatever it takes to make a loaded module do something other than intended
- illustrates great flexibility of perl, for good or ill
- could be useful for things like testing harnesses, etc.
- motivated to win bet of $100 or 1 vertical meter of beer
- said it took 3 or 4 hours to complete.

I walked around the exhibits area.

Got a demo of Atlassian’s continuous integration (CI) tool, Bamboo. They’re also the vendors of JIRA issue tracker and Confluence wiki, which I’ve used before.

One company had a public Wii game happening.

Thursday

Scaling Databases with DBIx::Router, Perrin Harkins

Ultimate Perl Code Profiling, Tim Bunce (Shopzilla)

- talk and screenshots about NYT perl profiler


The New York Times Perl Profiler

Top 10 Scalability Mistakes, John Coggeshall (Automotive Computer Services)

- good overview of writing high-performance, maintainable Internet systems
- interesting opinion that scalability is not just about increasing performance. scalability can be about scaling up or down, performance or maintainability, etc.
- recommended php.ini settings list

Perl Lightning Talks

- popular with audience, attendees seemed to like all the talks
- Mail::ESMTP looks very interesting for testing and production

Code is Easy, People are Hard: Developing Meebo’s Interview Process, Elaine Wherry (meebo)

- struggled to find time, right approach to interview new candidates in 1996, likely at behest of VCs
- external recruiters hit-and-miss, conferences and jobs email link useless
- phase where non-founder employees doing interviews wanted a founder involved in interview process
- trying to preserve culture (finger rockets, social networking, 2 female founders, etc.)
- came up with process involving reading resumes, phone screens, and office “sim” that adds a new candidate within 3-6 weeks
- “sim” has 3 versions: office manager (plan to erect a meebo office sign), front-end engineer (write a JavaScript app), and back-end engineer (write a server) in 4 hours
- current goal is to keep interview time down to 8 hours per candidate over 10 days
- now up to about 40 employees
- my feeling was that their hiring process started off clueless due to inexperienced mgmt. and is still oriented towards junior engineers. Silicon Valley is full of expert engineers and it doesn’t take 8 hours to interview them.

BOF

mysql-sandbox

Giuseppe Maxia discussed and demoed his very useful mysql-sandbox utility for managing several versions and instances of MySQL on the same machine.

He wrote it for his testing work at MySQL AB. Very well received by attendees. This is a great example of what I call “anti-virtualization” - using ports instead of resource-intensive VMs.

MySQL Conference 2008 Presentation

State of the Onion Address, Larry Wall

- talk about Perl6, random anecdotes, etc.

Friday

Open Voices, Jim Zemlin (The Linux Foundation), Keith Bergelt (Open Invention Network), Karen Sandler (Software Freedom Law Center), Phil Robb (Hewlett Packard)

- panel discussion of various free software efforts, some little-known

An Illustrated History of Failure, Paul Fenwick (Perl Training Australia)

Paul gave an interesting talk on notable Software Failures and estimated a price tag for each. I had heard news reports of many of them, but it was interesting to hear an updated analysis of what really happened behind the scenes.

Thanks to Google for sponsoring the fairly good almost-gourmet lunches. Sure beats the O’Reilly lunchbags from the dot bomb days. (Everybody I know bailed and found a subway shop back then.)

Notes

- Burgerville popular with attendees, can upgrade combos to a shake.
- Red Lion hotel has a small cardio gym with 1 universal machine, no free weights, open til 11 pm
- WiFi password changed weekly, in middle of remodel, lobby just finished.
- There is a 24-Hour Fitness that is actually open 24 hours near downtown Portland. Has basketball court and 2-lane pool. $15 for non-member visitors.

OSCON 2008 Presentations

DRBD and MySQL: Just Say No

Sunday, April 20th, 2008

I’ve successfully used MySQL statement-based replication for several years across data centers and understand it’s quirks.

While at the MySQL Conference, I tried to see how DRBD could help the installations I manage, but I just can’t drink the DRBD Kool-Aid.

MySQL Replication Pluses

  • Free
  • Easy to setup if you already have a backup and master position
  • No shared storage to manage or corrupt
  • Light network load
  • Can use master for r/w and slaves for r.
  • can do maintenance on slave (ALTER TABLE, etc.) and failover afterwards
  • works well across Internet even with high-latency
  • many replication problems simple and hand-fixable

MySQL Replication Minuses

  • Slaves can/will get out of sync with the master, typically noticed after a few weeks or with Maatkit
  • Changing masters requires rebuilding slaves
  • There is always some replication lag when there is a busy master
  • no checksums or 2-phase commit

DRBD is a low-level driver to copy a disk partition in near real-time from a master to a failover node (cold standby.)

MySQL with DRBD Pluses

  • Free
  • No fsck or transaction log replay needed if manual failover.
  • Slaves don’t need SET MASTER updated unless DRBD fails.

MySQL with DRBD Minuses

  • DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.
  • if the master panics, then after failover both fsck and transaction logs replay must be performed
  • more work to setup initially than statement-based replication
  • NIC and network corruption is also propagated.
  • Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition
  • Could generate a lot of network traffic.
  • cannot do maintenance on cold standby database
  • 2 heartbeats needed on a reliable, local network

I can see how MySQL/DRBD would be appealing for those who operate on a reliable network and don’t need Master-Master for load or maintenance, or who have many slaves that cannot easily be rebuilt.

Eric Bergen: DRBD in the real world.

MySQL Conference 2008

Thursday, April 17th, 2008

I attended the MySQL Conference once again at the Santa Clara Convention Center.

Despite the January purchase by Sun, the conference had the same great vibe as usual, and everybody showed up again.

Top Conference Themes

Some of the conference themes I noticed are:

  1. Linux LVM snapshots are now popular for MySQL backups. Snapshots have long been used in enterprise IT, now it works well and for free on Linux. Another use for a snapshot backup is to copy a busy master offline for comparing to a slave with Maatkit mk-table-checksum.
  2. DRBD is popular for HA (one speaker wondered if his talk about DRBD was still relevant since everybody was already using it), but I see some drawbacks.
  3. developers are concerned with supporting massively multi-core CPUs in both database and cache code. The Sun Niagara multi-core architecture seems to be the future, with 128 or more threads.
  4. databases and storage are quickly increasing in size, so many DBAs are interested in MySQL 5.1 partitioning and other tools and techniques.
  5. cloud computing is entering common use with lots of Amazon EC2 and S3 users. mosso.com has been available for a couple years from Rackspace, and Google and IBM are entering cloud computing. Users would like more competition to reduce prices and improve reliability, SLA or not.
  6. most companies say it’s hard to find experienced MySQL DBAs, but most are lazy when it comes to training and compensation.

Top MySQL Conference Tips

  1. Disable swap if your version of Linux supports it (most do.) This avoids getting some of MySQL’s pages swapped out and crippling the box with IO.
  2. Use memcached. MySQL User Defined Functions (UDFs) for memcached are now available to auto-populate memcached from MySQL statements.
  3. Consider multi-level partitioning schemes with MySQL 5.1, like combining RANGE and KEY.
  4. Consider MySQL High Availability (HA), either with replication, DRBD, or both. Linux HA project.

Top MySQL Conference Misconceptions

I had to straighten out a lot of newbs:

  1. statement-based replication is not reliable, does not have checksums or two-phase commit, and masters and slaves tend to diverge over time
  2. many novices believe Innodb does row-locking only, but often does range and table locking
  3. mysqldump is ok in most cases, but you have to be careful with locking and locktime, matching charsets, and testing the dump.

Here’s my notes on some of the talks I attended. (In case you haven’t read my blog before, I’m a long-time user of MySQL, replication, LVM and memcached. I have not used DRBD.)

If you have a correction or improvement, please leave a comment and I will update this blog entry.

Monday (Tutorials)

Building Scalable & High Performance Datamarts with MySQL, Tangirala Sarma

Discussed general DW concepts at first.

3 main requirements for a succesful DW project are:

  1. good data quality
  2. the right tools
  3. phased results.

Talked about various partitioning schemes in MySQL 5.1. I’ve used it for about 6 months in one project, but most of the audience was new to MySQL partitioning and struggled to understand beyond RANGE partitioning for logging it seems.

Later talked about MySQL-particular DW aids, including:

  • Kickfire appliance, which has capabilities such as column-store, compression and fast loading.
  • Infobright, which also has column-store and fast loading.
  • Nitro

His recommended references are:

  • DW Toolkit, Kimball on Amazon.com
  • Enterprise DW with MySQL, MySQL AB
  • MySQL Roadmap 2008-2009, MySQL AB.

Also, there’s a list of books here:

DW and BI Starter Books

Queued up for sandwiches and salad. Not really a surprise with O’Reilly as the conference organizer, expected more.

Ate lunch with the DRBD programmers. They said that MySQL AB now provides 1st- and 2nd-level support, and their company provides 3rd-level support and cashes checks. :)

Memcached and MySQL: Everything You Need To Know, Brian Aker (MySQL), Alan Kasindorf (Six Apart)

Very detailed talk about tips and issues with using memcached.

  • Evolving online notes
  • Brian wrote about 30 man pages for memcached, edited by Mark Atwood. Unusual amount for an Open Source project.
  • memcached is very handy for people stuck with databases on 32-bit systems and a lot of otherwise unaddressable memory.
  • Patrick from Grazr has written MySQL UDFs to populate memcached, has a SoC student. Pipelines cache inserts. Handy for distributed DCs already using replication, triggers.
  • Postgresql has pgmemcache()
  • lighthttpd has mod_memcache, prolly url hash key
  • Apache has mod_memcached with CAS, GET/PUT/DELETE, still alpha, try at pandoraport.com
  • limits: key size 250 bytes, data size 1 MB, 32/64 bit limits
  • threading is new based on giant mutex, bad for more than 8 cores
  • stop swapping with -MLOCKALL, noswap, sizing
  • stats sizes to test efficiency
  • command line option to disable LRU
  • CRC most consistent hashing, normal
  • ring consistent hash
  • IP takeover
  • bad switches or intermittent network is very bad
  • pick a driver than can do multiget
  • ghetto lock
  • Tim Bunce’s Cache::Memcached::Libcache does not do Storable, which is what most people prolly want
  • time in seconds < 30 days is relative, > 30 days is absolute
  • namespace trick: versions in key name
  • uint32_t type parameter usually indicates whether compressed or storable, can be used for anything though.
  • memcached_tool: memcp, memrm, memstat, memslap (load testing)
  • showed Mixi MRTG graphs: 6800 reads/second, 200 servers, no CPU load
  • Brian added IPV6 support after mysqld update (pet project, but helped with multi-interface support and to optimize out name resolution)
  • binary protocol code available, not merged yet, helps with multi-byte charsets than embed spaces or newlines which break the text protocol
  • improvements needed are durable to disk, highly-threaded
  • persistent connections are good and recommended
  • UDP alpha code available, good for lots of sets
  • storing BLOBs with MogileFS or LUSTRE good
  • speakers did not have experience with commercial caches, but did say that most people find Java caches often too featureful and slow
  • MogileFS, Hypertable, Hbase interesting

Tuesday

EXPLAIN Demystified, Baron Schwartz (Percona Inc.)

Room was packed for this talk. Good step-by-step talk for understanding EXPLAIN better.

Replication Tricks and Tips, Lars Thalmann (MySQL), Mats Kindahl (MySQL)

Some good tips, but overall an assumption is made by MySQL AB that a MySQL master and slave actually have the same data.

  • mysqlbinlog has –hexdump option for seeing byte-level dump
  • examine both binlog and relay log when debugging replication
  • you can clone a slave from another slave if you trust it - just do STOP SLAVE, SHOW SLAVE STATUS, shutdown, copy over the files, SET MASTER, and START SLAVE

Dramatically Improving MySQL Database Performance in Data Warehouse Applications, Martin Farach-Colton (Tokutek)

His view is that the storage engine is the primary bottleneck in BI systems for loading and search, and showed how to use a B-tree to organize data to guarantee the maximum performance in a growing DW. (Not sure why the online summary talks about fractal trees.)

How to Achieve Operational BI on a Budget, Lance Walter (Pentaho Corporation)

Kind of overly Pentaho-product oriented, since Lance is a product manager for Pentaho.

However he made a useful distinction between historical and operational BI.

Historical BI is mainly reporting on what happened before yesterday, and operational is up-to-the-minute for business process analysis and improvement.

By designing your BI system for both historical and operational requirements, you can get both at the same time.

Interesting case study of the US Navy doing BI on pilot training and operations to reduce accident risk

MySQL Backup BOF (hosted by Zmanda rep)

- admins attending generally unaware of LVM, but awareness growing
- one guy split his database across multiple databases for easier mgmt. and uses a HP SAN with 1 TB of RAM, very happy with IO performance, not so happy with price.
- one guy using 100 EC2 instances and S3. ok except for recent outages, maybe a little pricey.
- one guy using dd for fast network copies
- Zmanda just integrates existing techniques and allows scheduling, but prolly quite useful for inexperienced DBAs to do point-in-time recovery and schedule backups.

MARIA BOF

Hosted by Monty with his usual wicked Finnish black vodka.

He said Falcon was supposed to take 3 months, but slipped, so work started on MARIA to replace MyISAM. He has made promises to deliver a working storage engine, so will continue and do so and prolly release MARIA in 6 months.

All of the MARIA programmers were required to read Jim Gray’s textbook Transaction Processing: Concepts and Techniques on Amazon.com, and each have said they understand it.

I got the impression that MARIA should end up with a cleaner codebase than Innodb.

Discussed table-level checksums for replication checking, which he is planning to do, might be an option. Still time to decide whether to use CRC32 or another algorithm.

One of his programmers gave a demo of yanking power on MARIA-current on his notebook computer, though I didn’t notice what the outcome was.

He said that ALTER TABLE optimizations are planned, including instantly dropping an index without copying the table, though add would still require copying the table.

Another conversation could be paraphrased as, “After MARIA, he will work with the Sun team that fixed Postgresql threads to fix various thread scaling problems in MySQL.”

Sphinx Search BOF

Hosted by Peter Zaitsev and Andrew Aksyonoff, the author.

Percona has used Sphinx Search in a few projects for web forum searching for the past few years. Separate index server is recommended.

About a dozen people talked about full-text search requirements and experience with Sphinx. One guy was spending a lot of money on encad(?) and didn’t want to spend more on a bigger license later.

Wednesday

MySQL Performance Under a Microscope: The Tobias and Jay Show, Tobias Asplund (MySQL), Jay Pipes (MySQL)

Presented slides on the performance of various workloads.

The MySQL Query Cache, Baron Schwartz (Percona Inc.)

Excellent in-depth discussion of the query cache.

Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL, Patrick Galbraith (Grazr Inc.), Michael Kowalchik (Grazr Corporation)

Deadlocks, Wait Timeouts, and Other Transaction Issues, Jess Balint (MySQL)

Thursday

DTrace and MySQL, Ben Rockwood (Joyent Inc)

Good talk on using DTrace specifically with MySQL, mainly query debugging.

Scaling with MySQL using Materialized Views and a Shared Everything architecture, Moshe Shadmon (ScaleDB)

Listed 3 ways to do materialized views, but dwelt on their ScaleDB cluster product mostly.

High Availability MySQL with DRBD and Heartbeat: MTV Japan Mobile Services, Patrick Bolduan (MTV Networks Japan KK), Yoshinori Matsunobu (MySQL)

Talked about setting up HA using heartbeat, pingd and DRBD on a mostly-reads 5 GB CMS db. Used Enterpeise MySQL distro and support. No replication involved, likes mysqldump. Happy with MySQL 5.0 and Unicode with Japanese. Cute slides with Japanese maru symbols, etc. DBRD staff were on-hand to help with more difficult questions. Said LVM can be used under or over DRBD.

The Science and Fiction of Petascale Analytics, Jacek Becla (SLAC)

Talked about petabyte and exabyte DW for physics and astro programs. Contrasted science and industry PB databases (Google, MSN and Yahoo! likely each have 100 PB databases, but don’t disclose the size.) 5 years to plan DW for next experiment.

Spent 10 minutes with Rohit Nadhani and his programmer from Webyog looking at their MonYOG 2.01 version. Provided UI feedback for database operations use based on several months usage. Looking good.

Talked with Patrick Bolduan, somebody from DRBD and a NY Times IT guy who uses EC2 in the lounge. Apparently S3 can lose up to 10% of insert requests. Everybody is looking forward to when Amazon EC2 and S3 have competition for both pricing and reliability improvements.

Conference Evaluation and Recommendations

I go to a lot of conferences as a paying attendee, so I usually provide feedback to the organizers to help them improve the experience.

I mentioned to Jay that overall the conference was fine:

  • Talks were good, maybe less technical than previous years. Jay said Sun wanted more talks for novices since 2007 was too hardcore for some new attendees, but O’Reilly did not want to ghettoize newbs with a single track and room. For sure 2007 had too many sharding talks, mostly with users stuck on 4.0. There’s always the hallway track with developers anyway.
  • Food was ok but not great, although O’Reilly, the conference organizer, slipped in sandwiches for lunch on tutorial day. (SCCC is in an isolated location, so food is a big deal.) Still way better than OSCON in recent years. I guess the original MySQL conferences at the DoubleTree spoiled me.
  • Some of the vendors got too salesy in their presentations, but it’s hard to crack down on sponsors. Pentaho and ScaleDB come to mind.
  • conference still provides full access to MySQL managers and key programmers, in the best Open Source tradition
  • Still need a big-iron room with functioning demo SANs and HA setups, as I’ve suggested for a few years.

MySQL Conference 2008 Presentations
TechCrunch: Rackspace Offers Cloud Computing with Mosso
cnet.com: Is cloud computing more than just smoke?

MySQL and Choosing a Random Row From Unnumbered Sets

Tuesday, April 8th, 2008

Choosing a random row from a table using SQL is a common task.

For small tables and when you don’t have a numeric key or don’t know the maximum value in advance, SELECT * FROM test ORDER BY RAND() LIMIT 1 is adequate. However, this involves calculating RAND() for each input row, followed by a sort.

When there is a numeric key from 1 to N and there are no gaps (ineligible records), a more optimized method is SELECT * FROM test WHERE id >= CEIL(N * RAND()) LIMIT 1.

A more difficult scenario is when there is no numeric key, there are rows in the table that are ineligible, and you don’t want to sort. In this case an artificial rownum column can be computed and used in the selection calculation:

SELECT id FROM (SELECT id, @n := @n+1 rownum FROM test, (SELECT @n := 0, @r := RAND()) d WHERE 1) x WHERE rownum = CEIL(@n * @r);

Replace “WHERE 1″ above with whatever conditions you need.

Xaprb: How to number rows in MySQL
Perl Cookbook: 8.6. Picking a Random Line from a File

Ghetto MySQL Innobackup with rsync

Saturday, November 3rd, 2007

I was reading an interesting samba mailing list comment about using rsync on live MySQL databases.

The author said this:

“Assuming a short break in accessibility is tolerable, I’d

  1. run rsync to the backup
  2. stop the server
  3. run rsync to the backup (should be much much faster now)
  4. restart the server.”

Combining rsync and mysqlhotcopy we can get a little fancier:

Ghetto Innobackup-style backup with rsync

  1. STOP SLAVE; FLUSH TABLES
  2. run rsync to the backup
  3. FLUSH TABLES WITH READ LOCK; SHOW SLAVE STATUS;SHOW MASTER STATUS
  4. run rsync to the backup (should be much much faster now)
  5. UNLOCK TABLES
  6. START SLAVE

Note that the read lock and unlock must be done while on the same database connection, and innodb continues to update indexes even when read-locked.

Also, record the master and slave status values. They may be very useful later if you want to apply binlogs to the backup, or initialize a slave.

This technique would be very suitable for non-critical snapshots like QA copies and on quiet databases.

It may be suitable for busy databases if other methods aren’t working out, for instance you don’t have LVM snapshots setup and innobackup is locking your MyISAM tables too long.

rsync -a is also useful for backing up master binlogs every 5 minutes on a live site. Normally you’re better off setting up a slave just running the slave IO thread, though.

Many databases have features to allow “log shipping.” With MySQL, similar functionality is accomplished by doing FLUSH LOGS and rsync, or using replication (there is a command to not execute the replication stream, just save it to disk.)

FLUSH NO_WRITE_TO_BINLOG LOGS
FLUSH TABLES WITH READ LOCK

OSCON July, 2007 - Portland

Friday, July 27th, 2007

I attended the O’Reilly Open Source convention again, making it 10 years in a row. Once again it was held at the Convention Center near downtown Portland, a convenient light rail ride from the airport.

Like many experienced developers, I spent a lot of time in the “hallway track” talking to other developers and users, as well as in one of the 15 simultaneous talks.

The general consensus was that the talks were not as strong as in previous years (not even compared to the MySQL conference this year), but it’s worthwhile to me if I can get even one juicy nugget from each talk, or gain an understanding of a developing trend in programming or system administration.

Many of the presenters griped about there not being enough time to look at source code in a 40 minute talk.

For those who want a conference summary in a nutshell:

  • OpenID is popular
  • lucene and its REST interface have more mindshare than projects like Kinosearch, language-specific bindings, etc.
  • Yahoo! released the yslow browser plug-in for front-end performance evaluation
  • Perl: no ORM appears to be gaining the upper hand, though DBIx is respected. Tim Bunce would like to see a wrapper around JDBC for each scripting language. Alison Randal is updating the Perl license.
  • PHP: no good way to do vector reporting graphics, especially since IE doesn’t support SVG and Adobe is killing the Macromedia plug-in in December. PHP4 is being EOL’ed 8/8/8 so that the PHP developers can focus on 5 and 6 only.

Google was heavily recruiting at the conference. I ran into 3 recruiters, and there were even more in the Google booth.

Pretty good food for lunch, usually chicken or fish in some kind of red sauce with steamed veggies. Better than the wilted sandwich boxes from previous years that mainly got tossed out.

Tuesday nite

I arrived at the Convention Center in time for the evening Google Open Source awards. Happened to sit next to Zak and the 20 year-old OpenID guy, David Recordon, who won $5,000 and a colored, transparent, angular plastic trophy and base that we had fun stabbing each other with.

The OpenID Foundation is offering a $5,000 bounty to the first 10 OSI-approved projects that add OpenID support. Many programmers were busy adding it, including SocialText and others. (David works at Verisign.)

I walked over to the Doug Fir Lounge with a few guys, 2 of them Austrian. I had the halibut fish and chips and lemonade for $20 including tip. It was ok. They have a log cabin motif happening with a restaurant, patio and bar upstairs, and dance club downstairs, so ID is required to enter. Open from 7 am to 2:30 am every day, 1 503 231 WOOD.

Wednesday

Nagios

- general overview of features
- Event Broker most powerful, least used

Bigger and Faster
Rasmus Lerdorf

Rasmus did his usual “PHP is as secure as any other language”, and “pick on a PHP app and make it go faster” talk.

He said he’s still not a Y! Paranoid, but his work does often touch on PHP and web security.

He used to use httpload, but now prefers siege for load testing because it has support for cookies.

http://developer.yahoo.com/yslow/
Live HTTP Headers
APC

sla.ckers.org/forum/list.php?3
php.net/filter
xdebug.org/docs/profiler
talks.php.net/show/oscon07

xdebug
jeremiah san diego xss console author
scanmus.corp.yahoo.com

PHP and Ruby Envy
- NZ programmer on Silverlight CMS (BSD licensed)
- own object system in PHP5
- Ruby less available on web servers, less mindshare
- rolled his own PHP OO frameword apparently

Exhibits

- talked to Mark Finkle of mozilla.org
- said hi to Larry. He had the whole family there.

Trac
Vivek Khera

- he uses RT for public tickets, Trac internally
- doesn’t require much resources since only a few developers
- Trac is used on many Ruby/PHP projects
- gives you wiki/tickets/etc.
- modified BSD license

Afterwards mentioned:

- uses Trac in a BSD jail
- an alternative to Trac would be basecamp (or I guess Sourceforge software). See slashdot.org threads for more ideas.
- likes pfSense firewall as an appliance
- nagios alerts too much, and no good rule builder for multiple hosts
- own web framework called Rowdy (RWDE)
- software as complicated to install as RT should be treated as an appliance
- he submitted 6 related talks on software development environment, only 1 accepted.

Steve Souders
Chief Performance Yahoo!
souders@yahoo-inc.com
Exceptional Performance Group

http://developer.yahoo.com/performance

- IBM Page Detailer Pro
- yslow (crawls the DOM, not a packet sniffer)
- firebug
- jslint - The JavaScript Verifier

80-90% of end user response time is spent on the front-end. so optimize there.

14 Rules for a Better User Experience

1. make fewer HTTP requests
2. use a CDN
3. add an Expires header
4. gzip components - even JS and CSS
5. CSS at top
6. JavaScripts to bottom
7. avoid CSS expressions
8. make JS and CSS external
9. reduce DNS lookups
10. minify Javascript
11. avoid redirects
12. remove duplicate scripts
13. configure Etags - disable in most cases if load-balanced or multiple web servers
14. make AJAX cacheable

move JS to onload
remove bottom tabs
avoid redirects
images sprites
expires

Thursday

PHP Graphics
Luke Welling, OmniTI

Luke presented an overview of raster and vector graphics modules for PHP.

He prefers vector graphics, but there’s not many free options for doing that.

He feels that Yahoo! Finance and Google analytics sites are state of the art in presentation graphics with anti-aliasing, interactivity, esthetics, text and maps. I’d say that’s aiming a little low, but it’s a start I guess.

Perl Lightning Talks

  • Vani Raja did a talk on Yahoo! JS
  • Ask did his talk on qsmtpd again
  • talk on Test::More 3?
  • talk on task lists for hit and run volunteers
  • Schwern did one talk on making tea for 5 minutes, and one on “Blame Schwern” - just do it instead of waiting for permission
  • Tim Potter did a talk on a messaging standards effort for his employer, saying that the ANSI process was too slow and looking for an alternative
  • Andy Lester did a talk on ack
  • a talk on SVN::Notify
  • http://angerwhale.org/
  • Tim Bunce talked about DBD::Gofer Proxy and next-gen cross-scripting language DB API based on JDBC API
  • guitar song about #perl

YouTube: Perl Lightning Talks on Handycam by Schtonk

Perl Auction

Larry’s talk on comparative languages and Perl6. Sounds like we’ll be able to do something like foreach (1..infinity).

Full Text Search BOF
Peter Zaitsev
- based in London, England but often in SV
- uses Sphinx on several servers
- http://boardreader.com/ one TB of searchable data
- geneology is big on full-text search

- after insert, mysql full text gets slow, run optimize.
also, doing it at insert time causes index update per keyword
- gin or gyst for Postgresql 8.4?
- Michael Kimsal, SOLR
- hard disk space is free (enough for whatever indexing is required)
- mostly news search involves last 5 minutes of feed
- MessageOne stores email for lawyers to mine. They like to search, archive and expire. Mostly Exchange lusers, rarely Unix admins.
- Lucene and REST interface
- Monty says MySQL AB hired a programmer to work on search, but he’s working on another project now. They need somebody with a burning desire to make progress in an area like that, but they recognize the importance of search.
- Monty poured out free Finnish chocolate rum from a Pepsi bottle that was so powerful it scared most people. He said it was banned for 2 years in Finland because it was so addictive.

Sun BOF

- audience talked to senior Sun staff about Java and Solaris a little.
- free beer, cheese and crackers.

Friday

A bunch of Postgres people went to the Portland wine tasting on the river event in the afternoon.

Call for Software Whiteboard

OSCON07 Call for Software Whiteboard
flickr.com: Jeff Kubina’s OSCON 2007 Whiteboard set of tiles

FrontBase RDBMS

Friday, July 6th, 2007

I came across the Frontbase RDBMS recently. Their 288-page users manual was so nice and clear I decided to read through it.

The FrontBase project was started in 1996 for Copenhagen’s 911 service (MySQL has similar practical origins, written to support retail data warehousing contracts). They claim it’s a “database designed for the Internet” and requires zero maintenance. Currently they’re on version 4.2.

It’s a proprietary, closed-source database written in ANSI C, but free for most users. They currently charge for support and consulting, and sell licenses for embedded users. The FBAccess client library source is available on request for programmers to add support to their oses and devices.

It seems to have traction in the Mac community, likely because of the nice Mac client program and FileMaker and RealBasic support.

What’s nice is that it supports transactions and full-text search at the same time, unlike MySQL.

FrontBase supports:

- servers: Mac OS X, Linux, Windows, Solaris, FreeBSD
- clients: FrontBaseManager Aqua client for Mac OS X allows mgmt. and monitoring, also web, common scripting languages (Perl, PHP, Tcl), JDBC, ODBC, and the sql92 console client
- SQL92
- Unicode
- statement-based replication from transaction logs
- full SQL logging options, allowing replay (SET WRITE SQL TRUE GLOBAL;)
- full-text indexing
- stored procedures
- clusters
- table spaces
- raw device driver
- live (hot) backup due to versioning (WRITE DATA;)
- data and communications encryption
- whitelisting and blacklisting of IP addresses
- row-level locking
- row-level privileges
- read-only databases (note that replication slaves are read-only)
- import utilities for Access and FileMaker
- migration tools for FileMaker (tables only) and MySQL (no enum or set column support)

They have been changing their licensing, so contact gclem at frontbase.com for the latest info.

FrontBase sure looks like a clean, modern RDBMS implementation.

I plan to learn more about its reliability and performance.

I have noticed that it is prone to crashing on Mac OS X PPC. Even a simple query like SELECT DISTINCT(OID) FROM TABLE; on 500,000 records with mostly nulls can crash the database, but hard to tell if it’s Frontbase, the OS, or what.