MySQL Conference 2008
I attended the MySQL Conference once again at the Santa Clara Convention Center.
Despite the January purchase by Sun, the conference had the same great vibe as usual, and everybody showed up again.
Top Conference Themes
Some of the conference themes I noticed are:
- Linux LVM snapshots are now popular for MySQL backups. Snapshots have long been used in enterprise IT, now it works well and for free on Linux. Another use for a snapshot backup is to copy a busy master offline for comparing to a slave with Maatkit mk-table-checksum.
- DRBD is popular for HA (one speaker wondered if his talk about DRBD was still relevant since everybody was already using it), but I see some drawbacks.
- developers are concerned with supporting massively multi-core CPUs in both database and cache code. The Sun Niagara multi-core architecture seems to be the future, with 128 or more threads.
- databases and storage are quickly increasing in size, so many DBAs are interested in MySQL 5.1 partitioning and other tools and techniques.
- cloud computing is entering common use with lots of Amazon EC2 and S3 users. mosso.com has been available for a couple years from Rackspace, and Google and IBM are entering cloud computing. Users would like more competition to reduce prices and improve reliability, SLA or not.
- most companies say it’s hard to find experienced MySQL DBAs, but most are lazy when it comes to training and compensation.
Top MySQL Conference Tips
- Disable swap if your version of Linux supports it (most do.) This avoids getting some of MySQL’s pages swapped out and crippling the box with IO.
- Use memcached. MySQL User Defined Functions (UDFs) for memcached are now available to auto-populate memcached from MySQL statements.
- Consider multi-level partitioning schemes with MySQL 5.1, like combining RANGE and KEY.
- Consider MySQL High Availability (HA), either with replication, DRBD, or both. Linux HA project.
Top MySQL Conference Misconceptions
I had to straighten out a lot of newbs:
- statement-based replication is not reliable, does not have checksums or two-phase commit, and masters and slaves tend to diverge over time
- many novices believe Innodb does row-locking only, but often does range and table locking
- mysqldump is ok in most cases, but you have to be careful with locking and locktime, matching charsets, and testing the dump.
Here’s my notes on some of the talks I attended. (In case you haven’t read my blog before, I’m a long-time user of MySQL, replication, LVM and memcached. I have not used DRBD.)
If you have a correction or improvement, please leave a comment and I will update this blog entry.
Monday (Tutorials)
Building Scalable & High Performance Datamarts with MySQL, Tangirala Sarma
Discussed general DW concepts at first.
3 main requirements for a succesful DW project are:
- good data quality
- the right tools
- phased results.
Talked about various partitioning schemes in MySQL 5.1. I’ve used it for about 6 months in one project, but most of the audience was new to MySQL partitioning and struggled to understand beyond RANGE partitioning for logging it seems.
Later talked about MySQL-particular DW aids, including:
- Kickfire appliance, which has capabilities such as column-store, compression and fast loading.
- Infobright, which also has column-store and fast loading.
- Nitro
His recommended references are:
- DW Toolkit, Kimball on Amazon.com
- Enterprise DW with MySQL, MySQL AB
- MySQL Roadmap 2008-2009, MySQL AB.
Also, there’s a list of books here:
Queued up for sandwiches and salad. Not really a surprise with O’Reilly as the conference organizer, expected more.
Ate lunch with the DRBD programmers. They said that MySQL AB now provides 1st- and 2nd-level support, and their company provides 3rd-level support and cashes checks.
Memcached and MySQL: Everything You Need To Know, Brian Aker (MySQL), Alan Kasindorf (Six Apart)
Very detailed talk about tips and issues with using memcached.
- Evolving online notes
- Brian wrote about 30 man pages for memcached, edited by Mark Atwood. Unusual amount for an Open Source project.
- memcached is very handy for people stuck with databases on 32-bit systems and a lot of otherwise unaddressable memory.
- Patrick from Grazr has written MySQL UDFs to populate memcached, has a SoC student. Pipelines cache inserts. Handy for distributed DCs already using replication, triggers.
- Postgresql has pgmemcache()
- lighthttpd has mod_memcache, prolly url hash key
- Apache has mod_memcached with CAS, GET/PUT/DELETE, still alpha, try at pandoraport.com
- limits: key size 250 bytes, data size 1 MB, 32/64 bit limits
- threading is new based on giant mutex, bad for more than 8 cores
- stop swapping with -MLOCKALL, noswap, sizing
- stats sizes to test efficiency
- command line option to disable LRU
- CRC most consistent hashing, normal
- ring consistent hash
- IP takeover
- bad switches or intermittent network is very bad
- pick a driver than can do multiget
- ghetto lock
- Tim Bunce’s Cache::Memcached::Libcache does not do Storable, which is what most people prolly want
- time in seconds < 30 days is relative, > 30 days is absolute
- namespace trick: versions in key name
- uint32_t type parameter usually indicates whether compressed or storable, can be used for anything though.
- memcached_tool: memcp, memrm, memstat, memslap (load testing)
- showed Mixi MRTG graphs: 6800 reads/second, 200 servers, no CPU load
- Brian added IPV6 support after mysqld update (pet project, but helped with multi-interface support and to optimize out name resolution)
- binary protocol code available, not merged yet, helps with multi-byte charsets than embed spaces or newlines which break the text protocol
- improvements needed are durable to disk, highly-threaded
- persistent connections are good and recommended
- UDP alpha code available, good for lots of sets
- storing BLOBs with MogileFS or LUSTRE good
- speakers did not have experience with commercial caches, but did say that most people find Java caches often too featureful and slow
- MogileFS, Hypertable, Hbase interesting
Tuesday
EXPLAIN Demystified, Baron Schwartz (Percona Inc.)
Room was packed for this talk. Good step-by-step talk for understanding EXPLAIN better.
Replication Tricks and Tips, Lars Thalmann (MySQL), Mats Kindahl (MySQL)
Some good tips, but overall an assumption is made by MySQL AB that a MySQL master and slave actually have the same data.
- mysqlbinlog has –hexdump option for seeing byte-level dump
- examine both binlog and relay log when debugging replication
- you can clone a slave from another slave if you trust it - just do STOP SLAVE, SHOW SLAVE STATUS, shutdown, copy over the files, SET MASTER, and START SLAVE
Dramatically Improving MySQL Database Performance in Data Warehouse Applications, Martin Farach-Colton (Tokutek)
His view is that the storage engine is the primary bottleneck in BI systems for loading and search, and showed how to use a B-tree to organize data to guarantee the maximum performance in a growing DW. (Not sure why the online summary talks about fractal trees.)
How to Achieve Operational BI on a Budget, Lance Walter (Pentaho Corporation)
Kind of overly Pentaho-product oriented, since Lance is a product manager for Pentaho.
However he made a useful distinction between historical and operational BI.
Historical BI is mainly reporting on what happened before yesterday, and operational is up-to-the-minute for business process analysis and improvement.
By designing your BI system for both historical and operational requirements, you can get both at the same time.
Interesting case study of the US Navy doing BI on pilot training and operations to reduce accident risk
MySQL Backup BOF (hosted by Zmanda rep)
- admins attending generally unaware of LVM, but awareness growing
- one guy split his database across multiple databases for easier mgmt. and uses a HP SAN with 1 TB of RAM, very happy with IO performance, not so happy with price.
- one guy using 100 EC2 instances and S3. ok except for recent outages, maybe a little pricey.
- one guy using dd for fast network copies
- Zmanda just integrates existing techniques and allows scheduling, but prolly quite useful for inexperienced DBAs to do point-in-time recovery and schedule backups.
MARIA BOF
Hosted by Monty with his usual wicked Finnish black vodka.
He said Falcon was supposed to take 3 months, but slipped, so work started on MARIA to replace MyISAM. He has made promises to deliver a working storage engine, so will continue and do so and prolly release MARIA in 6 months.
All of the MARIA programmers were required to read Jim Gray’s textbook Transaction Processing: Concepts and Techniques on Amazon.com, and each have said they understand it.
I got the impression that MARIA should end up with a cleaner codebase than Innodb.
Discussed table-level checksums for replication checking, which he is planning to do, might be an option. Still time to decide whether to use CRC32 or another algorithm.
One of his programmers gave a demo of yanking power on MARIA-current on his notebook computer, though I didn’t notice what the outcome was.
He said that ALTER TABLE optimizations are planned, including instantly dropping an index without copying the table, though add would still require copying the table.
Another conversation could be paraphrased as, “After MARIA, he will work with the Sun team that fixed Postgresql threads to fix various thread scaling problems in MySQL.”
Sphinx Search BOF
Hosted by Peter Zaitsev and Andrew Aksyonoff, the author.
Percona has used Sphinx Search in a few projects for web forum searching for the past few years. Separate index server is recommended.
About a dozen people talked about full-text search requirements and experience with Sphinx. One guy was spending a lot of money on encad(?) and didn’t want to spend more on a bigger license later.
Wednesday
MySQL Performance Under a Microscope: The Tobias and Jay Show, Tobias Asplund (MySQL), Jay Pipes (MySQL)
Presented slides on the performance of various workloads.
The MySQL Query Cache, Baron Schwartz (Percona Inc.)
Excellent in-depth discussion of the query cache.
Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL, Patrick Galbraith (Grazr Inc.), Michael Kowalchik (Grazr Corporation)
Deadlocks, Wait Timeouts, and Other Transaction Issues, Jess Balint (MySQL)
Thursday
DTrace and MySQL, Ben Rockwood (Joyent Inc)
Good talk on using DTrace specifically with MySQL, mainly query debugging.
Scaling with MySQL using Materialized Views and a Shared Everything architecture, Moshe Shadmon (ScaleDB)
Listed 3 ways to do materialized views, but dwelt on their ScaleDB cluster product mostly.
High Availability MySQL with DRBD and Heartbeat: MTV Japan Mobile Services, Patrick Bolduan (MTV Networks Japan KK), Yoshinori Matsunobu (MySQL)
Talked about setting up HA using heartbeat, pingd and DRBD on a mostly-reads 5 GB CMS db. Used Enterpeise MySQL distro and support. No replication involved, likes mysqldump. Happy with MySQL 5.0 and Unicode with Japanese. Cute slides with Japanese maru symbols, etc. DBRD staff were on-hand to help with more difficult questions. Said LVM can be used under or over DRBD.
The Science and Fiction of Petascale Analytics, Jacek Becla (SLAC)
Talked about petabyte and exabyte DW for physics and astro programs. Contrasted science and industry PB databases (Google, MSN and Yahoo! likely each have 100 PB databases, but don’t disclose the size.) 5 years to plan DW for next experiment.
Spent 10 minutes with Rohit Nadhani and his programmer from Webyog looking at their MonYOG 2.01 version. Provided UI feedback for database operations use based on several months usage. Looking good.
Talked with Patrick Bolduan, somebody from DRBD and a NY Times IT guy who uses EC2 in the lounge. Apparently S3 can lose up to 10% of insert requests. Everybody is looking forward to when Amazon EC2 and S3 have competition for both pricing and reliability improvements.
Conference Evaluation and Recommendations
I go to a lot of conferences as a paying attendee, so I usually provide feedback to the organizers to help them improve the experience.
I mentioned to Jay that overall the conference was fine:
- Talks were good, maybe less technical than previous years. Jay said Sun wanted more talks for novices since 2007 was too hardcore for some new attendees, but O’Reilly did not want to ghettoize newbs with a single track and room. For sure 2007 had too many sharding talks, mostly with users stuck on 4.0. There’s always the hallway track with developers anyway.
- Food was ok but not great, although O’Reilly, the conference organizer, slipped in sandwiches for lunch on tutorial day. (SCCC is in an isolated location, so food is a big deal.) Still way better than OSCON in recent years. I guess the original MySQL conferences at the DoubleTree spoiled me.
- Some of the vendors got too salesy in their presentations, but it’s hard to crack down on sponsors. Pentaho and ScaleDB come to mind.
- conference still provides full access to MySQL managers and key programmers, in the best Open Source tradition
- Still need a big-iron room with functioning demo SANs and HA setups, as I’ve suggested for a few years.
MySQL Conference 2008 Presentations
TechCrunch: Rackspace Offers Cloud Computing with Mosso
cnet.com: Is cloud computing more than just smoke?