Archive for the ‘MySQL’ Category

Bug Tracking, Customer Service Software

Tuesday, January 16th, 2007

Looking around this week at systems that help manage bug tracking, trouble tickets and customer support inquiries.

Main requirements are: easy-to-use for inside managers and outside end-user clients, runs on Linux, available in browser and email clients.

Seems like the main differences between bug tracking and crm systems is in the UI clients would see.

Bugzilla and RT come to mind in the Open Source world.

FogBugz runs on Windows, OS X and Linux, but has per-user licensing fees of about $100/user.

Let me know if you have any recommendations or comments.

JoS BoS: About To Spend $1,000 On FogBugz
HelpSpot Versus FogBugz
FogBugz for Customer Service

Databases: NULL vs. the Empty String

Wednesday, December 13th, 2006

Some interesting posts on database NULL vs. empty string on JoS. May seem trivial unless you ever tried to do a database migration, like from MySQL to Oracle.

DBAs and accountants seem to prefer a distinction between NULL and empty, although developers usually like to just use the empty string.

One of the points in the posts is that middleware/applications rarely know anything about NULL.

JoS: Null vs. empty strings in DB

ProvenScaling.com - MySQL Consulting

Monday, December 11th, 2006

I was chatting with Eric Bergen (firewire) on irc freenode#mysql today.

Eric and Jeremy Cole quit Yahoo! recently to start a company called ProvenScaling.com, which does consulting, evangelizing and patches for the MySQL database.

He mentioned they also have an employee, bringing the headcount to 3.

I worked at Yahoo! at the same time as Jeremy Cole, so can vouch for his deep technical knowledge regarding MySQL scaling.

Jeremy mentioned at #mysqlcamp that there were a few reasons for leaving Yahoo!, but one of the biggest was the difficulty in getting approval to donate patches back to MySQL AB. As a former MySQL AB employee, he’s familiar with the source, and at ProvenScaling has contributed patches for profiling and in other areas.

Eric has helped in those patches, as well as writing a patch for 4.1 to add TTL to the query cache, which by default is table- oriented. He presented a talk on hot/warm dual master at the previous MySQL conference and will write an educational whitepaper.

ProvenScaling.com is available for consulting on both large and small projects and works with many prominent companies in Silicon Valley. Just fill in the contact form to reach them.

FYI

Peter Zaitsev, ex-MySQL AB performance consultant, now available as an independent MySQL performance consultant. His old blog.

Dathan Pattishall is prolly Yahoo’s latest internal MySQL consultant after Jeremy Cole.

PHP Accelerators

Thursday, November 23rd, 2006

PHP LogoTime to install a PHP accelerator for various reasons … would be nice to improve the performance of WordPress, but mainly research for clients who use PHP.

I found the following PHP accelerators:

  1. APC (Another PHP Cache) - FOSS
  2. Turck Mmcache/Eaccelerator - FOSS
  3. PHPA (Ion Cube) - no charge, no source, claims to be as fast as Zend. They also sell PHP source obfuscator tools.
  4. Zend Platform - commercial

Rasmus and George contribute to APC, so I installed it, and according to phpinfo() it is working. Note that it’s very helpful to read the INSTALL file carefully first.

No noticeable performance improvement on my WordPress installation, but then WordPress does a lot of MySQL queries per page.

What I’ve heard is that APC will not work well on PHP 5.0.x or on multi-core CPUs with threaded servers, like IIS.

Thanks to arjenAU of MySQL AB for pointers.

Optimizing Entity-Attribute-Value Design for MySQL

Wednesday, November 15th, 2006

MySQL LogoIn traditional relational database design, we normalize (restructure) data relationships until rows consist of a primary key and only columns that depend on that primary key.

In Entity-Attribute-Value Design (EAV) design, a step beyond normalization is made where only one column in addition to the primary key is stored in each row. A data dictionary is needed to “reassemble” the data later.

EAV has an advantage when data is sparse (storage space would be decreased), the data is mainly logging events, or perhaps when programs auto-generate a system.

Somebody asked me about performance tuning for EAV database applications for when tables grow to 10 million rows or so.

An obvious suggestion would be to ensure that you use numeric-valued primary keys, and that there is enough key buffer to cache the indexes.

MySQL NDB tables might be a good match since you’re mainly doing primary key operations - after all, the row is almost entirely a primary key!

Also, you could look at alternative table types, like archive or merge, where the data can be made read-only to avoid re-building indexes in case of an unclean shutdown.

EAV design is not that appealing to me as a programmer since there’s more mental effort needed to understand the system. For example, if a lot of UI forms will be displayed, then a mapping must be done using the data dictionary.

Also one of the key benefits of a database is enforcing constraints to maintain data integrity. Using EAV and a data dictionary moves that enforcement to the app, which is not a good thing.

Wikipedia: Entity-Attribute-Value model
An Introduction to Entity-Attribute-Value Design for Generic Clinical Study Data Management Systems

mysqlcamp Notes

Sunday, November 12th, 2006

MySQL LogoI attended mysqlcamp at Google in Mountain View.

(A “camp”, or “unconference”, is a recent West Coast trend for periodic informal technical conferences. The schedule has both scripted and unscripted sessions, and the speakers are not really separate from the audience. The organizer was Jay Pipes from MySQL AB.)

Friday

Opening Meet and Greet

I had already met many of the MySQL staff at various conferences, but most of the users were new to me.

At the Friday intro session, each of 200 people introduced themselves. Employees of Google, YouTube, LiveJournal, etc. attended.

Often attendees would throw out questions asking about what other people were doing with MySQL, or problems they have encountered and want opinions about how to solve it.

Bug #15815 was discussed, and Ken Jacobs, the Oracle InnoDB manager, talked about his role a little (Oracle employee #18, 25-year Oracle employee, former comittee member of ANSI SQL and TPC groups.)

Introducing mod_ndb, a REST Web Services API for MySQL Cluster

John David Duncan gave a talk on mod_ndb and REST. He’s now a MySQL AB sales engineer, but has a deep background in the database world.

mod_ndb is a 3,000 line Apache x.x module that allows direct access to NDB tables with GET, POST, DELETE returning JSON, XML, Raw, and ApacheNote formats. He wrote it to spread NDB outside the telco world to the web world.

After loading mod_ndb, other languages running inside apache like Perl or PHP should be able to just make a subrequest to call the mod_ndb interface.

Test Client JMeter 2.2 on XP, Test Server PowerBook 12″ 1.3

mysql 2250 pages/min
mysqli 2250 pages/min
mod_ndb 1650 pages/min on (non-persistent 75% Apache 1.3, 85% Apache 2, persistent 115%, embedded 156% 2500 pages/min)

(jdd did this to some extent because he wanted to experiment with sort-merge-join instead of old MySQL nested loop join. So have PHP do 2 big requests and merge them in 6 lines of PHP.)

mod_ndb allows a different kind of 3-tier architecture with browser JS as a 1st tier. However you need a security model that prevents injection and DoS, hoping that Apache user authentication and rate limiting can deal with it.

Some discussion of a memcached storage engine for MySQL by Brian Aker. JDD is not sure what MySQL adds in that scenario. Also, a patch for InnoDB-style row caching to MyISAM. Falcon is all about row cache. Also mentioned was using Linux 5 x nbd and “mounting” memcached or storage engines across network.

One inconvenience of the current interface is discovery, since the table definitions are hard-coded in the apache conf file. So you would need to publish a document on what’s available perhaps. or have mod_perl auto-configure the interface from metadata in a perl section using ndb_dest.

Lunch

Filet mignon, vegetables with gourmet veggies, salad, chocolates for dessert.

Talked to Monty a little. He’s getting over the flu, but happy to be doing more programming. Falcon is coming along.

Kevin Burton gave me a little demo of his memcached monitoring setup with rrd. He mentioned alternatives as ganglia and munin. BigTable is an ultra-scalable database.

Chip Turner, Open Source MySQL tools from Google

Download MySQL tools released by Google SVN Trunk

compact_innodb.py is an off-line packer. Written for older version of MySQL (4.0 or less) normally they fail over to another server first then compact it. Can boost performance on linear scans by 30%-40%. Alternatives are alter table to MyISAM and back, or ANALYZE table. Or use file per table with InnoDB to more easily recover space later.

Should track qps vs. io busy over time, do an OPTIMIZE when ratio falls.

mypgrep.py can list queries across databases and identify connectionless queries, for example.

Sheeri Kritzer’s blog

Paul Tuckhead is an Oracle DBA using MySQL who is joining Google. He said he has a script to run on the slave to parse the binlog, do SELECTs in 3 threads to prime the cache for the replication thread. Suitable for non-IO-bound situation. Lots of corner cases where both the SELECT and UPDATE blow out the cache.

Temporary tables can be troublesome for replication, even 5.0.22 with InnoDB. If the replication thread hangs, the temp table will still exist, meaning you can’t restart replication until you drop it.

DBIx::dwiw

DRRaw nice program for rrd/cacti

Most people seem to prefer manual failover in a master-master setup to prevent split-brain situations.

MySQL at Google

Steve Gunn and Michael Dickerson, Database Operations

Architecture Overview/Terminology

- partitioned on some key that’s useful
- shard is member of partition
- slims are read-only slaves
- wrote a client to do parallel dll and sql to cluster
- master can run out of buffer cache on large results and request dies, slaves die from binlog truncation

High Availability and Failover Strategies

- DNS-based
- cron-based python script scores load as green, yellow, red
- slave 5-10 seconds behind, can work on master if need transaction or no delay
- BigTable is more of a Berkeley store than ACID database
- db#-physical
- db#-logical, like db0-accounting
- never trust your glibc or your memory allocator. Fedora Core 3 is bad when doing static builds, do dynamic links. MySQL AB uses Suse glibc.
- needs to change MySQL privilege model or add roles, too much memory pressure with default system.
- be nice to have connectionless queries stop instead of running a long time
- they do explain on queries across the system to detect changes in the query plan overtime, drift
- lint checker for fk and schema

- ibbkp off primary, don’t trust replicated slaves yet
- have written library to do ordered and unordered checksums on table (start slave until, compare checksums)
- formal schema changes process

Mark Callaghan, InnoDB Scalability - formerly database internals at Oracle for 8 years, sorting, floating point, customer requests

- Google databases are IO bound
- looking forward to page compression in InnoDB in 5.1
- would like transactional counter back in InnoDB same as 4.0 so transactions on commodity boxen are recoverable

Had dinner with Jeremy 2, wife, Matt (WordPress) and his bizdev guy.

Saturday

Brian Aker, Clustered Database Approaches

- NDB
- Emic 2 would have problems with SP because of how they intercept queries …
- Solid and Falcon to in-memory transactions, good for web
- InnoDB uses disk more, good for data warehousing
- Amazon S3, use a trigger to store copy on S3, like archival or SoX data
- Every developer today should be asking: How am I going to partition and cluster this?
- would prefer if all the databases used the same binary protocol, since they do the same thing
- Tridge has a persistent memcached-like cache (cnet and facebook may have more than 1,000 nodes in memcached)

Architectural questions:

- how do I partition it?
- how do I avoid crippling it?
- what can I afford to lose vs. durability
- how do I minimize number of nodes?

- NDB good to 63 nodes, can accept table scans, while Solid will prolly be good to 8 nodes based on experience
- iSCSI 10 Gbps demos fast as FiberChannel

- One could handle a Slashdot-type situation with Amazon EC2 for example.

- immediate read after write makes horizontal scaling difficult. near synchonicity makes things very easy.
“Your comment will appear in a minute or so.”

- model for web start-up could be: start with 1 MySQL db with a memory table for sessions, grow to 2xDB and multiple memcached session caches

- Alexa does thumbnails with S3?

- think about architecture that works for both internal and external users, avoid balkanization

Baron Schwartz: How to use the innotop InnoDB and MySQL Monitor

- cool
- think about allocating more key buffers and assign queries to use that. can disable caching for logging table, for example
- install linux in solaris zone, setup slave, dtrace it
- tcpdump profile measurement - time inbound sql frame and outbound result frame
- EXPLAIN EXTENDED and SHOW WARNINGS
- dependent subquery
- MySQL does not give adequate performance counters or data

Sheeri Kritzer — Better Performance with Booleans (using bitwise operations)

- uses colinux on windows

- problem: given hair color, eye color, sex, status
- question: how to optimize performance in MySQL for general dating search?
- played with various MySQL binary statements

Jeremy Cole: MySQL Replibeertion: Replication; Uses, Performance, Problems, Brainstorming

Reasons

- backup
- scaling
- data warehousing
- multiple storage engines

Problems with MySQL Replication

- delayed
- masters don’t track slaves
- should checksum events
- nice to have guid for multiple masters

- Golden Gate $40,000 for MySQL - Oracle

- for unreliable network, slave reconnect default is 60 seconds. try master connect retry=1

Sunday

- 9 am Innodb session cancelled
- group photos with umbrellas in back of Building 40
- Brian Aker and myself got a tour of suspended Space Ship One mockup
- started MySQL Forge Data Warehousing Wiki page, in-person comments from Rick James and Frank Flynn, a former Red Brick user

- Jeremy Cole demoed his profile patch for MySQL, very nice: SHOW PROFILE …
- Stage 0 is checked-in, 4 more stages to go
- minimal performance overhead impact
- shows about a dozen query stages elapsed times in microseconds, option to show source function
- some day could have graphical tool

- talked about SQL optimization for Web 2.0 tag clouds (20,000 tags x 2 million uses)
- Jay Pipes showed a slide with his effort to use a derived query and LIMIT to prevent “run-away” large tag matches
- Monty mentioned that GROUP BY … ORDER BY NULL is faster when using GROUP BY, which by default does an implicit ORDER BY.
- he also mentioned a max-join-size mysqld option to return an error on overly large joins
- Brian Aker and Jay Pipes led a MySQL trivia quiz
- what is wrap behavior of enum
- what is procedure analyze

Lunch

Indian food.

ScaleDB Inc.

- trie data structure concept
- patricia tries (compact, used in routers)

Falcon Database

- in memory, commit to disk, similar to Solid
- better for OLTP than DW prolly
- tracks hot records in memory
- InnoDB is moving to fewer locks for OLTP performance, may slow down replication
- MySQL does benchmarking with Quest, can only make Open Source comparisons public

MySQL++

- been working 4 months on it
- complete do logs, great for data warehousing importing

memcached

- memcached table type is being added so it will have a MySQL interface
- kind of strange for some users, since no result for empty cache, could do stored procedure to hit real database
- toy compared to ndb
- key lookup only, no range queries without a patch
- use MySQL as a SQL language router with stored procedure or trigger to not change app
- Brad did a 1.2 release 2 months ago, mostly with FaceBook patches. smugmug is using it in production.

MySQL Enterprise Dashboard

- available to MySQL Network Silver Level and up subscribers
- many monitoring and alerting features
- ball matrix monitoring display
- very nice
- code name was Merlin

Requested MySQL Enterprise Dashboard Enhancements

- favicons specific to this product and alert level
- KB article integration with bugs
- dump variables into a bug report
- scale to hundreds of servers
- publish alerts as RSS feeds
- compatible email format to RT and Bugzilla
- some alerting is not helpfully implemented as of now. messages are not clear and hard to disable.

MySQL Binary End of Life Announcement

- 3.2 and 4.0 binaries already EOL’d, unless you purchase extended support, source available to satisfy GPL
- need to upgrade some classes of users, like large ISPs, cpanel providers, etc.
- notify others

Thanks to MySQL and Google for a great job organizing and hosting the event. Nice venue, fresh food, friendly security staff and conference photo policy.

MySQL Camp Unconference News

MySQL Bug #15815 and InnoDB Anti-scaling

Friday, November 10th, 2006

MySQL LogoAn old rule of thumb in database development is that it takes 10 years for a storage engine to bake-in and become a reliable, high-performance engine for enterprise users.

MySQL Bug #15815 (filed Nov. 17, 2005 and closed Nov. 9, 2006) illustrates this.

In a nutshell, before the patch for this bug, InnoDB would get slower as you add more CPUs.

If you can’t upgrade yet, a work-around for this bug is to do reporting/data-warehousing against a replicated slave.

eWeek: The Truth Comes Out: Oracle Bought InnoDB Without a Clue

Intro to Compiling and Configuring MySQL

Thursday, November 2nd, 2006

databasejournal.com has a good introductory-level article on compiling and configuring MySQL:

Optimizing MySQL: Hardware and the MySQLd Variables

Note that most users should use the pre-built binary packages from mysql.com.

The advantages are:

  • tested, both internally by MySQL AB and millions of downloaders
  • may have performance enhancements from different compilers, like icc
  • supported by MySQL AB and other users.

tutorialized.com: MySQL FAQs