Archive for the ‘MySQL’ Category

YAPC::NA 2007 Perl Conference, Houston

Saturday, June 30th, 2007

I was at the YAPC::NA 2007 Perl Conference this week at the University of Houston in Texas.

Once again, it was a great conference, with about 200 people attending.

It was my first time in Houston, and boy was it humid. My eyeglasses fogged up every time I went outside. It might be the most humid place on earth, even moreso that Bali (on the equator.)

YAPC is a great value. The conference, 2 days of tutorials plus book and 5 nights of dorm accomodations was $384.

Sunday

- got up an hour before my flight
- hot and humid outside airport in Texas
- took bus from the airport, 73 and 30, 88 doesn’t run on Sunday
- city is not a war zone like Detroit or other inner cities, but rough. one drunk on the bus
- staying in Taub dorm in “the quadrangle”, has ok common
basement with TV lounge, weight room and laundry room
- spent evening at The Cadillac Mexican food restaurant
- very off-campus $25 cab ride
- $30 fajitas and cheesecake

Monday

- meeting rooms are in the 3-storey University Center
- scattered around vertically
- coffee
- number of fast-food restaurants: Wendy’s, Subway, Chili’s Too
- Cougar shop
- TV lounge with wide-screen flat-scree TV
- Game Center with FPS, pool, air hockey and bowling

Larry gave 2 talks in morning

- briliant comparative language analysis
- Perl borrows from other languages heavily
- Perl hits the sweet spot for programmer freedom
- Prolly too many object models in Perl5. one is enough.

Lifetime of Many Hats

- born near Pasadena
- almost drowned with styrofoam surfboard
- liked to specialize in everything
- dropped out of original uni program, worked, went
to missionary school and studied linguistics, back to uni

The Perl Foundation

- case study of dev shop that couldn’t find enough perl programmers,
tried Java, ended up with 25 of each
- creating materials for the public in MBA-speak
- Forester called them and asked for scripting language survey participation
- the analyst groups rank language by current capability
and long-term strategy equal weighed, perhaps problematic for perl
which has a mature perl5 but fuzzy perl6
- somebody has written a Perl module for the Facebook API,
overlooked in their initial API release

Dave Rolsky

- error, validation, logging
- good talk

Abigail

- sudoku solver with Perl regex
- regex is NP complete … so let’s do something insane
- 250 Kilobyte regex to solve 9×9 sudo, 90 minutes to run
- he can solve it by hand in less time

Tuesday

Resume talk was funny. Basically, don’t do things
like send a picture of yourself in a disco, show up
late, or be difficult. Based on real-world experience.

Perrin Harkin

- always thoughtful and practical
- but lolcats galore in slides
- tried VMware for snapshotting environment, 4x slow
disk io on their hw, moved to lvm
- uses version control, indifferent, cvs good enough
for a long time anybody who said otherwise was a crackpot
- use 2 branches (dev on main plus maintenance branch)
- test is important, Test::More is not enough because
of namespace pollution?
- tried to store db operations and roll back in perl,
prolly better to just truncate

Casey West

- working at SocialText, SocalCalc
- started project with MochiKit, web 2.0 ish framework
- roundelement for rounded corners
- cross-browser keyboard modifiers
- animation lib
- functional
- did one lolcat to mock Perrin
- switched to MooTools for JS, smaller memory footprint

Evan Carroll, Houston.pm, Dealermade

- catalyst with Dojo, template toolkit
- ruby with scriptaculous JS, integral

Nice dinner in Hilton ballroom.

Auction hosted by Uri. Quite humorous, raised about $4,000.

Dorm party afterwards, lot of alcohol. Talked to Julian Cash
a little. He is very artistic.

Talked to Beth and Adrian, 2 bioinformatics perl chicks.
Beth runs Ubuntu on her Powerbook G4 and Adrian Debian on her Dell
notebook.

Taub dorm party.

Wednesday

Beth from Ithaca did a talk on database programming with postgresql.
She uses it for bioinformatics work.

Some type of skit about the power of the yapc perl t-shirt.

Town Hall meeting rehashing usual Perl adoption issues:

- lack of corporate funding
- lack of PR
- lack of mindshare vs other languages
- poor state of Perl distros, but in LSB

Richard Dice mentioned some feedback from Forrester Research.
Perl was rated ok overall, with a hole in WSDL support.

This YAPC finished about $6,000 in black.

Talked to Perrin a little about file uploads and memory leaks
on mod_perl. He recommended either using Apache api,
reading by row, or using exit in your Apache::Registry script
to force child to die after the upload.

Talked to a Larry a little about parallel programming
with Perl6. Seems like comprehensive support for features
found in other languages like Haskell.

Thursday

Damion was unable to come to YAPC this year, so Randal Schwartz filled in for him for 2 days in the Advanced Perl training tutorial.

Randal did a good talk on his page-by-page views on Damion Conway’s Perl Best Practises book from 9 am to 2:30 pm.

I tend to identify more strongly with Randal’s programming style than Damion’s in some cases. For example, I’m not a big fan of English.pm.

Randal delved in detail into some areas of Perl upon prompting by the audience. He spent some extra time talking about Perl control structures.

MySQL Replication and Errors

Saturday, May 26th, 2007

MySQL LogoJust investigating ways to detect and fix replication errors on a daily basis - without reloading the slave. The database I am managing is large, but fortunately partitioned into lots of smaller, independent tables.

The most common error this year is malformed packets as the master and slave are in different data centers. Skipping that statement is often ok, with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1.

I saw 2 talks by Baron Schwartz this year at MySQLCamp and the MySQL Conference, so I thought I’d look into his work.

I’m able to run his mysql-table-checksum-1.1.5 in ACCUM and BIT_XOR modes, but not CHECKSUM. I had to do some editing on the script, so it looks like it needs a little more testing.

Update: He’s fixed the ACCUM and BIT_XOR bind bugs.

Baron Schwartz’s work:
xaprb: Introducing MySQL Table Checksum
Sourceforge: MySQL Toolkit
Innotop

MySQL manual:
Replication Startup Options
SQL Statements for Controlling Slave Servers
SET GLOBAL SQL_SLAVE_SKIP_COUNTER
SHOW SLAVE HOSTS Syntax
CHECKSUM TABLE Syntax

SQLZoo: Cool Multi-Database Educational Site

Thursday, May 17th, 2007

MySQL LogoSQLZoo not only has free tutorials for learning and trying SQL, but lets you run interactive queries against multiple databases.

Very cool.

One of my friends uses it for learning basic SQL.

I use it for testing SQL written for MySQL that may be run on Oracle later, and doing meta-data queries that require access to a real database installation, like this Oracle query:

SELECT OWNER, TABLE_NAME FROM ALL_TABLES

Learn UNIX in 10 minutes

MySQL Conference 2007

Thursday, April 26th, 2007

MySQL LogoThe MySQL Conference was held again at the Santa Clara Convention Center. As always, it was a great conference, with so many good talks in parallel tracks that it made choosing difficult. Here’s a link to the slides.

The Santa Clara Convention Center is not near any restaurants, so you have to rely on the conference lunches. Fortunately, they were pretty good this year (less sandwiches than last year.) However, it was often hard to find a coffee urn that wasn’t empty, so that was annoying.

My conference blog posts tend to focus on talks in the tracks that I attended and BOFs. Click on the “Conferences” category on the right-hand side to see my posts on previous conferences.

Monday, April 23

Tutorial: Scaling and High Availability Architectures, Jeremy Cole and Eric Bergen, ProvenScaling.com

A good tutorial on large-scale database architectures popular with the Web 2.0 crowd.

- Hash Partitioning
- Directory Partitioning
- HiveDB Java project started by Cole, contributions from clients

Tutorial: Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues, Domas Mituzas

This was a very funny talk on how Wikipedia handles scaling and performance. Not intentionally funny, but ghetto database and system administration for when your project doesn’t have a lot of money or people.

Jimmy Wales’ SLA: wants “more up than down”

Downtime does not mean being fired, it means more donations,” because an error page soliciting donations is displayed.

One 9 is pretty good for Wikipedia site, most businesses aim for 5.

RAID0 fast, more storage, use replication.
RAID1 does not make sense because 1 hour
rebuild and could be dead.
RAIS

dc failure - get backup from mailing list users

security patches: if you can ping a server, or get a shell, you have bigger problems.

memcached is not faster than mysql if on network.

squid 2.6. mysql 4 with patches, can’t upgrade and
maintain performance. 2.6 can’t rewrite single header line

squid sync broken on Gb ethernet with 0.0 ping time
.net Lucene for search, worked the best for them when tested

read load bigger than write load, replication is great for updaters, put binlog position in cookie

hundreds of writes per second to single master, thousands of reads

no local logging, just wite to network and forget

“people complain about everything”

loves LVS

use fedora, testing ubuntu

Tuesday, April 24

Security and Debugging in PHP, Rasmus Lerdorf

Another lucid and practical talk by Rasmus on performance tuning. This time he picks on Drupal, which does like 50 database calls per page and gets load time down from 1.2 seconds to 50 ms.

mysql native driver

http_load

base/mysqli/mysqli-nd same speed 1.2s

callgrind

- recursive calls can take more than 100%
- if you see compile high (20%), install
op code cache like apc

- xdebug
- kcachegrind drupal_xdebug1.out
- drupal has 3 performance options:
- off 1200 ms
- normal 80 ms
- aggressive 50 ms

- div
- cracker JS control panel: inspect dom for colored links (like banking sites), see if operation succeeds in same browser session (like a banking transaction)
- browser zombie control panel with iframe
- linksys dmz to make your pc outside fw

- filter extension in 5.2 by default

- alert(0), /, /xss, /abc
- make sure not cached

- host header xss-scriptable, apache picks 1st
- javascript:

- IE6 allows JavaScript in img src tag

- #039 is a single quote in On* and style handlers

- hit yahoo mail and google groups in
next/prev which showed subject lines

- update acrobat now!!! or people can run JS against your local drive because there’s c:\…\file.html

- cross-site request forgery

- xsrf

http://talks.php.net/show/mysql07
http://xdebug.org/docs-profiling2.php
http://sla.ckers.org/forum/list.php?3
http://php.net/filter

Building a Vertical Search Engine in a Day, Ken Krugler

Awesome talk, reminiscent of the old OSCON talk “Build Your Own Map Server like Google”.

- vertical search
- less than 1% of content out there is
of interest to my audience

- like specialized processing on source code
(indexing, hiliting)

- specialized optimized search interface

- 25 people, 120 servers
- 2 years
- get it right for user, or get lost in
technology - 10 papers a week
- nutch
- 1 day of crawling, 2.5 GB data
nutch tomcat
- mysql used to support operations
- nutch args in xml
- results in mysql for analysis
- db falls over at 100 million records
- one good record, 5 crap

- injection list
- fetching pages
- parse the content
- update crawl db
- whitelist
- blacklist
- convergence
- index (lucene)
- map reduce - split up big problem into small pieces, combine results
- recrawl

100 servers 1000 threads each

- 20 billion pages out there, lots of junk, 1,000 servers regularly, 2 Gbps

- breadth first would take years and cost millions of dollars

- dmoz has 500 dollars, 95% of content

- depth first crawling to the rescue

OPIC + Term Vectors = Depth First

- page rank is a 20 billion by 20 billion sparse matrix you solve for each page

- pure opic means “fetch well-linked pages first”

- we modify it to “fetch pages about mysql first”

- term vector

mysql - 5.0
sql - 4.5
schema 2.5

- can use wikipedia to train based on mysql category

- nutch and hadoop running on 4 server cluster (2 hours to make lucene index)
see http://lucene.apache.org/nutch for details
- 23 loops, 150k pages fetched, 2M urls found
- common problems
- honey pots
- bogus pages (spam, porn, out of scope)
- dynamic links (http://mydomain.com?param=value)
- getting blocked

- 2U fast sata quad core 8 GB RAM
- searchers for web

- introduction to nutch

- 40 Mbps ok for 4 servers because of database update, 1 day
- they have a 1 Gbps link
- can make a NetNanny list from crap

- mapreduce of text score plus incoming links gives useful score
- also score subdomain name like blogpost

- personalized yahoo and google: no ui, so special processing

- krugle is on nutch .8 or .9, hadoop .9

- papers: do they scale, do they really improve results?

- precision is relevance
- recall is completeness

- focus on code search side of things
- 1.5 people to manually score index
- search is tough on code because no comments
- look in tab search for “haskell quicksort”

- solr
- compass
- omnifind
- htdig

SPASQL

- used for federated queries, semantic web.

Lunch

Pretty good food, at my table were some conference first-timers from out of the country.

Angel, IT Manager, from WillowBrook Bathrooms, UK, was here to learn more about MySQL from an IT perspective.

And Juan Carlos Gonzalez, Project Coordinator and Programmer, Innox, from Guadalajara, Mexico, was here to learn more about MySQL and software development and do some bizdev. Innox does PHP development. Programmer rates in Mexico are about $1000/month and Juan is looking for clients. +52 (33) 3030-7181, jgonzalez@innox.com.mx

MySQL Performance Tuning, Peter Zaitsev, Percona

- max_connect_errors
- bump up to prevent DoS
- created_tmp_tables
- created_tmp_disk_tables
- increase tmp_table_size and max_heap_table_size
- tmpdir=/dev/shm

The Blackhole and Federated Storage Engines: The Coolest Kids on the Block, Kai Voigt, Giuseppe Maxia

Great talk. These 2 guys have the most twisted uses for these table types that you can imagine.

Blackhole Storage Engine

- blackhole is ACI not D
- smallest storage engine codebase (under 6k), so prolly reliable

Good for 3 things:

- statement-based replication relay slave layer (throw away data, keep binlog)
- benchmark SQL layer or network layer
- test dumps
- real time data warehousing - use triggers to automatically fill summary tables, avoid joins later
only causes result with insert, not delete or update because 0 rows affected

Federated Storage Engine

- create local table with connection to remote server
- uses local query cache if enabled
- can federate views or tables
- strange behavior of queries where index not available
- select count(*) causes a table scan - all aggregates are slow, so use views.
- can do subqueries against mysql 3.23 from newer versions!!!
- use federated table on slave to check against master in pure sql
- remote command execution on remote server by using view and function or udf
- takes one connection per table
- should have restricted account for federated table login

PHP Performance and Scalability Best Practices, Laura Thomson

http://omniti.com/~george/talks/

adv. php programming, g. schlossnagle
building scalable web sites, c. henderson
scalable internet architectures, t. schlossnagle

BOF: Data Warehousing

Host: Pentaho, Julian Hyde from Mondrian Project

- Julian works on modrian reporting project, sponsored by Pentaho now
- Julian recommended looking at Kettle as an ETL tool
- he thinks that parallel queries across multiple disks and materialized views are often important for DW
- other companies selling ETL tools include ab initio, ardent and informatica.

BOF: MySQL Performance

Jay Pipes and Flupps, MySQL AB

- fastest way to load external data is drop a csv file in the dbdir and do create table, alter table?
- load data infile stream from memory?
- single master means 2 data centers will have lag
- stored procedures are compiled per connection
- Jay likes PDO 5.1.13 for prepared statements
- 5.0 just had query caching of prepared statements 3 weeks ago, took 2 days of programmer effort and solves a frequest customer support question
- 5.1 will be GA in Q3 or Q4?
- will need to dump and restore if moving data, esp. UTF-8, to 5.1
- for syncing databases, look at mysqltoolkit mytablesync

Wednesday, April 25

Federation at Flickr: Doing Billions of Queries Per Day, Dathan Pattishall

Thousands of Shards, each shard has 2 nodes in master-master replication
Storage on NetApp Filer
InnoDB, InnoBackup nightly
tables as files
multiple data centers, want all hosts to be active for site performance
4 billion queries per day
migrate in bulk
need patch for linux to work with 2,000 threads in mysql

Using MySQL as Active DBMS for Monitoring Applications, Jacob Nikom, Lincoln Labs

- talked about design of database system to do event monitoring in subway attacks
- Oracle site license at lab, but clients don’t want to pay for Oracle, or hire a DBA per database
- likes combination of MySQL, UDF, triggers, Spread - because no Java in MySQL db yet
- audience member noted that MySQL triggers could use more docs, debugging tools, overall design

Storage Replication with DRBD: The Next Big Thing in Cluster Storage Management, Florian Haas, Phillip Reisner

DRBD FAQ

- layer below database, similar relationship of SCSI to MySQL
- now on version 8
- Siemens uses it.

askask.com: DRBD and MySQL

Lucene and MySQL, Farhan Mashraqi

- plugins for office formats, pdf, html
- capable up to 100 qps, need architecture to support more
- ranks
- supports + - modifiers, phrase, wildcard, range, more
- usually not right solution for very small or very big sites
- 20 MB/min on P M 1.5Ghz
- index size 20-30% of size of text indexed
- small RAM needed (1 MB heap is ok)
- Basis Technologies Rosette Linguistics Platform basistech.com/lucene
same as google and yahoo
farsi, etc.
- many ports of lucene to different languages. Java lucene is main.
lucene4c is index compatible with java, ferret and rubylucene is not.
- solr
- 18th - 25th, sept 2007 heidelberg Kai

Lightning Rounds with Top MySQL Community Contributors, Giuseppe Maxia, Ask Bjørn Hansen, Jeremy Cole, Martin Friebe, Bill Karwin, Sheeri Kritzer, Mike Kruckenberg, Paul McCullagh, Baron Schwartz, Yoshiaki Tajika, Beat Vontobel, Peter Zaitsev

QC’s (Quality Contributors)

- several people in the MySQL community from around the world got awards and talked for a few minutes about their contributions and motivations

Yoshiaki Tajika, Customer Support, NEC

- formerly supported Microsoft SQL Server
- likes free bug reporting and access to developers with MySQL
- “The manual was be perfect. My customers might be confused by mistakes.”

MySQL Performance Cookbook, Peter Zaitsev

- slave_transaction_retries= to prevent timeouts when doing an alter table
- innodb_flush_log_at_trx_commit=0 huge performance improvement with replication
- log-slave-updates chain replication
- replicate by db or table
- set global sql_slave_skip_counter=1
- start slave
- log event entry exceeded max_allowed_packet
(prolly master bounced, slave is looking for non-existent binlog entry)
- could not parse relay log event entry

MySQL Quiz Show

- mostly silly questions, some MySQL history trivia, a few technical questions
- lot of popcorn and ice cream.

BOF: Sphinx Search, Andrew Aksyonoff and Peter Zaitsev

- Sphinx Search has more and more users
- Peter is providing Sphinx support to web companies
- Andrew Aksyonoff is thinking of doing 2 months of development on Sphinx and offering support services and embedding in products.

Exhibits

I like to spend 5 minutes with each exhibitor to learn about what they’re doing and see what’s new.

The most notable were:

Alfresco Enterprise Content Mgmt.
Joomla CMS
Sphinx Search was there again.

Absent this year were booths for DealNews and Quest Software. DealNews is sponsoring Phorum.

Thursday, April 26

The innotop Workshop, Baron Schwartz

- showed innotop
- talked about changes to InnoDB for more statistics
- talked about replication monitoring tools. one of most popular projects on sf.net

Managing MySQL the Slack Way: How Google Deploys New MySQL Servers, Michael Still, Grant Allen

- Google uses PXE, slack to build hosts, Open Source
- guessed that CF Engine would be more complicated than necessary
- can’t upgrade rsync on all legacy nodes, so experimented with scp first, then netcat
- can build MySQL host in 10 minutes, faster to rebuild than troubleshoot
- use IP address as the master/slave id since both are 32 bits.

Introduction to the Japanese Charset, Yoshinori Matsunobu

- strange definition of multi-byte - more than 2 bytes
- mysql utf-8 is 4 bytes or not?
mysqlforge: WL#3780: Replace 4-byte UTF8 input

High Performance Data Warehousing with MySQL: Tricks and Tips from the Field, Brian Miezejewski

Great talk. He used to work at American Airlines on DW projects.

- memory engine for smaller tables that are not updated often
- nitro
- infobright 10 to 1 compression including indexes on all columns

–init-file= in init script
- insert on duplicate key
- sort_buffer_size key_buffer_size
- key_cache_division_limit
manual myisam-key-cache
- mysqldumpslow –help
- log_queries_not_using_indexes
- mysqladmin ex -r -i60 | grep -v ‘ 0 ‘
- ran into Seti At Home administrator later, he’s interested in the hi-perf table types, gets 500 GB drive from Arecibo instead of tape now, always something with budgets

Closing Keynote

Yahoo! Pipes, Pasha Sadri

- good demo, showed searching for rentals with characteristics that he likes
- author surprised to see Pipes used to do language translation in Second Life
- Mercury News says this is the first Brickhouse incubator project to get productionized

Cnet Sysadmin conversation

- continuance mcluster/unicluster as middleware/load balancer to multiple MySQL databases
- different database product for DW

sheeri.com: more blogs about MySQL Conference

Google Releases MySQL Patches

Thursday, April 26th, 2007

At MySQLCamp, mention was made of a few Google internal patches to MySQL4 to improve replication and IO throughput on InnoDB (Mark Callaghan, ex-Oracle, did the InnoDB work).

The good news is that Google has just released them.

The license is GPL, so we’ll see what MySQL AB can do with them as their code is dual-licensed (GPL and commercial).

Hopefully this will prod Yahoo! into releasing more patches for Open Source software they use. I’ve heard that in 2006 it was tough for employees to get permission to release MySQL patches into the open.

Google releases patches that enhance the manageability and reliability of MySQL
List of MySQL4 Patches

MySQL Hacks

Saturday, April 7th, 2007

MySQL LogoLinks for cool things that people do with MySQL.

Amazon Web Services Blog: MySQL Interface to Amazon S3
Mark Atwood’s S3 Blog Entries

register.com: Inside Amazon’s web services: SLAs for us but not for you, says Amazon’s CTO

SVLUG: Talk on Meldware Groupware

Wednesday, April 4th, 2007

Aron Sogor, cofounder of BuniSoft, gave a talk at Silicon Valley Linux Users Group (SVLUG) tonite on the Meldware Linux Multi-Platform Groupware Package. He is the core developer of the Calendaring feature in Meldware.

This is a suite of servers written in Java that uses a database backend as a store. They use Hibernate for Object-Relational Mapping (ORM) and support Postgres, MySQL and most JDBC3-compliant databases.

The project provides a multi-platform set of servers for MTA, IMAP, webmail (with Adobe Flex) and calendaring. You can either continue to use your existing servers or move to theirs for the various components.

So far Aron has tackled iCal and wCal, but not yet the Exchange Calendaring protocol.

It was fun to hear about the intricacies of calendaring from an experienced calendaring server engineer. Apparently calendaring vendors work on the server, protocol and client at the same time, and when they converge just ship it, standard or not. Much like IMAP and its incompatibilities.

Overall quite an ambitious project.

Thanks to Symantec for hosting the meeting. SVLUG is looking for a new corporate sponsor for expenses like shows and pizza.

MySQL Performance Tuning Links

Saturday, February 24th, 2007

YouTube: Jay Pipes, MySQL AB: Performance Tuning Best Practices for MySQL
MySQL Proxy