Archive for the ‘MySQL’ Category

OSCON 2008, Portland

Friday, July 25th, 2008

I attended the O’Reilly Open Source Conference, once again in Portland, Oregon.

Overall my impression was that the talks and vibe were oriented towards Web 2.0 primarily.

I would say that the talks were not as strong as previous years, but it’s easy to compensate for that with the “hallway track” and access to the original Open Source authors.

Several attendees used the EEE sub-notebook computer, and were happy with it as a email/browser tool.

Wednesday

PHP Taint Tool: It Ain’t a Parser

- CS’y effort at PHP parser for code analysis, reminds me of early days of Perl’s B tools
- not suitable for end-users

Write Beautiful Code (in PHP), Laura Thomson, Mozilla

- good general background on good programming practises
- not a lot of specifics about PHP, but available for questions

Hypertable, Doug Judd, Zevents

- HyperTable is a clone of Google’s BigTable, from public paper
- room was packed, some turned away
- still alpha, maybe beta in August
- preferred distributed filesystem is HDFS, works with others
- I recommend reading web site and then looking at the curt slides
- plans to do benchmarks with same hardware as Google has published.

Open Source Virtualization for People Who Feel Guilty About Using VMware So Much, andy michelle, EDA

- cute talk about VirtualBox, Xen and VMware
- Xen has weird nomenclature compared to other tools
- VMware wins on tools and polish
- showed screenshots of unreleased and alpha mgmt. tools.

Barely Legal XXX Perl, Jos Boumans, RIPE

- stunning and twisted example of overloading, short-circuiting, import-faking, whatever it takes to make a loaded module do something other than intended
- illustrates great flexibility of perl, for good or ill
- could be useful for things like testing harnesses, etc.
- motivated to win bet of $100 or 1 vertical meter of beer
- said it took 3 or 4 hours to complete.

I walked around the exhibits area.

Got a demo of Atlassian’s continuous integration (CI) tool, Bamboo. They’re also the vendors of JIRA issue tracker and Confluence wiki, which I’ve used before.

One company had a public Wii game happening.

Thursday

Scaling Databases with DBIx::Router, Perrin Harkins

Ultimate Perl Code Profiling, Tim Bunce (Shopzilla)

- talk and screenshots about NYT perl profiler


The New York Times Perl Profiler

Top 10 Scalability Mistakes, John Coggeshall (Automotive Computer Services)

- good overview of writing high-performance, maintainable Internet systems
- interesting opinion that scalability is not just about increasing performance. scalability can be about scaling up or down, performance or maintainability, etc.
- recommended php.ini settings list

Perl Lightning Talks

- popular with audience, attendees seemed to like all the talks
- Mail::ESMTP looks very interesting for testing and production

Code is Easy, People are Hard: Developing Meebo’s Interview Process, Elaine Wherry (meebo)

- struggled to find time, right approach to interview new candidates in 1996, likely at behest of VCs
- external recruiters hit-and-miss, conferences and jobs email link useless
- phase where non-founder employees doing interviews wanted a founder involved in interview process
- trying to preserve culture (finger rockets, social networking, 2 female founders, etc.)
- came up with process involving reading resumes, phone screens, and office “sim” that adds a new candidate within 3-6 weeks
- “sim” has 3 versions: office manager (plan to erect a meebo office sign), front-end engineer (write a JavaScript app), and back-end engineer (write a server) in 4 hours
- current goal is to keep interview time down to 8 hours per candidate over 10 days
- now up to about 40 employees
- my feeling was that their hiring process started off clueless due to inexperienced mgmt. and is still oriented towards junior engineers. Silicon Valley is full of expert engineers and it doesn’t take 8 hours to interview them.

BOF

mysql-sandbox

Giuseppe Maxia discussed and demoed his very useful mysql-sandbox utility for managing several versions and instances of MySQL on the same machine.

He wrote it for his testing work at MySQL AB. Very well received by attendees. This is a great example of what I call “anti-virtualization” - using ports instead of resource-intensive VMs.

MySQL Conference 2008 Presentation

State of the Onion Address, Larry Wall

- talk about Perl6, random anecdotes, etc.

Friday

Open Voices, Jim Zemlin (The Linux Foundation), Keith Bergelt (Open Invention Network), Karen Sandler (Software Freedom Law Center), Phil Robb (Hewlett Packard)

- panel discussion of various free software efforts, some little-known

An Illustrated History of Failure, Paul Fenwick (Perl Training Australia)

Paul gave an interesting talk on notable Software Failures and estimated a price tag for each. I had heard news reports of many of them, but it was interesting to hear an updated analysis of what really happened behind the scenes.

Thanks to Google for sponsoring the fairly good almost-gourmet lunches. Sure beats the O’Reilly lunchbags from the dot bomb days. (Everybody I know bailed and found a subway shop back then.)

Notes

- Burgerville popular with attendees, can upgrade combos to a shake.
- Red Lion hotel has a small cardio gym with 1 universal machine, no free weights, open til 11 pm
- WiFi password changed weekly, in middle of remodel, lobby just finished.
- There is a 24-Hour Fitness that is actually open 24 hours near downtown Portland. Has basketball court and 2-lane pool. $15 for non-member visitors.

OSCON 2008 Presentations

DRBD and MySQL: Just Say No

Sunday, April 20th, 2008

I’ve successfully used MySQL statement-based replication for several years across data centers and understand it’s quirks.

While at the MySQL Conference, I tried to see how DRBD could help the installations I manage, but I just can’t drink the DRBD Kool-Aid.

MySQL Replication Pluses

  • Free
  • Easy to setup if you already have a backup and master position
  • No shared storage to manage or corrupt
  • Light network load
  • Can use master for r/w and slaves for r.
  • can do maintenance on slave (ALTER TABLE, etc.) and failover afterwards
  • works well across Internet even with high-latency
  • many replication problems simple and hand-fixable

MySQL Replication Minuses

  • Slaves can/will get out of sync with the master, typically noticed after a few weeks or with Maatkit
  • Changing masters requires rebuilding slaves
  • There is always some replication lag when there is a busy master
  • no checksums or 2-phase commit

DRBD is a low-level driver to copy a disk partition in near real-time from a master to a failover node (cold standby.)

MySQL with DRBD Pluses

  • Free
  • No fsck or transaction log replay needed if manual failover.
  • Slaves don’t need SET MASTER updated unless DRBD fails.

MySQL with DRBD Minuses

  • DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.
  • if the master panics, then after failover both fsck and transaction logs replay must be performed
  • more work to setup initially than statement-based replication
  • NIC and network corruption is also propagated.
  • Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition
  • Could generate a lot of network traffic.
  • cannot do maintenance on cold standby database
  • 2 heartbeats needed on a reliable, local network

I can see how MySQL/DRBD would be appealing for those who operate on a reliable network and don’t need Master-Master for load or maintenance, or who have many slaves that cannot easily be rebuilt.

Eric Bergen: DRBD in the real world.

MySQL Conference 2008

Thursday, April 17th, 2008

I attended the MySQL Conference once again at the Santa Clara Convention Center.

Despite the January purchase by Sun, the conference had the same great vibe as usual, and everybody showed up again.

Top Conference Themes

Some of the conference themes I noticed are:

  1. Linux LVM snapshots are now popular for MySQL backups. Snapshots have long been used in enterprise IT, now it works well and for free on Linux. Another use for a snapshot backup is to copy a busy master offline for comparing to a slave with Maatkit mk-table-checksum.
  2. DRBD is popular for HA (one speaker wondered if his talk about DRBD was still relevant since everybody was already using it), but I see some drawbacks.
  3. developers are concerned with supporting massively multi-core CPUs in both database and cache code. The Sun Niagara multi-core architecture seems to be the future, with 128 or more threads.
  4. databases and storage are quickly increasing in size, so many DBAs are interested in MySQL 5.1 partitioning and other tools and techniques.
  5. cloud computing is entering common use with lots of Amazon EC2 and S3 users. mosso.com has been available for a couple years from Rackspace, and Google and IBM are entering cloud computing. Users would like more competition to reduce prices and improve reliability, SLA or not.
  6. most companies say it’s hard to find experienced MySQL DBAs, but most are lazy when it comes to training and compensation.

Top MySQL Conference Tips

  1. Disable swap if your version of Linux supports it (most do.) This avoids getting some of MySQL’s pages swapped out and crippling the box with IO.
  2. Use memcached. MySQL User Defined Functions (UDFs) for memcached are now available to auto-populate memcached from MySQL statements.
  3. Consider multi-level partitioning schemes with MySQL 5.1, like combining RANGE and KEY.
  4. Consider MySQL High Availability (HA), either with replication, DRBD, or both. Linux HA project.

Top MySQL Conference Misconceptions

I had to straighten out a lot of newbs:

  1. statement-based replication is not reliable, does not have checksums or two-phase commit, and masters and slaves tend to diverge over time
  2. many novices believe Innodb does row-locking only, but often does range and table locking
  3. mysqldump is ok in most cases, but you have to be careful with locking and locktime, matching charsets, and testing the dump.

Here’s my notes on some of the talks I attended. (In case you haven’t read my blog before, I’m a long-time user of MySQL, replication, LVM and memcached. I have not used DRBD.)

If you have a correction or improvement, please leave a comment and I will update this blog entry.

Monday (Tutorials)

Building Scalable & High Performance Datamarts with MySQL, Tangirala Sarma

Discussed general DW concepts at first.

3 main requirements for a succesful DW project are:

  1. good data quality
  2. the right tools
  3. phased results.

Talked about various partitioning schemes in MySQL 5.1. I’ve used it for about 6 months in one project, but most of the audience was new to MySQL partitioning and struggled to understand beyond RANGE partitioning for logging it seems.

Later talked about MySQL-particular DW aids, including:

  • Kickfire appliance, which has capabilities such as column-store, compression and fast loading.
  • Infobright, which also has column-store and fast loading.
  • Nitro

His recommended references are:

  • DW Toolkit, Kimball on Amazon.com
  • Enterprise DW with MySQL, MySQL AB
  • MySQL Roadmap 2008-2009, MySQL AB.

Also, there’s a list of books here:

DW and BI Starter Books

Queued up for sandwiches and salad. Not really a surprise with O’Reilly as the conference organizer, expected more.

Ate lunch with the DRBD programmers. They said that MySQL AB now provides 1st- and 2nd-level support, and their company provides 3rd-level support and cashes checks. :)

Memcached and MySQL: Everything You Need To Know, Brian Aker (MySQL), Alan Kasindorf (Six Apart)

Very detailed talk about tips and issues with using memcached.

  • Evolving online notes
  • Brian wrote about 30 man pages for memcached, edited by Mark Atwood. Unusual amount for an Open Source project.
  • memcached is very handy for people stuck with databases on 32-bit systems and a lot of otherwise unaddressable memory.
  • Patrick from Grazr has written MySQL UDFs to populate memcached, has a SoC student. Pipelines cache inserts. Handy for distributed DCs already using replication, triggers.
  • Postgresql has pgmemcache()
  • lighthttpd has mod_memcache, prolly url hash key
  • Apache has mod_memcached with CAS, GET/PUT/DELETE, still alpha, try at pandoraport.com
  • limits: key size 250 bytes, data size 1 MB, 32/64 bit limits
  • threading is new based on giant mutex, bad for more than 8 cores
  • stop swapping with -MLOCKALL, noswap, sizing
  • stats sizes to test efficiency
  • command line option to disable LRU
  • CRC most consistent hashing, normal
  • ring consistent hash
  • IP takeover
  • bad switches or intermittent network is very bad
  • pick a driver than can do multiget
  • ghetto lock
  • Tim Bunce’s Cache::Memcached::Libcache does not do Storable, which is what most people prolly want
  • time in seconds < 30 days is relative, > 30 days is absolute
  • namespace trick: versions in key name
  • uint32_t type parameter usually indicates whether compressed or storable, can be used for anything though.
  • memcached_tool: memcp, memrm, memstat, memslap (load testing)
  • showed Mixi MRTG graphs: 6800 reads/second, 200 servers, no CPU load
  • Brian added IPV6 support after mysqld update (pet project, but helped with multi-interface support and to optimize out name resolution)
  • binary protocol code available, not merged yet, helps with multi-byte charsets than embed spaces or newlines which break the text protocol
  • improvements needed are durable to disk, highly-threaded
  • persistent connections are good and recommended
  • UDP alpha code available, good for lots of sets
  • storing BLOBs with MogileFS or LUSTRE good
  • speakers did not have experience with commercial caches, but did say that most people find Java caches often too featureful and slow
  • MogileFS, Hypertable, Hbase interesting

Tuesday

EXPLAIN Demystified, Baron Schwartz (Percona Inc.)

Room was packed for this talk. Good step-by-step talk for understanding EXPLAIN better.

Replication Tricks and Tips, Lars Thalmann (MySQL), Mats Kindahl (MySQL)

Some good tips, but overall an assumption is made by MySQL AB that a MySQL master and slave actually have the same data.

  • mysqlbinlog has –hexdump option for seeing byte-level dump
  • examine both binlog and relay log when debugging replication
  • you can clone a slave from another slave if you trust it - just do STOP SLAVE, SHOW SLAVE STATUS, shutdown, copy over the files, SET MASTER, and START SLAVE

Dramatically Improving MySQL Database Performance in Data Warehouse Applications, Martin Farach-Colton (Tokutek)

His view is that the storage engine is the primary bottleneck in BI systems for loading and search, and showed how to use a B-tree to organize data to guarantee the maximum performance in a growing DW. (Not sure why the online summary talks about fractal trees.)

How to Achieve Operational BI on a Budget, Lance Walter (Pentaho Corporation)

Kind of overly Pentaho-product oriented, since Lance is a product manager for Pentaho.

However he made a useful distinction between historical and operational BI.

Historical BI is mainly reporting on what happened before yesterday, and operational is up-to-the-minute for business process analysis and improvement.

By designing your BI system for both historical and operational requirements, you can get both at the same time.

Interesting case study of the US Navy doing BI on pilot training and operations to reduce accident risk

MySQL Backup BOF (hosted by Zmanda rep)

- admins attending generally unaware of LVM, but awareness growing
- one guy split his database across multiple databases for easier mgmt. and uses a HP SAN with 1 TB of RAM, very happy with IO performance, not so happy with price.
- one guy using 100 EC2 instances and S3. ok except for recent outages, maybe a little pricey.
- one guy using dd for fast network copies
- Zmanda just integrates existing techniques and allows scheduling, but prolly quite useful for inexperienced DBAs to do point-in-time recovery and schedule backups.

MARIA BOF

Hosted by Monty with his usual wicked Finnish black vodka.

He said Falcon was supposed to take 3 months, but slipped, so work started on MARIA to replace MyISAM. He has made promises to deliver a working storage engine, so will continue and do so and prolly release MARIA in 6 months.

All of the MARIA programmers were required to read Jim Gray’s textbook Transaction Processing: Concepts and Techniques on Amazon.com, and each have said they understand it.

I got the impression that MARIA should end up with a cleaner codebase than Innodb.

Discussed table-level checksums for replication checking, which he is planning to do, might be an option. Still time to decide whether to use CRC32 or another algorithm.

One of his programmers gave a demo of yanking power on MARIA-current on his notebook computer, though I didn’t notice what the outcome was.

He said that ALTER TABLE optimizations are planned, including instantly dropping an index without copying the table, though add would still require copying the table.

Another conversation could be paraphrased as, “After MARIA, he will work with the Sun team that fixed Postgresql threads to fix various thread scaling problems in MySQL.”

Sphinx Search BOF

Hosted by Peter Zaitsev and Andrew Aksyonoff, the author.

Percona has used Sphinx Search in a few projects for web forum searching for the past few years. Separate index server is recommended.

About a dozen people talked about full-text search requirements and experience with Sphinx. One guy was spending a lot of money on encad(?) and didn’t want to spend more on a bigger license later.

Wednesday

MySQL Performance Under a Microscope: The Tobias and Jay Show, Tobias Asplund (MySQL), Jay Pipes (MySQL)

Presented slides on the performance of various workloads.

The MySQL Query Cache, Baron Schwartz (Percona Inc.)

Excellent in-depth discussion of the query cache.

Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL, Patrick Galbraith (Grazr Inc.), Michael Kowalchik (Grazr Corporation)

Deadlocks, Wait Timeouts, and Other Transaction Issues, Jess Balint (MySQL)

Thursday

DTrace and MySQL, Ben Rockwood (Joyent Inc)

Good talk on using DTrace specifically with MySQL, mainly query debugging.

Scaling with MySQL using Materialized Views and a Shared Everything architecture, Moshe Shadmon (ScaleDB)

Listed 3 ways to do materialized views, but dwelt on their ScaleDB cluster product mostly.

High Availability MySQL with DRBD and Heartbeat: MTV Japan Mobile Services, Patrick Bolduan (MTV Networks Japan KK), Yoshinori Matsunobu (MySQL)

Talked about setting up HA using heartbeat, pingd and DRBD on a mostly-reads 5 GB CMS db. Used Enterpeise MySQL distro and support. No replication involved, likes mysqldump. Happy with MySQL 5.0 and Unicode with Japanese. Cute slides with Japanese maru symbols, etc. DBRD staff were on-hand to help with more difficult questions. Said LVM can be used under or over DRBD.

The Science and Fiction of Petascale Analytics, Jacek Becla (SLAC)

Talked about petabyte and exabyte DW for physics and astro programs. Contrasted science and industry PB databases (Google, MSN and Yahoo! likely each have 100 PB databases, but don’t disclose the size.) 5 years to plan DW for next experiment.

Spent 10 minutes with Rohit Nadhani and his programmer from Webyog looking at their MonYOG 2.01 version. Provided UI feedback for database operations use based on several months usage. Looking good.

Talked with Patrick Bolduan, somebody from DRBD and a NY Times IT guy who uses EC2 in the lounge. Apparently S3 can lose up to 10% of insert requests. Everybody is looking forward to when Amazon EC2 and S3 have competition for both pricing and reliability improvements.

Conference Evaluation and Recommendations

I go to a lot of conferences as a paying attendee, so I usually provide feedback to the organizers to help them improve the experience.

I mentioned to Jay that overall the conference was fine:

  • Talks were good, maybe less technical than previous years. Jay said Sun wanted more talks for novices since 2007 was too hardcore for some new attendees, but O’Reilly did not want to ghettoize newbs with a single track and room. For sure 2007 had too many sharding talks, mostly with users stuck on 4.0. There’s always the hallway track with developers anyway.
  • Food was ok but not great, although O’Reilly, the conference organizer, slipped in sandwiches for lunch on tutorial day. (SCCC is in an isolated location, so food is a big deal.) Still way better than OSCON in recent years. I guess the original MySQL conferences at the DoubleTree spoiled me.
  • Some of the vendors got too salesy in their presentations, but it’s hard to crack down on sponsors. Pentaho and ScaleDB come to mind.
  • conference still provides full access to MySQL managers and key programmers, in the best Open Source tradition
  • Still need a big-iron room with functioning demo SANs and HA setups, as I’ve suggested for a few years.

MySQL Conference 2008 Presentations
TechCrunch: Rackspace Offers Cloud Computing with Mosso
cnet.com: Is cloud computing more than just smoke?

MySQL and Choosing a Random Row From Unnumbered Sets

Tuesday, April 8th, 2008

Choosing a random row from a table using SQL is a common task.

For small tables and when you don’t have a numeric key or don’t know the maximum value in advance, SELECT * FROM test ORDER BY RAND() LIMIT 1 is adequate. However, this involves calculating RAND() for each input row, followed by a sort.

When there is a numeric key from 1 to N and there are no gaps (ineligible records), a more optimized method is SELECT * FROM test WHERE id >= CEIL(N * RAND()) LIMIT 1.

A more difficult scenario is when there is no numeric key, there are rows in the table that are ineligible, and you don’t want to sort. In this case an artificial rownum column can be computed and used in the selection calculation:

SELECT id FROM (SELECT id, @n := @n+1 rownum FROM test, (SELECT @n := 0, @r := RAND()) d WHERE 1) x WHERE rownum = CEIL(@n * @r);

Replace “WHERE 1″ above with whatever conditions you need.

Xaprb: How to number rows in MySQL
Perl Cookbook: 8.6. Picking a Random Line from a File

Ghetto MySQL Innobackup with rsync

Saturday, November 3rd, 2007

I was reading an interesting samba mailing list comment about using rsync on live MySQL databases.

The author said this:

“Assuming a short break in accessibility is tolerable, I’d

  1. run rsync to the backup
  2. stop the server
  3. run rsync to the backup (should be much much faster now)
  4. restart the server.”

Combining rsync and mysqlhotcopy we can get a little fancier:

Ghetto Innobackup-style backup with rsync

  1. STOP SLAVE; FLUSH TABLES
  2. run rsync to the backup
  3. FLUSH TABLES WITH READ LOCK; SHOW SLAVE STATUS;SHOW MASTER STATUS
  4. run rsync to the backup (should be much much faster now)
  5. UNLOCK TABLES
  6. START SLAVE

Note that the read lock and unlock must be done while on the same database connection, and innodb continues to update indexes even when read-locked.

Also, record the master and slave status values. They may be very useful later if you want to apply binlogs to the backup, or initialize a slave.

This technique would be very suitable for non-critical snapshots like QA copies and on quiet databases.

It may be suitable for busy databases if other methods aren’t working out, for instance you don’t have LVM snapshots setup and innobackup is locking your MyISAM tables too long.

rsync -a is also useful for backing up master binlogs every 5 minutes on a live site. Normally you’re better off setting up a slave just running the slave IO thread, though.

Many databases have features to allow “log shipping.” With MySQL, similar functionality is accomplished by doing FLUSH LOGS and rsync, or using replication (there is a command to not execute the replication stream, just save it to disk.)

FLUSH NO_WRITE_TO_BINLOG LOGS
FLUSH TABLES WITH READ LOCK

OSCON July, 2007 - Portland

Friday, July 27th, 2007

I attended the O’Reilly Open Source convention again, making it 10 years in a row. Once again it was held at the Convention Center near downtown Portland, a convenient light rail ride from the airport.

Like many experienced developers, I spent a lot of time in the “hallway track” talking to other developers and users, as well as in one of the 15 simultaneous talks.

The general consensus was that the talks were not as strong as in previous years (not even compared to the MySQL conference this year), but it’s worthwhile to me if I can get even one juicy nugget from each talk, or gain an understanding of a developing trend in programming or system administration.

Many of the presenters griped about there not being enough time to look at source code in a 40 minute talk.

For those who want a conference summary in a nutshell:

  • OpenID is popular
  • lucene and its REST interface have more mindshare than projects like Kinosearch, language-specific bindings, etc.
  • Yahoo! released the yslow browser plug-in for front-end performance evaluation
  • Perl: no ORM appears to be gaining the upper hand, though DBIx is respected. Tim Bunce would like to see a wrapper around JDBC for each scripting language. Alison Randal is updating the Perl license.
  • PHP: no good way to do vector reporting graphics, especially since IE doesn’t support SVG and Adobe is killing the Macromedia plug-in in December. PHP4 is being EOL’ed 8/8/8 so that the PHP developers can focus on 5 and 6 only.

Google was heavily recruiting at the conference. I ran into 3 recruiters, and there were even more in the Google booth.

Pretty good food for lunch, usually chicken or fish in some kind of red sauce with steamed veggies. Better than the wilted sandwich boxes from previous years that mainly got tossed out.

Tuesday nite

I arrived at the Convention Center in time for the evening Google Open Source awards. Happened to sit next to Zak and the 20 year-old OpenID guy, David Recordon, who won $5,000 and a colored, transparent, angular plastic trophy and base that we had fun stabbing each other with.

The OpenID Foundation is offering a $5,000 bounty to the first 10 OSI-approved projects that add OpenID support. Many programmers were busy adding it, including SocialText and others. (David works at Verisign.)

I walked over to the Doug Fir Lounge with a few guys, 2 of them Austrian. I had the halibut fish and chips and lemonade for $20 including tip. It was ok. They have a log cabin motif happening with a restaurant, patio and bar upstairs, and dance club downstairs, so ID is required to enter. Open from 7 am to 2:30 am every day, 1 503 231 WOOD.

Wednesday

Nagios

- general overview of features
- Event Broker most powerful, least used

Bigger and Faster
Rasmus Lerdorf

Rasmus did his usual “PHP is as secure as any other language”, and “pick on a PHP app and make it go faster” talk.

He said he’s still not a Y! Paranoid, but his work does often touch on PHP and web security.

He used to use httpload, but now prefers siege for load testing because it has support for cookies.

http://developer.yahoo.com/yslow/
Live HTTP Headers
APC

sla.ckers.org/forum/list.php?3
php.net/filter
xdebug.org/docs/profiler
talks.php.net/show/oscon07

xdebug
jeremiah san diego xss console author
scanmus.corp.yahoo.com

PHP and Ruby Envy
- NZ programmer on Silverlight CMS (BSD licensed)
- own object system in PHP5
- Ruby less available on web servers, less mindshare
- rolled his own PHP OO frameword apparently

Exhibits

- talked to Mark Finkle of mozilla.org
- said hi to Larry. He had the whole family there.

Trac
Vivek Khera

- he uses RT for public tickets, Trac internally
- doesn’t require much resources since only a few developers
- Trac is used on many Ruby/PHP projects
- gives you wiki/tickets/etc.
- modified BSD license

Afterwards mentioned:

- uses Trac in a BSD jail
- an alternative to Trac would be basecamp (or I guess Sourceforge software). See slashdot.org threads for more ideas.
- likes pfSense firewall as an appliance
- nagios alerts too much, and no good rule builder for multiple hosts
- own web framework called Rowdy (RWDE)
- software as complicated to install as RT should be treated as an appliance
- he submitted 6 related talks on software development environment, only 1 accepted.

Steve Souders
Chief Performance Yahoo!
souders@yahoo-inc.com
Exceptional Performance Group

http://developer.yahoo.com/performance

- IBM Page Detailer Pro
- yslow (crawls the DOM, not a packet sniffer)
- firebug
- jslint - The JavaScript Verifier

80-90% of end user response time is spent on the front-end. so optimize there.

14 Rules for a Better User Experience

1. make fewer HTTP requests
2. use a CDN
3. add an Expires header
4. gzip components - even JS and CSS
5. CSS at top
6. JavaScripts to bottom
7. avoid CSS expressions
8. make JS and CSS external
9. reduce DNS lookups
10. minify Javascript
11. avoid redirects
12. remove duplicate scripts
13. configure Etags - disable in most cases if load-balanced or multiple web servers
14. make AJAX cacheable

move JS to onload
remove bottom tabs
avoid redirects
images sprites
expires

Thursday

PHP Graphics
Luke Welling, OmniTI

Luke presented an overview of raster and vector graphics modules for PHP.

He prefers vector graphics, but there’s not many free options for doing that.

He feels that Yahoo! Finance and Google analytics sites are state of the art in presentation graphics with anti-aliasing, interactivity, esthetics, text and maps. I’d say that’s aiming a little low, but it’s a start I guess.

Perl Lightning Talks

  • Vani Raja did a talk on Yahoo! JS
  • Ask did his talk on qsmtpd again
  • talk on Test::More 3?
  • talk on task lists for hit and run volunteers
  • Schwern did one talk on making tea for 5 minutes, and one on “Blame Schwern” - just do it instead of waiting for permission
  • Tim Potter did a talk on a messaging standards effort for his employer, saying that the ANSI process was too slow and looking for an alternative
  • Andy Lester did a talk on ack
  • a talk on SVN::Notify
  • http://angerwhale.org/
  • Tim Bunce talked about DBD::Gofer Proxy and next-gen cross-scripting language DB API based on JDBC API
  • guitar song about #perl

YouTube: Perl Lightning Talks on Handycam by Schtonk

Perl Auction

Larry’s talk on comparative languages and Perl6. Sounds like we’ll be able to do something like foreach (1..infinity).

Full Text Search BOF
Peter Zaitsev
- based in London, England but often in SV
- uses Sphinx on several servers
- http://boardreader.com/ one TB of searchable data
- geneology is big on full-text search

- after insert, mysql full text gets slow, run optimize.
also, doing it at insert time causes index update per keyword
- gin or gyst for Postgresql 8.4?
- Michael Kimsal, SOLR
- hard disk space is free (enough for whatever indexing is required)
- mostly news search involves last 5 minutes of feed
- MessageOne stores email for lawyers to mine. They like to search, archive and expire. Mostly Exchange lusers, rarely Unix admins.
- Lucene and REST interface
- Monty says MySQL AB hired a programmer to work on search, but he’s working on another project now. They need somebody with a burning desire to make progress in an area like that, but they recognize the importance of search.
- Monty poured out free Finnish chocolate rum from a Pepsi bottle that was so powerful it scared most people. He said it was banned for 2 years in Finland because it was so addictive.

Sun BOF

- audience talked to senior Sun staff about Java and Solaris a little.
- free beer, cheese and crackers.

Friday

A bunch of Postgres people went to the Portland wine tasting on the river event in the afternoon.

Call for Software Whiteboard

OSCON07 Call for Software Whiteboard
flickr.com: Jeff Kubina’s OSCON 2007 Whiteboard set of tiles

FrontBase RDBMS

Friday, July 6th, 2007

I came across the Frontbase RDBMS recently. Their 288-page users manual was so nice and clear I decided to read through it.

The FrontBase project was started in 1996 for Copenhagen’s 911 service (MySQL has similar practical origins, written to support retail data warehousing contracts). They claim it’s a “database designed for the Internet” and requires zero maintenance. Currently they’re on version 4.2.

It’s a proprietary, closed-source database written in ANSI C, but free for most users. They currently charge for support and consulting, and sell licenses for embedded users. The FBAccess client library source is available on request for programmers to add support to their oses and devices.

It seems to have traction in the Mac community, likely because of the nice Mac client program and FileMaker and RealBasic support.

What’s nice is that it supports transactions and full-text search at the same time, unlike MySQL.

FrontBase supports:

- servers: Mac OS X, Linux, Windows, Solaris, FreeBSD
- clients: FrontBaseManager Aqua client for Mac OS X allows mgmt. and monitoring, also web, common scripting languages (Perl, PHP, Tcl), JDBC, ODBC, and the sql92 console client
- SQL92
- Unicode
- statement-based replication from transaction logs
- full SQL logging options, allowing replay (SET WRITE SQL TRUE GLOBAL;)
- full-text indexing
- stored procedures
- clusters
- table spaces
- raw device driver
- live (hot) backup due to versioning (WRITE DATA;)
- data and communications encryption
- whitelisting and blacklisting of IP addresses
- row-level locking
- row-level privileges
- read-only databases (note that replication slaves are read-only)
- import utilities for Access and FileMaker
- migration tools for FileMaker (tables only) and MySQL (no enum or set column support)

They have been changing their licensing, so contact gclem at frontbase.com for the latest info.

FrontBase sure looks like a clean, modern RDBMS implementation.

I plan to learn more about its reliability and performance.

I have noticed that it is prone to crashing on Mac OS X PPC. Even a simple query like SELECT DISTINCT(OID) FROM TABLE; on 500,000 records with mostly nulls can crash the database, but hard to tell if it’s Frontbase, the OS, or what.

YAPC::NA 2007 Perl Conference, Houston

Saturday, June 30th, 2007

I was at the YAPC::NA 2007 Perl Conference this week at the University of Houston in Texas.

Once again, it was a great conference, with about 200 people attending.

It was my first time in Houston, and boy was it humid. My eyeglasses fogged up every time I went outside. It might be the most humid place on earth, even moreso that Bali (on the equator.)

YAPC is a great value. The conference, 2 days of tutorials plus book and 5 nights of dorm accomodations was $384.

Sunday

- got up an hour before my flight
- hot and humid outside airport in Texas
- took bus from the airport, 73 and 30, 88 doesn’t run on Sunday
- city is not a war zone like Detroit or other inner cities, but rough. one drunk on the bus
- staying in Taub dorm in “the quadrangle”, has ok common
basement with TV lounge, weight room and laundry room
- spent evening at The Cadillac Mexican food restaurant
- very off-campus $25 cab ride
- $30 fajitas and cheesecake

Monday

- meeting rooms are in the 3-storey University Center
- scattered around vertically
- coffee
- number of fast-food restaurants: Wendy’s, Subway, Chili’s Too
- Cougar shop
- TV lounge with wide-screen flat-scree TV
- Game Center with FPS, pool, air hockey and bowling

Larry gave 2 talks in morning

- briliant comparative language analysis
- Perl borrows from other languages heavily
- Perl hits the sweet spot for programmer freedom
- Prolly too many object models in Perl5. one is enough.

Lifetime of Many Hats

- born near Pasadena
- almost drowned with styrofoam surfboard
- liked to specialize in everything
- dropped out of original uni program, worked, went
to missionary school and studied linguistics, back to uni

The Perl Foundation

- case study of dev shop that couldn’t find enough perl programmers,
tried Java, ended up with 25 of each
- creating materials for the public in MBA-speak
- Forester called them and asked for scripting language survey participation
- the analyst groups rank language by current capability
and long-term strategy equal weighed, perhaps problematic for perl
which has a mature perl5 but fuzzy perl6
- somebody has written a Perl module for the Facebook API,
overlooked in their initial API release

Dave Rolsky

- error, validation, logging
- good talk

Abigail

- sudoku solver with Perl regex
- regex is NP complete … so let’s do something insane
- 250 Kilobyte regex to solve 9×9 sudo, 90 minutes to run
- he can solve it by hand in less time

Tuesday

Resume talk was funny. Basically, don’t do things
like send a picture of yourself in a disco, show up
late, or be difficult. Based on real-world experience.

Perrin Harkin

- always thoughtful and practical
- but lolcats galore in slides
- tried VMware for snapshotting environment, 4x slow
disk io on their hw, moved to lvm
- uses version control, indifferent, cvs good enough
for a long time anybody who said otherwise was a crackpot
- use 2 branches (dev on main plus maintenance branch)
- test is important, Test::More is not enough because
of namespace pollution?
- tried to store db operations and roll back in perl,
prolly better to just truncate

Casey West

- working at SocialText, SocalCalc
- started project with MochiKit, web 2.0 ish framework
- roundelement for rounded corners
- cross-browser keyboard modifiers
- animation lib
- functional
- did one lolcat to mock Perrin
- switched to MooTools for JS, smaller memory footprint

Evan Carroll, Houston.pm, Dealermade

- catalyst with Dojo, template toolkit
- ruby with scriptaculous JS, integral

Nice dinner in Hilton ballroom.

Auction hosted by Uri. Quite humorous, raised about $4,000.

Dorm party afterwards, lot of alcohol. Talked to Julian Cash
a little. He is very artistic.

Talked to Beth and Adrian, 2 bioinformatics perl chicks.
Beth runs Ubuntu on her Powerbook G4 and Adrian Debian on her Dell
notebook.

Taub dorm party.

Wednesday

Beth from Ithaca did a talk on database programming with postgresql.
She uses it for bioinformatics work.

Some type of skit about the power of the yapc perl t-shirt.

Town Hall meeting rehashing usual Perl adoption issues:

- lack of corporate funding
- lack of PR
- lack of mindshare vs other languages
- poor state of Perl distros, but in LSB

Richard Dice mentioned some feedback from Forrester Research.
Perl was rated ok overall, with a hole in WSDL support.

This YAPC finished about $6,000 in black.

Talked to Perrin a little about file uploads and memory leaks
on mod_perl. He recommended either using Apache api,
reading by row, or using exit in your Apache::Registry script
to force child to die after the upload.

Talked to a Larry a little about parallel programming
with Perl6. Seems like comprehensive support for features
found in other languages like Haskell.

Thursday

Damion was unable to come to YAPC this year, so Randal Schwartz filled in for him for 2 days in the Advanced Perl training tutorial.

Randal did a good talk on his page-by-page views on Damion Conway’s Perl Best Practises book from 9 am to 2:30 pm.

I tend to identify more strongly with Randal’s programming style than Damion’s in some cases. For example, I’m not a big fan of English.pm.

Randal delved in detail into some areas of Perl upon prompting by the audience. He spent some extra time talking about Perl control structures.