Archive for the ‘Open Source’ Category

OSCON 2008, Portland

Friday, July 25th, 2008

I attended the O’Reilly Open Source Conference, once again in Portland, Oregon.

Overall, my impression was that the talks and vibe were oriented primarily towards Web 2.0.

I would say that the talks were not as strong as in previous years, but it’s easy to compensate for that with the “hallway track” and access to the original Open Source authors.

Several attendees used the Asus Eee PC sub-notebook and were happy with it as an email/browser tool.

Wednesday

PHP Taint Tool: It Ain’t a Parser

- CS’y effort at a PHP parser for code analysis; reminds me of the early days of Perl’s B tools
- not suitable for end-users

Write Beautiful Code (in PHP), Laura Thomson, Mozilla

- good general background on good programming practises
- not a lot of specifics about PHP, but available for questions

Hypertable, Doug Judd, Zevents

- Hypertable is a clone of Google’s BigTable, based on the public paper
- room was packed, some turned away
- still alpha, maybe beta in August
- preferred distributed filesystem is HDFS, works with others
- I recommend reading web site and then looking at the curt slides
- plans to do benchmarks with same hardware as Google has published.

Open Source Virtualization for People Who Feel Guilty About Using VMware So Much, andy michelle, EDA

- cute talk about VirtualBox, Xen and VMware
- Xen has weird nomenclature compared to other tools
- VMware wins on tools and polish
- showed screenshots of unreleased and alpha mgmt. tools.

Barely Legal XXX Perl, Jos Boumans, RIPE

- stunning and twisted example of overloading, short-circuiting, import-faking, whatever it takes to make a loaded module do something other than intended
- illustrates great flexibility of perl, for good or ill
- could be useful for things like testing harnesses, etc.
- motivated to win a bet of $100 or 1 vertical meter of beer
- said it took 3 or 4 hours to complete.

I walked around the exhibits area.

Got a demo of Atlassian’s continuous integration (CI) tool, Bamboo. They’re also the vendors of JIRA issue tracker and Confluence wiki, which I’ve used before.

One company had a public Wii game happening.

Thursday

Scaling Databases with DBIx::Router, Perrin Harkins

Ultimate Perl Code Profiling, Tim Bunce (Shopzilla)

- talk and screenshots about NYT perl profiler


The New York Times Perl Profiler

Top 10 Scalability Mistakes, John Coggeshall (Automotive Computer Services)

- good overview of writing high-performance, maintainable Internet systems
- interesting opinion that scalability is not just about increasing performance: scalability can be about scaling up or down, performance or maintainability, etc.
- recommended php.ini settings list

Perl Lightning Talks

- popular with audience, attendees seemed to like all the talks
- Mail::ESMTP looks very interesting for testing and production

Code is Easy, People are Hard: Developing Meebo’s Interview Process, Elaine Wherry (meebo)

- struggled to find the time and the right approach to interview new candidates in 2006, likely at the behest of VCs
- external recruiters hit-and-miss, conferences and jobs email link useless
- phase where non-founder employees doing interviews wanted a founder involved in interview process
- trying to preserve culture (finger rockets, social networking, 2 female founders, etc.)
- came up with process involving reading resumes, phone screens, and office “sim” that adds a new candidate within 3-6 weeks
- “sim” has 3 versions: office manager (plan to erect a meebo office sign), front-end engineer (write a JavaScript app), and back-end engineer (write a server) in 4 hours
- current goal is to keep interview time down to 8 hours per candidate over 10 days
- now up to about 40 employees
- my feeling was that their hiring process started off clueless due to inexperienced mgmt. and is still oriented towards junior engineers. Silicon Valley is full of expert engineers and it doesn’t take 8 hours to interview them.

BOF

mysql-sandbox

Giuseppe Maxia discussed and demoed his very useful mysql-sandbox utility for managing several versions and instances of MySQL on the same machine.

He wrote it for his testing work at MySQL AB. Very well received by attendees. This is a great example of what I call “anti-virtualization” - using ports instead of resource-intensive VMs.

MySQL Conference 2008 Presentation

State of the Onion Address, Larry Wall

- talk about Perl6, random anecdotes, etc.

Friday

Open Voices, Jim Zemlin (The Linux Foundation), Keith Bergelt (Open Invention Network), Karen Sandler (Software Freedom Law Center), Phil Robb (Hewlett Packard)

- panel discussion of various free software efforts, some little-known

An Illustrated History of Failure, Paul Fenwick (Perl Training Australia)

Paul gave an interesting talk on notable software failures and estimated a price tag for each. I had heard news reports of many of them, but it was interesting to hear an updated analysis of what really happened behind the scenes.

Thanks to Google for sponsoring the fairly good, almost-gourmet lunches. Sure beats the O’Reilly lunchbags from the dot bomb days. (Everybody I know bailed and found a Subway shop back then.)

Notes

- Burgerville popular with attendees, can upgrade combos to a shake.
- Red Lion hotel has a small cardio gym with 1 universal machine, no free weights, open til 11 pm
- WiFi password changed weekly, in middle of remodel, lobby just finished.
- There is a 24-Hour Fitness that is actually open 24 hours near downtown Portland. Has basketball court and 2-lane pool. $15 for non-member visitors.

OSCON 2008 Presentations

YAPC 2008 Chicago

Friday, June 20th, 2008

Once again I attended the Yet Another Perl Conference (YAPC), and again it was at IIT in Chicago (same as in 2006.) Josh McAdams and his wife did a great job organizing the conference.

YAPC is an affordable conference ($100 fee) organized by volunteers for The Perl Foundation (TPF).

I’m already an experienced perl programmer, but perl is a vast programming environment and one can always learn more about techniques or available modules.

After the 3-day YAPC, I went to the 2-day Perl Catalyst framework class.

Overall, I would say that the talks were not as technical as in previous years, but with 3 tracks there was always something interesting.

Many people make up their own “hallway track” anyway, since most of the perl heavyweights come each year and are very accessible.

The IIT dorm was only $60/night, but even that was over-priced. Some investment is needed in maintenance, and the attendants need to actually hand out linens and control the AC next time.

Although there was supposed to be an online form to add cash to the access card, one has to go to 201 Hermann Hall while they get organized.

Here are my notes on some of the more memorable events:

Monday

Tiny Modules, Adam Kennedy

- no dependencies on other modules
- fast to load
- fast to run (near real-time)

Config::Tiny (popular module)
XML::Tiny
Object::Tiny
Date::Tiny

Moving to mod_perl2, Jim Brandt

- Apache2::compat can be used for backward compatibility
- some methods have different arguments now though
- loads everything, so uses a lot of memory
- slower because some code is now Perl instead of C
- content_language, write_client, send_http_header, get_remote_addr, etc.

Porting Tools

- Apache2::Reload
- Apache2::porting

Also read your error log and the Migration manual.

Apache::Registry is now ModPerl::Registry

Photo Processing for the Web, Kent Cowgill

kentcowgill.net

- bunch of stuff for managing cell phone photos
- speaker talked about various image processing and mgmt. problems with his old nokia cameraphone
- embed iso in a pdf
- bought a real camera, problems went away

PAR+FUSE+PDF, Chris Dolan

Tuesday

HTML::App Framework, Jim Krajewski

Catalyst, Matt Trout

- a profane overview of handlers
- 490 CPAN Catalyst modules

Catalyst Downsides

- need packager for catalyst apps
- attribute syntax
- unaccelerated CGI not great (lack of persistence, slow to start)?

Dinner and Auction

- quite a variety of food: Mediterranean, Italian, Indian, American
- dozens of books and t-shirts to bid on
- Wii games

Wednesday

Perl Lightning Talks

swish-e

- command line search tool
- now has perl interface, solid

cons

- no utf8
- not pi
- no index updates
- swish3 should fix that

joshr.com/src/docs
linux journal

where2getit.com
- AJAX maps with mod_perl
- openlayers, prototype, scriptaculous
- rewrote a 100 kloc old perl app into 22 kloc of perl plus JS

Chemchains Sandbox

- boolean logic to understand and visualize myriad possible chemical reaction pathways

Math::Combinatorics

- works at bookfinder.com
- generate test data on authors using perl, then test clustering techniques

Devel::Cover::TT

Ingy strip show

Do You Believe in the Users?, Brian Fitzpatrick and Ben Collins-Sussman

- slide deck suggesting that developers focus on end user experience
- interesting graphical line added to most graphs accounting for programmer pain/cost

The Perl Foundation (TPF) Keynote, Richard Dice

Nokia 810

I talked to a fellow field-testing a Nokia 810 and keyboard as a notebook replacement before his next trip. He seemed pretty happy overall. He said he had to do a couple days of setup to get it working to his liking.

Thursday and Friday

Catalyst Class by Jonathan Rockway in association with Stonehenge

- Jon wrote a book on Catalyst and is a core catalyst programmer, less active at committing now.
- class actually a busy 2-day lab, not a lecture
- install Catalyst from CPAN (65 minutes!)
- also went over DBIx::Class and SQLite
- modify various sample programs, like a small wiki and address book.

Thanks to the many corporate sponsors.

DRBD and MySQL: Just Say No

Sunday, April 20th, 2008

I’ve successfully used MySQL statement-based replication for several years across data centers and understand its quirks.

While at the MySQL Conference, I tried to see how DRBD could help the installations I manage, but I just can’t drink the DRBD Kool-Aid.

MySQL Replication Pluses

  • Free
  • Easy to set up if you already have a backup and master position
  • No shared storage to manage or corrupt
  • Light network load
  • Can use master for r/w and slaves for r.
  • can do maintenance on slave (ALTER TABLE, etc.) and failover afterwards
  • works well across Internet even with high-latency
  • many replication problems simple and hand-fixable

MySQL Replication Minuses

  • Slaves can/will get out of sync with the master, typically noticed after a few weeks or with Maatkit
  • Changing masters requires rebuilding slaves
  • There is always some replication lag when there is a busy master
  • no checksums or 2-phase commit
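The “no checksums” gap is why tools like Maatkit’s mk-table-checksum exist. Here is a minimal Python sketch of the general idea, assuming rows have already been fetched from master and slave as tuples: hash each row with CRC32, fold the hashes together with XOR, and compare the results. This is an illustration of table checksumming, not Maatkit’s actual algorithm or SQL.

```python
import zlib
from typing import Iterable, Sequence, Tuple

def row_crc(row: Sequence[object]) -> int:
    """CRC32 of one row; NULLs become \\N and columns are joined with a 0x1F separator."""
    text = "\x1f".join("\\N" if value is None else str(value) for value in row)
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF

def table_checksum(rows: Iterable[Sequence[object]]) -> Tuple[int, int]:
    """Return (row_count, xor_of_row_crcs); XOR makes the result independent of row order."""
    count, checksum = 0, 0
    for row in rows:
        checksum ^= row_crc(row)
        count += 1
    return count, checksum

def tables_match(master_rows, slave_rows) -> bool:
    """True if both copies have the same row count and checksum (collisions are still possible)."""
    return table_checksum(master_rows) == table_checksum(slave_rows)
```

With XOR, identical duplicate rows cancel out in pairs, so the row count is included to catch some (not all) of those cases. On a busy master, the rows must come from a consistent point in time, for example a snapshot copy.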

DRBD is a low-level driver to copy a disk partition in near real-time from a master to a failover node (cold standby.)

MySQL with DRBD Pluses

  • Free
  • No fsck or transaction log replay needed after a clean, manual failover.
  • Slaves don’t need to be repointed with CHANGE MASTER TO unless DRBD fails.

MySQL with DRBD Minuses

  • DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.
  • if the master panics, then after failover both fsck and transaction log replay must be performed
  • more work to setup initially than statement-based replication
  • NIC and network corruption is also propagated.
  • Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition
  • Could generate a lot of network traffic.
  • cannot do maintenance on cold standby database
  • 2 heartbeats needed on a reliable, local network

I can see how MySQL/DRBD would be appealing for those who operate on a reliable network and don’t need Master-Master for load or maintenance, or who have many slaves that cannot easily be rebuilt.

Eric Bergen: DRBD in the real world.

MySQL Conference 2008

Thursday, April 17th, 2008

I attended the MySQL Conference once again at the Santa Clara Convention Center.

Despite the January purchase by Sun, the conference had the same great vibe as usual, and everybody showed up again.

Top Conference Themes

Some of the conference themes I noticed are:

  1. Linux LVM snapshots are now popular for MySQL backups. Snapshots have long been used in enterprise IT, now it works well and for free on Linux. Another use for a snapshot backup is to copy a busy master offline for comparing to a slave with Maatkit mk-table-checksum.
  2. DRBD is popular for HA (one speaker wondered if his talk about DRBD was still relevant since everybody was already using it), but I see some drawbacks.
  3. developers are concerned with supporting massively multi-core CPUs in both database and cache code. The Sun Niagara multi-core architecture seems to be the future, with 128 or more threads.
  4. databases and storage are quickly increasing in size, so many DBAs are interested in MySQL 5.1 partitioning and other tools and techniques.
  5. cloud computing is entering common use with lots of Amazon EC2 and S3 users. mosso.com has been available for a couple years from Rackspace, and Google and IBM are entering cloud computing. Users would like more competition to reduce prices and improve reliability, SLA or not.
  6. most companies say it’s hard to find experienced MySQL DBAs, but most are lazy when it comes to training and compensation.

Top MySQL Conference Tips

  1. Disable swap if your version of Linux supports it (most do.) This avoids getting some of MySQL’s pages swapped out and crippling the box with IO.
  2. Use memcached. MySQL User Defined Functions (UDFs) for memcached are now available to auto-populate memcached from MySQL statements.
  3. Consider multi-level partitioning schemes with MySQL 5.1, like combining RANGE and KEY.
  4. Consider MySQL High Availability (HA), either with replication, DRBD, or both. See the Linux HA project.
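Tip 3 mentions combining RANGE and KEY partitioning. As a sketch, here is a small Python helper that generates MySQL 5.1 DDL for a table range-partitioned by year with KEY subpartitions. The table name, its columns and the year range are hypothetical. The example table has no PRIMARY KEY because MySQL requires every unique key to include all columns used in the partitioning expressions; check the output against the MySQL 5.1 partitioning documentation before running it.

```python
def range_key_partition_ddl(table: str, years: range, subpartitions: int) -> str:
    """Build CREATE TABLE DDL: RANGE partitions on YEAR(created), KEY subpartitions on id."""
    parts = [f"    PARTITION p{year} VALUES LESS THAN ({year + 1})" for year in years]
    # Catch-all partition so inserts for later years don't fail.
    parts.append("    PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        f"CREATE TABLE {table} (\n"
        "    id INT NOT NULL,\n"
        "    created DATE NOT NULL,\n"
        "    message VARCHAR(255)\n"
        ")\n"
        "PARTITION BY RANGE (YEAR(created))\n"
        f"SUBPARTITION BY KEY (id) SUBPARTITIONS {subpartitions} (\n"
        + ",\n".join(parts)
        + "\n);"
    )

if __name__ == "__main__":
    # Hypothetical logging table covering 2006-2008 plus a MAXVALUE catch-all.
    print(range_key_partition_ddl("access_log", range(2006, 2009), 4))
```

RANGE on the date keeps old data easy to drop a partition at a time, while the KEY subpartitions spread each year’s rows across several smaller pieces.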

Top MySQL Conference Misconceptions

I had to straighten out a lot of newbs:

  1. statement-based replication is not reliable, does not have checksums or two-phase commit, and masters and slaves tend to diverge over time
  2. many novices believe Innodb does row-locking only, but it often takes range and table locks too
  3. mysqldump is ok in most cases, but you have to be careful with locking and locktime, matching charsets, and testing the dump.

Here are my notes on some of the talks I attended. (In case you haven’t read my blog before, I’m a long-time user of MySQL, replication, LVM and memcached. I have not used DRBD.)

If you have a correction or improvement, please leave a comment and I will update this blog entry.

Monday (Tutorials)

Building Scalable & High Performance Datamarts with MySQL, Tangirala Sarma

Discussed general DW concepts at first.

3 main requirements for a successful DW project are:

  1. good data quality
  2. the right tools
  3. phased results.

Talked about various partitioning schemes in MySQL 5.1. I’ve used it for about 6 months in one project, but most of the audience was new to MySQL partitioning and seemed to struggle with anything beyond RANGE partitioning for logging.

Later talked about MySQL-particular DW aids, including:

  • Kickfire appliance, which has capabilities such as column-store, compression and fast loading.
  • Infobright, which also has column-store and fast loading.
  • Nitro

His recommended references are:

  • DW Toolkit, Kimball on Amazon.com
  • Enterprise DW with MySQL, MySQL AB
  • MySQL Roadmap 2008-2009, MySQL AB.

Also, there’s a list of books here:

DW and BI Starter Books

Queued up for sandwiches and salad. Not really a surprise with O’Reilly as the conference organizer, but I expected more.

Ate lunch with the DRBD programmers. They said that MySQL AB now provides 1st- and 2nd-level support, and their company provides 3rd-level support and cashes checks. :)

Memcached and MySQL: Everything You Need To Know, Brian Aker (MySQL), Alan Kasindorf (Six Apart)

Very detailed talk about tips and issues with using memcached.

  • Evolving online notes
  • Brian wrote about 30 man pages for memcached, edited by Mark Atwood. Unusual amount for an Open Source project.
  • memcached is very handy for people stuck with databases on 32-bit systems and a lot of otherwise unaddressable memory.
  • Patrick from Grazr has written MySQL UDFs to populate memcached, has a SoC student. Pipelines cache inserts. Handy for distributed DCs already using replication, triggers.
  • Postgresql has pgmemcache()
  • lighttpd has mod_memcache, prolly url hash key
  • Apache has mod_memcached with CAS, GET/PUT/DELETE, still alpha, try at pandoraport.com
  • limits: key size 250 bytes, data size 1 MB, 32/64 bit limits
  • threading is new, based on a giant mutex, so bad for more than 8 cores
  • stop swapping with mlockall (the -k option), no swap, proper sizing
  • stats sizes to test efficiency
  • command line option to disable LRU
  • CRC most consistent hashing, normal
  • ring consistent hash
  • IP takeover
  • bad switches or intermittent network is very bad
  • pick a driver that can do multiget
  • ghetto lock
  • Tim Bunce’s Memcached::libmemcached does not do Storable serialization, which is what most people prolly want
  • expiration times up to 30 days are relative (seconds from now); larger values are treated as absolute Unix timestamps
  • namespace trick: versions in key name
  • the uint32_t flags parameter usually indicates whether data is compressed or Storable-serialized, but can be used for anything
  • libmemcached command-line tools: memcp, memrm, memstat, memslap (load testing)
  • showed Mixi MRTG graphs: 6800 reads/second, 200 servers, no CPU load
  • Brian added IPV6 support after mysqld update (pet project, but helped with multi-interface support and to optimize out name resolution)
  • binary protocol code is available but not merged yet; it helps with multi-byte charsets that embed spaces or newlines, which break the text protocol
  • needed improvements: durability to disk, better multi-threading
  • persistent connections are good and recommended
  • UDP alpha code available, good for lots of sets
  • storing BLOBs with MogileFS or LUSTRE good
  • speakers did not have experience with commercial caches, but did say that most people find Java caches often too featureful and slow
  • MogileFS, Hypertable, Hbase interesting
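The 30-day expiration rule and the namespace trick from the notes above are easy to get wrong, so here is a small Python sketch. It assumes memcached’s behaviour that expiration values up to 30 days (2,592,000 seconds) are offsets from now and larger values are absolute Unix timestamps. The `NamespacedKeys` class and its dict-backed store are hypothetical stand-ins for a real client.

```python
import time
from typing import Dict, Optional

THIRTY_DAYS = 60 * 60 * 24 * 30  # 2,592,000 seconds
MAX_KEY_BYTES = 250  # memcached key length limit

def expiry_for(ttl_seconds: int, now: Optional[float] = None) -> int:
    """Return the exptime to send memcached for a TTL given in seconds."""
    if ttl_seconds <= THIRTY_DAYS:
        return ttl_seconds  # small values are treated as relative offsets
    current = time.time() if now is None else now
    return int(current) + ttl_seconds  # larger values must be absolute Unix times

class NamespacedKeys:
    """Namespace trick: put a version number in every key; bumping it orphans old keys."""

    def __init__(self, store: Dict[str, int]):
        # Hypothetical stand-in for a memcached client: a plain dict holding versions.
        self.store = store

    def _version(self, namespace: str) -> int:
        return self.store.setdefault(f"ns:{namespace}", 1)

    def key(self, namespace: str, name: str) -> str:
        full = f"{namespace}:{self._version(namespace)}:{name}"
        if len(full.encode("utf-8")) > MAX_KEY_BYTES:
            raise ValueError("memcached keys are limited to 250 bytes")
        return full

    def invalidate(self, namespace: str) -> None:
        # With a real client this would be an atomic incr on the version key.
        self.store[f"ns:{namespace}"] = self._version(namespace) + 1
```

Old entries are never deleted explicitly; once the version changes they are simply no longer requested and age out through the LRU.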

Tuesday

EXPLAIN Demystified, Baron Schwartz (Percona Inc.)

Room was packed for this talk. Good step-by-step talk for understanding EXPLAIN better.

Replication Tricks and Tips, Lars Thalmann (MySQL), Mats Kindahl (MySQL)

Some good tips, but overall MySQL AB assumes that a MySQL master and slave actually have the same data.

  • mysqlbinlog has a --hexdump option for seeing a byte-level dump
  • examine both the binlog and the relay log when debugging replication
  • you can clone a slave from another slave if you trust it: just do STOP SLAVE, SHOW SLAVE STATUS, shutdown, copy over the files, CHANGE MASTER TO, and START SLAVE

Dramatically Improving MySQL Database Performance in Data Warehouse Applications, Martin Farach-Colton (Tokutek)

His view is that the storage engine is the primary bottleneck in BI systems for loading and search, and showed how to use a B-tree to organize data to guarantee the maximum performance in a growing DW. (Not sure why the online summary talks about fractal trees.)

How to Achieve Operational BI on a Budget, Lance Walter (Pentaho Corporation)

Kind of overly Pentaho-product oriented, since Lance is a product manager for Pentaho.

However he made a useful distinction between historical and operational BI.

Historical BI is mainly reporting on what happened before yesterday, and operational is up-to-the-minute for business process analysis and improvement.

By designing your BI system for both historical and operational requirements, you can get both at the same time.

Interesting case study of the US Navy doing BI on pilot training and operations to reduce accident risk

MySQL Backup BOF (hosted by Zmanda rep)

- admins attending generally unaware of LVM, but awareness growing
- one guy split his database across multiple databases for easier mgmt. and uses a HP SAN with 1 TB of RAM, very happy with IO performance, not so happy with price.
- one guy using 100 EC2 instances and S3. ok except for recent outages, maybe a little pricey.
- one guy using dd for fast network copies
- Zmanda just integrates existing techniques and allows scheduling, but prolly quite useful for inexperienced DBAs to do point-in-time recovery and schedule backups.

MARIA BOF

Hosted by Monty with his usual wicked Finnish black vodka.

He said Falcon was supposed to take 3 months but slipped, so work started on MARIA to replace MyISAM. He has made promises to deliver a working storage engine, so he will continue to do so and will prolly release MARIA in 6 months.

All of the MARIA programmers were required to read Jim Gray’s textbook Transaction Processing: Concepts and Techniques on Amazon.com, and each has said they understand it.

I got the impression that MARIA should end up with a cleaner codebase than Innodb.

Discussed table-level checksums for replication checking, which he is planning to add, possibly as an option. There is still time to decide whether to use CRC32 or another algorithm.

One of his programmers gave a demo of yanking power on MARIA-current on his notebook computer, though I didn’t notice what the outcome was.

He said that ALTER TABLE optimizations are planned, including instantly dropping an index without copying the table, though add would still require copying the table.

Another conversation could be paraphrased as, “After MARIA, he will work with the Sun team that fixed Postgresql threads to fix various thread scaling problems in MySQL.”

Sphinx Search BOF

Hosted by Peter Zaitsev and Andrew Aksyonoff, the author.

Percona has used Sphinx Search in a few projects for web forum searching for the past few years. Separate index server is recommended.

About a dozen people talked about full-text search requirements and experience with Sphinx. One guy was spending a lot of money on encad(?) and didn’t want to spend more on a bigger license later.

Wednesday

MySQL Performance Under a Microscope: The Tobias and Jay Show, Tobias Asplund (MySQL), Jay Pipes (MySQL)

Presented slides on the performance of various workloads.

The MySQL Query Cache, Baron Schwartz (Percona Inc.)

Excellent in-depth discussion of the query cache.

Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL, Patrick Galbraith (Grazr Inc.), Michael Kowalchik (Grazr Corporation)

Deadlocks, Wait Timeouts, and Other Transaction Issues, Jess Balint (MySQL)

Thursday

DTrace and MySQL, Ben Rockwood (Joyent Inc)

Good talk on using DTrace specifically with MySQL, mainly query debugging.

Scaling with MySQL using Materialized Views and a Shared Everything architecture, Moshe Shadmon (ScaleDB)

Listed 3 ways to do materialized views, but dwelt on their ScaleDB cluster product mostly.

High Availability MySQL with DRBD and Heartbeat: MTV Japan Mobile Services, Patrick Bolduan (MTV Networks Japan KK), Yoshinori Matsunobu (MySQL)

Talked about setting up HA using heartbeat, pingd and DRBD on a mostly-reads 5 GB CMS db. Used the Enterprise MySQL distro and support. No replication involved; likes mysqldump. Happy with MySQL 5.0 and Unicode with Japanese. Cute slides with Japanese maru symbols, etc. DRBD staff were on hand to help with more difficult questions. Said LVM can be used under or over DRBD.

The Science and Fiction of Petascale Analytics, Jacek Becla (SLAC)

Talked about petabyte and exabyte DW for physics and astro programs. Contrasted science and industry PB databases (Google, MSN and Yahoo! likely each have 100 PB databases, but don’t disclose the size.) 5 years to plan DW for next experiment.

Spent 10 minutes with Rohit Nadhani and his programmer from Webyog looking at their MonYOG 2.01 version. Provided UI feedback for database operations use based on several months usage. Looking good.

In the lounge, I talked with Patrick Bolduan, somebody from DRBD, and a NY Times IT guy who uses EC2. Apparently S3 can lose up to 10% of insert requests. Everybody is looking forward to when Amazon EC2 and S3 have competition, for both pricing and reliability improvements.

Conference Evaluation and Recommendations

I go to a lot of conferences as a paying attendee, so I usually provide feedback to the organizers to help them improve the experience.

I mentioned to Jay that overall the conference was fine:

  • Talks were good, maybe less technical than previous years. Jay said Sun wanted more talks for novices since 2007 was too hardcore for some new attendees, but O’Reilly did not want to ghettoize newbs with a single track and room. For sure 2007 had too many sharding talks, mostly with users stuck on 4.0. There’s always the hallway track with developers anyway.
  • Food was ok but not great, although O’Reilly, the conference organizer, slipped in sandwiches for lunch on tutorial day. (SCCC is in an isolated location, so food is a big deal.) Still way better than OSCON in recent years. I guess the original MySQL conferences at the DoubleTree spoiled me.
  • Some of the vendors got too salesy in their presentations, but it’s hard to crack down on sponsors. Pentaho and ScaleDB come to mind.
  • conference still provides full access to MySQL managers and key programmers, in the best Open Source tradition
  • Still need a big-iron room with functioning demo SANs and HA setups, as I’ve suggested for a few years.

MySQL Conference 2008 Presentations
TechCrunch: Rackspace Offers Cloud Computing with Mosso
cnet.com: Is cloud computing more than just smoke?

linux iotop

Sunday, February 17th, 2008

Since IO accounting was added to the linux kernel in 2.6.20, it’s been possible to examine IO per task.

Guillaume Chazarain’s iotop.py takes advantage of that to show disk IO in a format similar to the venerable top program.

It’s unlikely yum install iotop will work on your older linux distro yet though …

Prerequisites for running iotop.py are Python 2.5+ (to preserve your existing python install, do make altinstall) and linux kernel 2.6.20+ with IO accounting enabled (TASKSTATS and TASK_IO_ACCOUNTING.)

make menuconfig

General setup --->
[*] Export task/process statistics through netlink (EXPERIMENTAL)
[*] Enable per-task delay accounting (EXPERIMENTAL)
[*] Enable extended accounting over taskstats (EXPERIMENTAL)
[*] Enable per-task storage I/O accounting (EXPERIMENTAL)


linux iotop program

iotop.py accepts several command line options for filtering, including PID, user and process/thread-view. Once the program is loaded you can use the keyboard arrow keys to change which column gets sorted.
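With TASK_IO_ACCOUNTING enabled, the kernel also exposes per-process counters in /proc/&lt;pid&gt;/io. iotop.py itself gets its data through the taskstats netlink interface, so the following is a simpler substitute rather than how iotop works: a minimal Python sketch that parses that file and computes disk read/write rates over an interval. It assumes a 2.6.20+ kernel with IO accounting; reading other users’ processes generally requires root.

```python
import time
from typing import Dict

def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)  # int() tolerates surrounding spaces
    return counters

def read_proc_io(pid: str = "self") -> Dict[str, int]:
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())

def disk_rates(pid: str = "self", interval: float = 1.0) -> Dict[str, float]:
    """Bytes/second that the process caused to be read from or written to storage."""
    before = read_proc_io(pid)
    time.sleep(interval)
    after = read_proc_io(pid)
    return {
        name: (after[name] - before[name]) / interval
        for name in ("read_bytes", "write_bytes")
    }

if __name__ == "__main__":
    print(disk_rates())
```

read_bytes and write_bytes count storage-level IO, while rchar and wchar also include reads and writes satisfied by the page cache, which is why a top-like tool focuses on the former.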

Guillaume Chazarain’s blog
Linux::Taskstats::Read Perl module
Tricks to diagnose processes blocked on strong I/O in linux
DTrace iotop
DTrace iotop samples
Running DTrace from Solaris Against Linux in Brandz
Fedora Daily Package: iotop - Display I/O Activity by Process

Twiki Meetup in Santa Clara

Thursday, November 29th, 2007

There was a meetup tonight for Twiki users from 5:30 pm to 8:00 pm at the Plug and Play Tech Center, 440 North Wolfe Road, Sunnyvale, CA 94085.

I attended most of it, though 5:30 pm is pretty early for most people to leave work and drive there. Nonetheless, turnout was good, with over 30 audience members plus staff from twiki.net, the company providing support for Twiki.

The format was a slide show, followed by a very energetic community evangelist who got the audience involved.

Several members of other local Bay Area Perl and Linux user groups dropped in.

Twiki is notable in offering many plugins that combine to create a very feature-rich wiki. For example, it’s possible to embed twiki spreadsheets, forms or do programming in twiki pages.

I’ve used twiki, confluence, mediawiki and trac. I’d say twiki is my favorite for complex wikis.

The pizza was not great, although it was nice of them to serve both soft drinks and wine.


Plug and Play Tech Center, Sunnyvale

IMUG: OpenOffice.org Internationalization (I18N) Framework

Thursday, November 15th, 2007

Karl Hong, from Sun, gave a talk on the OpenOffice.org Internationalization (I18N) Framework at IMUG.

Although ICU is a comprehensive i18n API today, it was not mature enough for most of OpenOffice’s development history. So at this point OpenOffice is about 90% custom i18n API and 10% ICU. George Rhoten, from IBM, was available to provide commentary on areas where ICU had matured, for example the calendar features and transliteration.

Karl mentioned that a lot of features in OpenOffice were added to remain competitive with features in Microsoft Office. Unicode normalization has not been a requested or supported feature yet.

After the talk, Karl showed a live demo of OpenOffice rendering and transliterating Simplified and Traditional Chinese. He also showed mixed Chinese and Hebrew BiDi issues. The Japanese search dialog in Impress is pretty amazing to look at.

Thanks again to Apple for hosting the meeting!

Ghetto MySQL Innobackup with rsync

Saturday, November 3rd, 2007

I was reading an interesting samba mailing list comment about using rsync on live MySQL databases.

The author said this:

“Assuming a short break in accessibility is tolerable, I’d

  1. run rsync to the backup
  2. stop the server
  3. run rsync to the backup (should be much much faster now)
  4. restart the server.”

Combining rsync and mysqlhotcopy we can get a little fancier:

Ghetto Innobackup-style backup with rsync

  1. STOP SLAVE; FLUSH TABLES
  2. run rsync to the backup
  3. FLUSH TABLES WITH READ LOCK; SHOW SLAVE STATUS; SHOW MASTER STATUS
  4. run rsync to the backup (should be much much faster now)
  5. UNLOCK TABLES
  6. START SLAVE

Note that the read lock and unlock must be done while on the same database connection, and innodb continues to update indexes even when read-locked.

Also, record the master and slave status values. They may be very useful later if you want to apply binlogs to the backup, or initialize a slave.
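The six steps can be scripted. Here is a minimal Python sketch, assuming `cursor` is a DB-API cursor on a single MySQL connection (for example from MySQLdb) and `rsync` is a caller-supplied function that copies the data directory; both are hypothetical parameters, and most error handling is left out. Every statement goes through the same cursor because the read lock belongs to the connection that took it.

```python
from typing import Any, Callable, Dict, List, Sequence

def ghetto_innobackup(cursor: Any, rsync: Callable[[], None]) -> Dict[str, List[Sequence[Any]]]:
    """Two-pass rsync backup of a MySQL slave; returns the recorded status rows."""
    status: Dict[str, List[Sequence[Any]]] = {}
    # 1. Pause replication and flush tables before the first copy.
    cursor.execute("STOP SLAVE")
    cursor.execute("FLUSH TABLES")
    # 2. First, slow rsync pass while the server keeps running.
    rsync()
    # 3. Take the global read lock and record replication coordinates.
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        cursor.execute("SHOW SLAVE STATUS")
        status["slave"] = list(cursor.fetchall())
        cursor.execute("SHOW MASTER STATUS")
        status["master"] = list(cursor.fetchall())
        # 4. Second pass only copies what changed, so the lock is held briefly.
        rsync()
    finally:
        # 5. Release the lock on the same connection that acquired it.
        cursor.execute("UNLOCK TABLES")
        # 6. Resume replication.
        cursor.execute("START SLAVE")
    return status
```

The returned status rows are the values worth saving alongside the backup for applying binlogs or initializing a slave later.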

This technique would be very suitable for non-critical snapshots like QA copies and on quiet databases.

It may be suitable for busy databases if other methods aren’t working out, for instance you don’t have LVM snapshots setup and innobackup is locking your MyISAM tables too long.

rsync -a is also useful for backing up master binlogs every 5 minutes on a live site. Normally you’re better off setting up a slave just running the slave IO thread, though.

Many databases have features to allow “log shipping.” With MySQL, similar functionality is accomplished by doing FLUSH LOGS and rsync, or using replication (there is a command to not execute the replication stream, just save it to disk.)

FLUSH NO_WRITE_TO_BINLOG LOGS
FLUSH TABLES WITH READ LOCK