Archive for the ‘Perl’ Category

Upgrading Awstats and GeoIP

Sunday, June 17th, 2007

Just upgraded from awstats 6.4 to 6.6 tonight.

Time to also update the MaxMind GeoIP database. There are actually 3 free ones now: GeoIP.dat, GeoLiteCity.dat and GeoIPASNum.dat.

Not much new in awstats, although there are breakdowns by Linux distro and city.

I’m also using antezeta’s more comprehensive robots.pm file - seems to be about a year ahead of the file included with awstats, or about 15% more robots.

Awstats is a good log analyzer, though it doesn’t do clicktrail analysis the way expensive commercial software such as Omniture does.

Awstats still helps you spot:

  • Google AdWords click fraud
  • broken links
  • deep-linkers
  • abusive users and bots
  • downtime
  • search keywords
  • competitors viewing your site.

Awstats
MaxMind.com
MaxMind database downloads
antezeta.com: GeoIP Information for AWStats
antezeta.com: Enhanced Robots.pm for AWStats

MySQL Replication and Errors

Saturday, May 26th, 2007

Just investigating ways to detect and fix replication errors on a daily basis - without reloading the slave. The database I am managing is large, but fortunately partitioned into lots of smaller, independent tables.

The most common error this year is malformed packets as the master and slave are in different data centers. Skipping that statement is often ok, with SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1.
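
A minimal sketch of how that daily check could be scripted, in Python rather than Perl. It assumes the caller has already fetched one row of SHOW SLAVE STATUS as a dict. The function name and the set of "safe to skip" error numbers are my own placeholders, not anything from the MySQL manual. It only decides which statements to run and leaves executing them to whatever client library you use.

```python
def recovery_statements(status, skippable_errnos):
    """Return the SQL to run on the slave, or [] if nothing should be skipped.

    status: one SHOW SLAVE STATUS row as a dict (assumed already fetched).
    skippable_errnos: error numbers the operator considers safe to skip
    (hypothetical; choose these yourself after reading the failed statement).
    """
    sql_thread_running = status.get("Slave_SQL_Running") == "Yes"
    last_errno = int(status.get("Last_Errno", 0))
    if sql_thread_running or last_errno not in skippable_errnos:
        return []
    # The SQL thread is already stopped by the error, so skip one event and restart.
    return ["SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1", "START SLAVE"]
```

Anything not on the skip list returns an empty list, so unknown errors still need a human to look at them.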

I saw 2 talks by Baron Schwartz this year at MySQLCamp and the MySQL Conference, so I thought I’d look into his work.

I’m able to run his mysql-table-checksum-1.1.5 in ACCUM and BIT_XOR modes, but not CHECKSUM. I had to do some editing on the script, so it looks like it needs a little more testing.

Update: He’s fixed the ACCUM and BIT_XOR bind bugs.
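
For anyone wondering what a BIT_XOR-style checksum buys you, here is a rough illustration in Python. This is not Baron's code. It is only my understanding of the general idea: hash each row, then XOR the hashes together so the result does not depend on row order. The function names and the choice of CRC32 are assumptions for the sketch.

```python
import zlib

def row_crc(row):
    """CRC32 of one row; columns are joined with a separator byte."""
    # None is rendered as an empty string here, so NULL and '' look the same
    # (a simplification of this sketch).
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return zlib.crc32(joined.encode("utf-8"))

def xor_checksum(rows):
    """Order-independent checksum: XOR of the per-row CRCs."""
    acc = 0
    for row in rows:
        acc ^= row_crc(row)
    return acc
```

Running this over the same table on master and slave should give equal values even if the rows come back in a different order. One known weakness of plain XOR is that a row duplicated an even number of times cancels itself out.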

Baron Schwartz’s work:
xaprb: Introducing MySQL Table Checksum
Sourceforge: MySQL Toolkit
Innotop

MySQL manual:
Replication Startup Options
SQL Statements for Controlling Slave Servers
SET GLOBAL SQL_SLAVE_SKIP_COUNTER
SHOW SLAVE HOSTS Syntax
CHECKSUM TABLE Syntax

sf.pm.org: Peter Thoeny talks about twiki

Wednesday, March 28th, 2007

Peter Thoeny of TWiki and StructuredWikis gave a sf.pm.org talk tonight in San Francisco on wikis and knowledge management. About a dozen people attended. He said it was similar to a talk he gave at Google recently.

Peter is the original author of TWiki, based on the source of an older wiki project. TWiki is written in Perl and licensed under the GPL. He has done consulting for numerous Silicon Valley companies, installing and customizing internal wikis for companies such as Oracle and Wind River.

The TWiki project has 5 core programmers (committers), with many other people submitting patches, testing and writing plug-ins.

Peter is originally from Switzerland, but enjoyed working in Japan for 8 years, and moved to Silicon Valley in 1998.

He’s a typical European speaker, with a lot of slides (52) and a lot of writing on the slides. Very thorough.

The audience oohed and aahed when he demoed the form creation and editing feature of TWiki. It’s possible to prototype small applications using TWiki syntax.

A bank employee, who was required to attend, mentioned investigating wikis for corporate knowledge management and documenting source code.

His philosophy on access control to wiki pages is that if a user can see it, then they should be able to edit it. Also, all engineers in a company should be able to see all wiki pages related to engineering. Otherwise the benefits of knowledge sharing are lost when too many silos are erected.

Peter also talked a little about “situational wikis.” These are wikis or wiki sections created for special projects, used intensively, then kept online for historical purposes only after the project ends.

I used TWiki at Yahoo! and Cisco, but currently use mediawiki in a small company.

Another person who works at Yahoo! mentioned that they have finally upgraded TWiki from a 2000-ish version.

I would say that TWiki and mediawiki are the 2 most popular wikis in Silicon Valley, with TWiki more suitable for enterprise use when authentication is needed, and mediawiki being popular with groups already familiar with editing Wikipedia and who are not interested in lots of plug-ins.

We had a drink at Maxwell’s in the Hilton hotel afterwards. He said that he has worked with other scripting languages such as Python, but still appreciates Perl. His opinion of Perl’s OO features is that overall OO Perl is ok, but needing bless is quite odd.

2 attendees were looking for Perl work, and one person was looking for a Perl shopping cart programmer. Peter is hiring Perl programmers for a new stealth company he is involved in.

Thanks to Barclays Global Investors for hosting the talk and continuing support of sf.pm.org. I understand they are always looking for good Perl programmers who enjoy wearing a suit to work.

Randal Schwartz Non-Party in Milpitas

Tuesday, March 6th, 2007

Randal Schwartz threw a little legal victory party at Dave and Buster’s Great Mall, Milpitas tonight.

Several people showed up to celebrate with a beer or two, including Neil Bauman from GeekCruises, David Fetter, Jeffrey Thalhammer from Barclays, Mike Cheselka and others.

Randal was limping from a knee injury, but said he was quite busy with programming/consulting at his usual $clients.

Neil said he was very happy with running his ecommerce website on Postgres for the past 3 years, “rock-solid and no DBA needed.”

Jeffrey Thalhammer talked a little about his Perl Critic project on CPAN, which uses Adam Kennedy’s Parse, Analyze and Manipulate Perl (PPI) parsing module to examine Perl source code for problems as identified by Damian Conway’s Perl Best Practices book. He said that PPI did a fairly good job of parsing Perl, but emits a fairly flat data structure in which it’s hard to identify problems in varargs calls, for example.

We also talked a little about Solaris vs. Linux, PHP vs. Perl, PHP 5.x vs PHP 5.x, and Perl5 vs. Perl6.

David mentioned the blog posting Unskilled and Unaware of It. He also mentioned turning the unofficial Bay Area Postgres booster crown over to somebody even more enthusiastic about it. Sun has been sponsoring Postgres and including it with Solaris, prolly as a counter to Oracle.

I demoed my Blackberry 8700g and impressed a few people with how good midpssh and google maps mobile are.

I ordered the peppercorn sirloin with mashed potatoes and a greek salad. The food was fairly good actually, despite the acre of video games surrounding the dining areas. Price was under $30 including tax, and came with a $15 game voucher. The shrimp pasta looked good, too.

It’s a great venue for meetings since the staff has a great computerized billing system for split checks. Just go up to the terminal, claim your items, and they subtotal it on the spot. Try asking for split checks at Denny’s after 10 pm. D&B’s is located near the cinema.

Neil gave me a ride back to South San Jose after the party. All I can say is this: be careful of the signage when transitioning from 680 in Milpitas to 280, since it is not clear how to end up on 280S near downtown San Jose. You can easily end up in Evergreen or back in Milpitas.

Browser File Upload Progress Bar Scripts

Saturday, December 9th, 2006

Just looking around at the latest in file upload progress bar scripts for web apps … seems like every site eventually needs an upload progress meter and in-page HTML WYSIWYG editing.

Apparently PHP makes it difficult to get access to the raw POST data to do a progress meter, so either a patch to PHP is needed, or a Perl helper CGI.

I assembled a File Upload Progress Bar Screenshot Gallery from most of the scripts listed below.

Commercial Multi-File Uploaders with Progress Bar

  1. SibSoft XUpload (Perl, cooperates with PHP) (non-Pro free, Pro $37 per domain) non-Pro demo Pro demo
  2. Encodable Industries FileChucker (AJAX) (Perl, cooperates with PHP) ($39/one server, $89 three servers) demo
  3. jupload.biz JUpload (Java applet) ($62 one server/$124 unlimited servers/$740 unlimited servers w/ source code) demo
  4. Motobit Huge ASP File Upload ($48 one server, $198 site license) demo (works ok on IE6, not on Firefox 1.5.0.8)
  5. Rad Inks Rad Upload Applet with Compression, Drag and Drop, and Recursive Folder support (Java) demo (free/$39/$49)

FOSS Uploaders with Progress Bar

  1. Uber-Uploader (AJAX Perl supports PHP) demo MPL
  2. tesUpload Progress Bar (AJAX) (Perl, cooperates with PHP, based on MegaUpload) author blog MPL
  3. raditha MegaUpload Progress Bar (Editions for Perl, PHP and JSP) demo broken as of 2006-12-09 author blog (not mod_perl safe) MPL
  4. devpro.it PHP upload progress
  5. pdoru.from.ro PHP Upload Progress Meter demo
  6. labs.beffa.org w2box - Web 2.0 File Repository for PHP (Perl, cooperates with PHP) demo
  7. Sean Treadway’s Rails Upload Progress demo

Perl CPAN File Upload Progress Bar Modules

  1. Apache::UploadMeter (mod_perl)
  2. Apache2::UploadProgress (mod_perl)
  3. Catalyst::Plugin::UploadProgress

Freshmeat search results for progress bar in category ‘Environment :: Web Environment’.

Please post a comment if you know of any more file upload progress bar scripts.

More Links

Dinke’s Personal Blog: PHP 5.2 upload progress meter Rasmus slides
XUpload vs. Uber Upload Technical Differences
Using PHP to handle file uploads is a bad idea.
digg discussion on file upload progress bars
http://the-stickman.com/web-development/javascript/upload-multiple-files-with-a-single-file-element/
http://ajaxian.com/archives/dojo-uploading-files-and-contents-with-ajax
http://ajaxian.com/archives/asynchronous-file-upload-with-ajax-progress-bar-in-php
http://ajaxpatterns.org/On-Demand_Javascript

Wiki Evaluation for Corporate Intranets

Wednesday, December 6th, 2006

I just spent a few hours looking at 3 major wikis: MediaWiki, TWiki, and MoinMoin.

Frankly, none look ideal for corporate intranet use.

However, there are export utilities available, so I’m planning to start with MediaWiki and see how it goes.

Wiki Evaluation Notes

MediaWiki

- PHP
- requires MySQL 4.0+ or Postgres 8.1+ database
- Intended for open sites (namely Wikipedia), not corporate intranets
- no wiki spreadsheet plugin
- easiest to use UTF-8 with MySQL 4.0
- authors want to re-write it
- large user base and publicity
- 60 languages supported
- largest userbase, though it only takes half an hour to learn a wiki syntax
- #mediawiki recommends using other software for intranet purposes.

TWiki

- Perl
- text files
- good for intranets
- has LDAP support with LdapContrib plug-in
- different syntax than MediaWiki, though has plug-in to support MediaWiki syntax
- limited language support (no Japanese or Russian yet, though EUC-JP should work)
- integrates with RCS but not Perforce yet
- has spreadsheet plug-in
- about 10 languages supported
- I used TWiki internally at Yahoo!, thought it was ok.

MoinMoin

- Python
- text files
- UTF-8
- can use external web server or internal Python http server
- good for intranets
- has a spreadsheet plug-in
- LDAP, authentication and ACLs

Can Your Programming Language Do This?

Friday, November 24th, 2006

Joel asks the following question, “Can Your Programming Language Do This?”, regarding functional programming constructs like map and reduce.

Joel, my primary language these days is Perl.

And yes it can. :)

Perl has anonymous functions and map built-in, as well as closures.

mysqlcamp Notes

Sunday, November 12th, 2006

I attended mysqlcamp at Google in Mountain View.

(A “camp”, or “unconference”, is a recent West Coast trend for periodic informal technical conferences. The schedule has both scripted and unscripted sessions, and the speakers are not really separate from the audience. The organizer was Jay Pipes from MySQL AB.)

Friday

Opening Meet and Greet

I had already met many of the MySQL staff at various conferences, but most of the users were new to me.

At the Friday intro session, each of 200 people introduced themselves. Employees of Google, YouTube, LiveJournal, etc. attended.

Often attendees would throw out questions about what other people were doing with MySQL, or describe problems they had encountered and ask for opinions on how to solve them.

Bug #15815 was discussed, and Ken Jacobs, the Oracle InnoDB manager, talked about his role a little (Oracle employee #18, 25-year Oracle employee, former committee member of ANSI SQL and TPC groups.)

Introducing mod_ndb, a REST Web Services API for MySQL Cluster

John David Duncan gave a talk on mod_ndb and REST. He’s now a MySQL AB sales engineer, but has a deep background in the database world.

mod_ndb is a 3,000 line Apache x.x module that allows direct access to NDB tables with GET, POST, DELETE returning JSON, XML, Raw, and ApacheNote formats. He wrote it to spread NDB outside the telco world to the web world.

After loading mod_ndb, other languages running inside apache like Perl or PHP should be able to just make a subrequest to call the mod_ndb interface.

Test Client JMeter 2.2 on XP, Test Server PowerBook 12″ 1.3

mysql 2250 pages/min
mysqli 2250 pages/min
mod_ndb 1650 pages/min on (non-persistent 75% Apache 1.3, 85% Apache 2, persistent 115%, embedded 156% 2500 pages/min)

(jdd did this to some extent because he wanted to experiment with sort-merge-join instead of old MySQL nested loop join. So have PHP do 2 big requests and merge them in 6 lines of PHP.)
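
I did not copy down his PHP, so here is my own rough sketch of the sort-merge idea in Python: fetch two result sets, sort both on the join key, and walk them in step. The function name and the dict-per-row representation are assumptions for illustration.

```python
def merge_join(left, right, key):
    """Inner-join two lists of row dicts on `key` by sorting and merging."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair this left row with every right row sharing the key.
            # j stays at the start of the run so a duplicate left key re-scans it.
            k = j
            while k < len(right) and right[k][key] == lk:
                joined.append({**right[k], **left[i]})
                k += 1
            i += 1
    return joined
```

Unlike a nested loop join, each input is scanned roughly once after sorting, which is the attraction when both result sets are large.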

mod_ndb allows a different kind of 3-tier architecture with browser JS as a 1st tier. However you need a security model that prevents injection and DoS, hoping that Apache user authentication and rate limiting can deal with it.

Some discussion of a memcached storage engine for MySQL by Brian Aker. JDD is not sure what MySQL adds in that scenario. Also, a patch for InnoDB-style row caching to MyISAM. Falcon is all about row cache. Also mentioned was using Linux 5 x nbd and “mounting” memcached or storage engines across network.

One inconvenience of the current interface is discovery, since the table definitions are hard-coded in the apache conf file. So you would need to publish a document on what’s available, perhaps, or have mod_perl auto-configure the interface from metadata in a perl section using ndb_desc.

Lunch

Filet mignon, vegetables with gourmet veggies, salad, chocolates for dessert.

Talked to Monty a little. He’s getting over the flu, but happy to be doing more programming. Falcon is coming along.

Kevin Burton gave me a little demo of his memcached monitoring setup with rrd. He mentioned ganglia and munin as alternatives. BigTable is an ultra-scalable database.

Chip Turner, Open Source MySQL tools from Google

Download MySQL tools released by Google SVN Trunk

compact_innodb.py is an off-line packer, written for older versions of MySQL (4.0 or less). Normally they fail over to another server first, then compact it. Can boost performance on linear scans by 30%-40%. Alternatives are alter table to MyISAM and back, or ANALYZE table. Or use file per table with InnoDB to more easily recover space later.

Should track qps vs. io busy over time, do an OPTIMIZE when ratio falls.

mypgrep.py can list queries across databases and identify connectionless queries, for example.

Sheeri Kritzer’s blog

Paul Tuckhead is an Oracle DBA using MySQL who is joining Google. He said he has a script to run on the slave to parse the binlog, do SELECTs in 3 threads to prime the cache for the replication thread. Suitable for non-IO-bound situation. Lots of corner cases where both the SELECT and UPDATE blow out the cache.
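
I have not seen Paul's script, so the following is only my guess at the core trick, sketched in Python: turn each UPDATE from the binlog into a SELECT with the same WHERE clause, so the rows are pulled into the cache before the replication thread needs them. The regex and function name are my own, and the pattern only handles simple single-table UPDATE statements.

```python
import re

# Deliberately naive pattern: UPDATE <table> SET ... WHERE <condition>
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>[`\w.]+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)

def priming_select(statement):
    """Return a SELECT touching the rows an UPDATE will modify, or None."""
    match = _UPDATE_RE.match(statement)
    if match is None:
        return None  # not a simple UPDATE ... WHERE, so leave it alone
    return "SELECT * FROM {} WHERE {}".format(match.group("table"), match.group("where"))
```

A string literal containing the word WHERE inside the SET clause would fool this pattern, which is one of the corner cases a real tool would have to handle with a proper parser.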

Temporary tables can be troublesome for replication, even 5.0.22 with InnoDB. If the replication thread hangs, the temp table will still exist, meaning you can’t restart replication until you drop it.

DBIx::dwiw

DRRaw nice program for rrd/cacti

Most people seem to prefer manual failover in a master-master setup to prevent split-brain situations.

MySQL at Google

Steve Gunn and Michael Dickerson, Database Operations

Architecture Overview/Terminology

- partitioned on some key that’s useful
- shard is member of partition
- slims are read-only slaves
- wrote a client to do parallel DDL and SQL to cluster
- master can run out of buffer cache on large results and request dies, slaves die from binlog truncation

High Availability and Failover Strategies

- DNS-based
- cron-based python script scores load as green, yellow, red
- slave 5-10 seconds behind, can work on master if need transaction or no delay
- BigTable is more of a Berkeley store than ACID database
- db#-physical
- db#-logical, like db0-accounting
- never trust your glibc or your memory allocator. Fedora Core 3 is bad when doing static builds, do dynamic links. MySQL AB uses Suse glibc.
- needs to change MySQL privilege model or add roles, too much memory pressure with default system.
- be nice to have connectionless queries stop instead of running a long time
- they do explain on queries across the system to detect changes in the query plan over time (drift)
- lint checker for fk and schema
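
The green/yellow/red load scoring from the list above was not shown in detail, so this is just a guess at its shape. The inputs (replication lag and running thread count) and every threshold below are made up for illustration and are not Google's numbers.

```python
def score_load(seconds_behind, running_threads,
               lag_yellow=10, lag_red=60, threads_yellow=20, threads_red=50):
    """Classify a database server as 'green', 'yellow' or 'red'.

    seconds_behind: replication lag in seconds, or None if replication is stopped.
    running_threads: number of actively running queries.
    All thresholds are hypothetical defaults.
    """
    if seconds_behind is None:
        return "red"  # a slave that is not replicating should not take traffic
    if seconds_behind >= lag_red or running_threads >= threads_red:
        return "red"
    if seconds_behind >= lag_yellow or running_threads >= threads_yellow:
        return "yellow"
    return "green"
```

A cron job could write the color somewhere the DNS-based failover can read it, which is how I understood the two pieces to fit together.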

- ibbkp off primary, don’t trust replicated slaves yet
- have written library to do ordered and unordered checksums on table (start slave until, compare checksums)
- formal schema changes process

Mark Callaghan, InnoDB Scalability - formerly database internals at Oracle for 8 years, sorting, floating point, customer requests

- Google databases are IO bound
- looking forward to page compression in InnoDB in 5.1
- would like transactional counter back in InnoDB same as 4.0 so transactions on commodity boxen are recoverable

Had dinner with Jeremy 2, wife, Matt (WordPress) and his bizdev guy.

Saturday

Brian Aker, Clustered Database Approaches

- NDB
- Emic 2 would have problems with SP because of how they intercept queries …
- Solid and Falcon do in-memory transactions, good for web
- InnoDB uses disk more, good for data warehousing
- Amazon S3, use a trigger to store copy on S3, like archival or SoX data
- Every developer today should be asking: How am I going to partition and cluster this?
- would prefer if all the databases used the same binary protocol, since they do the same thing
- Tridge has a persistent memcached-like cache (cnet and facebook may have more than 1,000 nodes in memcached)

Architectural questions:

- how do I partition it?
- how do I avoid crippling it?
- what can I afford to lose vs. durability
- how do I minimize number of nodes?

- NDB good to 63 nodes, can accept table scans, while Solid will prolly be good to 8 nodes based on experience
- iSCSI 10 Gbps demos as fast as Fibre Channel

- One could handle a Slashdot-type situation with Amazon EC2 for example.

- immediate read after write makes horizontal scaling difficult. near synchronicity makes things very easy.
“Your comment will appear in a minute or so.”

- model for web start-up could be: start with 1 MySQL db with a memory table for sessions, grow to 2xDB and multiple memcached session caches

- Alexa does thumbnails with S3?

- think about architecture that works for both internal and external users, avoid balkanization

Baron Schwartz: How to use the innotop InnoDB and MySQL Monitor

- cool
- think about allocating more key buffers and assign queries to use that. can disable caching for logging table, for example
- install linux in solaris zone, setup slave, dtrace it
- tcpdump profile measurement - time inbound sql frame and outbound result frame
- EXPLAIN EXTENDED and SHOW WARNINGS
- dependent subquery
- MySQL does not give adequate performance counters or data

Sheeri Kritzer — Better Performance with Booleans (using bitwise operations)

- uses colinux on windows

- problem: given hair color, eye color, sex, status
- question: how to optimize performance in MySQL for general dating search?
- played with various MySQL binary statements
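
To make the bitwise idea concrete, here is a small sketch in Python rather than SQL. The attribute values and bit positions are invented for the example and are not from Sheeri's slides. Each profile becomes one integer, and a search for "any of these hair colors and any of these eye colors" becomes two AND tests.

```python
# Hypothetical bit assignments: one bit per attribute value.
HAIR = {"black": 1 << 0, "brown": 1 << 1, "blond": 1 << 2, "red": 1 << 3}
EYES = {"brown": 1 << 4, "blue": 1 << 5, "green": 1 << 6}

def encode(hair, eyes):
    """Pack one profile's attributes into a single integer."""
    return HAIR[hair] | EYES[eyes]

def any_of(table, wanted):
    """Build a mask with the bit set for every acceptable value."""
    mask = 0
    for name in wanted:
        mask |= table[name]
    return mask

def matches(profile_bits, hair_mask, eye_mask):
    """True when the profile has an acceptable hair color and eye color."""
    return bool(profile_bits & hair_mask) and bool(profile_bits & eye_mask)
```

In MySQL the same test would look roughly like WHERE (attrs & hair_mask) AND (attrs & eye_mask) on an integer column, with the column name attrs being my own placeholder.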

Jeremy Cole: MySQL Replibeertion: Replication; Uses, Performance, Problems, Brainstorming

Reasons

- backup
- scaling
- data warehousing
- multiple storage engines

Problems with MySQL Replication

- delayed
- masters don’t track slaves
- should checksum events
- nice to have guid for multiple masters

- Golden Gate $40,000 for MySQL - Oracle

- for unreliable network, slave reconnect default is 60 seconds. try master-connect-retry=1

Sunday

- 9 am Innodb session cancelled
- group photos with umbrellas in back of Building 40
- Brian Aker and I got a tour of the suspended SpaceShipOne mockup
- started MySQL Forge Data Warehousing Wiki page, in-person comments from Rick James and Frank Flynn, a former Red Brick user

- Jeremy Cole demoed his profile patch for MySQL, very nice: SHOW PROFILE …
- Stage 0 is checked-in, 4 more stages to go
- minimal performance overhead
- shows about a dozen query stages elapsed times in microseconds, option to show source function
- some day could have graphical tool

- talked about SQL optimization for Web 2.0 tag clouds (20,000 tags x 2 million uses)
- Jay Pipes showed a slide with his effort to use a derived query and LIMIT to prevent “run-away” large tag matches
- Monty mentioned that GROUP BY … ORDER BY NULL is faster when using GROUP BY, which by default does an implicit ORDER BY.
- he also mentioned a max-join-size mysqld option to return an error on overly large joins
- Brian Aker and Jay Pipes led a MySQL trivia quiz
- what is wrap behavior of enum
- what is procedure analyze

Lunch

Indian food.

ScaleDB Inc.

- trie data structure concept
- patricia tries (compact, used in routers)
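
For readers who have not met tries before, a bare-bones one in Python follows. This is a generic textbook structure, not ScaleDB's implementation, and it is a plain trie, not a Patricia trie, which would additionally collapse chains of single-child nodes into one edge.

```python
class Trie:
    """Plain character trie: one node per character of each stored key."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True when a stored key ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def _walk(self, prefix):
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.terminal

    def with_prefix(self, prefix):
        """Return all stored keys starting with `prefix`, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.terminal:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

Lookup cost depends on the key length rather than the number of keys, which is presumably the appeal for an index structure.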

Falcon Database

- in memory, commit to disk, similar to Solid
- better for OLTP than DW prolly
- tracks hot records in memory
- InnoDB is moving to fewer locks for OLTP performance, may slow down replication
- MySQL does benchmarking with Quest, can only make Open Source comparisons public

MySQL++

- been working 4 months on it
- complete do logs, great for data warehousing importing

memcached

- memcached table type is being added so it will have a MySQL interface
- kind of strange for some users, since no result for empty cache, could do stored procedure to hit real database
- toy compared to ndb
- key lookup only, no range queries without a patch
- use MySQL as a SQL language router with stored procedure or trigger to not change app
- Brad did a 1.2 release 2 months ago, mostly with FaceBook patches. smugmug is using it in production.
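
The "no result for empty cache, then hit the real database" point above is the usual read-through pattern. A tiny Python sketch, with a plain dict standing in for memcached and a caller-supplied function standing in for the database query (both are assumptions, not the memcached API):

```python
def read_through(cache, key, load_from_db):
    """Return the value for `key`, filling the cache on a miss.

    cache: any dict-like object (a stand-in for a memcached client here).
    load_from_db: function taking the key and returning the value or None.
    """
    value = cache.get(key)
    if value is None:
        # Cache miss: fall back to the real database.
        value = load_from_db(key)
        if value is not None:
            cache[key] = value  # remember it for next time
    return value
```

A stored procedure or trigger on the MySQL side would be doing the same fallback, just without changing the application.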

MySQL Enterprise Dashboard

- available to MySQL Network Silver Level and up subscribers
- many monitoring and alerting features
- ball matrix monitoring display
- very nice
- code name was Merlin

Requested MySQL Enterprise Dashboard Enhancements

- favicons specific to this product and alert level
- KB article integration with bugs
- dump variables into a bug report
- scale to hundreds of servers
- publish alerts as RSS feeds
- compatible email format to RT and Bugzilla
- some alerting is not helpfully implemented as of now. messages are not clear and hard to disable.

MySQL Binary End of Life Announcement

- 3.2 and 4.0 binaries already EOL’d, unless you purchase extended support, source available to satisfy GPL
- need to upgrade some classes of users, like large ISPs, cpanel providers, etc.
- notify others

Thanks to MySQL and Google for a great job organizing and hosting the event. Nice venue, fresh food, friendly security staff and conference photo policy.

MySQL Camp Unconference News