SphinxSearch Day in Santa Clara 2012

“Sphinx Search Day and SkySQL & MariaDB Solutions Day and Drizzle Day” was held today in Santa Clara.

I couldn’t make it this time, but wanted to collect the links in one place for future reference.

(I did attend Andrew’s SphinxSearch talk at the MySQL Conference yesterday, and I will be updating my installation of SphinxSearch to enable real-time updates this week.)

The Sphinx Search Day in Santa Clara USA
#sphinxsearch
SphinxSearch Documentation and Talks Links

Posted in Cloud, Conferences, Linux, MySQL, Open Source, Oracle, OSCON, Perl, Storage, Tech, Toys

Percona Live MySQL Conference And Expo 2012

The MySQL Conference was held once again at the Santa Clara Convention Center, but hosted by Percona this year. Percona did a great job of organizing the show on fairly short notice, with a good program, full exhibits area, and a few thousand attendees.

(The rumour I heard was that O’Reilly declined to host the conference again without promotional money from Oracle, and Percona decided to go it alone without MontyProgram. Oracle even refused to send speakers.)

My favorite talks were Vadim’s “SSDs and MySQL” lecture, and Giuseppe’s “New replication features in 5.5 and 5.6” lecture. Hopefully they can update them and present them again next year.

Some suggested improvements for next year:

  1. BoFs should be two per nite, starting at 6 pm or 7 pm, not one at 9 pm
  2. every year, I suggest that there be a room for storage and SSD vendors to demonstrate systems
  3. after the closing keynote, possibly have a hacking session area.

Tuesday Exhibits

Caught up with Monty. Shiny new Nikon 1 camera.
Showed Greg L. and Yngwie my RHEL/CentOS/SL graph.

Tuesday BOFs

Query Optimization

Rick James from Yahoo! talked about improving performance with MySQL.

We used to work in the same group at Yahoo!, so he played good cop and I played bad cop. :)

The audience had some interesting ideas:

- try using SSL with replication to reduce statement corruption
- monitor the query cache for problems like fragmentation. Consider running FLUSH QUERY CACHE periodically if it’s an issue.
- use partitions if doing heavy periodic deletes, e.g. deleting 10 million rows daily or weekly. Maybe partitions will help?
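
The partitioning suggestion above could look something like this. It's only a rough sketch under my own assumptions: the table name `events` and the daily `pYYYYMMDD` partition naming are made up, and the table is assumed to be already RANGE-partitioned on TO_DAYS() of a date column, with no MAXVALUE catch-all partition. The script just builds the statements a nightly job might run, so that expiring a day of rows becomes a DROP PARTITION instead of a huge DELETE.

```python
from datetime import date, timedelta

TABLE = "events"  # hypothetical table name, not from the BOF


def partition_name(day):
    """Name a daily partition after the day it holds, e.g. p20120410."""
    return "p" + day.strftime("%Y%m%d")


def add_partition_sql(day, table=TABLE):
    """Build the statement that adds a partition for rows dated `day`."""
    upper_bound = day + timedelta(days=1)
    return (
        f"ALTER TABLE {table} ADD PARTITION ("
        f"PARTITION {partition_name(day)} "
        f"VALUES LESS THAN (TO_DAYS('{upper_bound.isoformat()}')))"
    )


def drop_partition_sql(day, table=TABLE):
    """Build the statement that discards every row dated `day` at once."""
    return f"ALTER TABLE {table} DROP PARTITION {partition_name(day)}"


def daily_rotation(today, keep_days=7, table=TABLE):
    """Statements for one nightly run: add tomorrow, drop the expired day."""
    tomorrow = today + timedelta(days=1)
    expired = today - timedelta(days=keep_days)
    return [add_partition_sql(tomorrow, table), drop_partition_sql(expired, table)]


for statement in daily_rotation(date(2012, 4, 11)):
    print(statement)
```

Whether this actually beats a big DELETE depends on the workload, so it's worth testing before relying on it.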

Wednesday Keynotes

The MySQL Evolution
Peter Zaitsev and Baron Schwartz, Percona, Inc.

Making LAMP a Cloud
Marten Mickos, Eucalyptus

- the usual Marten talk (stroll through memory lane, capped by how he’ll make money off Open Source)

The New MySQL Cloud Ecosystem
Brian Aker, hpcloud.com

Best quote:

Multi-tenancy – Your neighbours can throw a party anytime. (And your mom won’t be able to see your online pics.)

RedDwarf is an AWS RDS clone.
HP’s RDS based on Percona 5.5. Sweet.
REST interface, cloud staff not allowed to log in. Icky.

Wednesday Exhibits

STEC had a DBT2 demo in their booth.
Introducing EnhanceIO SSD caching software soon.

Wednesday Talks

SphinxSearch
Andrew Aksyonoff, Author

LIKE – that’s not really search. It’s grep on your database.
Database full-text search is a bastard child. No knobs, e.g. for short words.

Free/Open choices

Solr and ElasticSearch based on Lucene
Mnogosearch, xapian, ferret, zettair

Keyword search is now a commodity but …
- maintenance, performance, scalability are still important considerations

Expect 300 qps/core for 1 million documents, 1.3 GB, probably comparable to Solr

For large sites, indexing speed is more important

Some use cases:

- 200 servers
- CL (Craigslist): over 200,000,000 queries per day
- 30 billion documents, about 50 TB
- Wikipedia on a 64 GB cell phone

Thinking about auto-cluster configuration but Skynet is not reasonable. Some assistance needed.
Relevance should be more than word frequency. Sphinx has a dynamic expression based ranker.

Wednesday Lunch

- same table as Baron, so talked about who’s maintaining Percona Toolkit (2 employees now: Daniel Nichter and somebody in S.A.)

InnoDB Compression
Facebook

- mlog
- seems complicated
- what happens if compression algorithm is changed later and you’re applying old logs?

Performance Instrumentation
Peter Zaitsev

- Facebook PHP profiler Open Sourced but no longer used
- if you use the slow query log, log all queries
- use Apache’s CustomLog format and load the logs into MySQL
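
Here is a minimal sketch of the Apache log idea above. The LogFormat is my assumption, not something from the talk: the common log format with `%D` (request time in microseconds) appended. The script turns each log line into a tab-separated row, which is the kind of file LOAD DATA INFILE can read. The column order is hypothetical too.

```python
import re
from datetime import datetime

# Assumed LogFormat (not from the talk): %h %l %u %t "%r" %>s %b %D
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) (?P<usec>\d+)$'
)


def to_tsv_row(line):
    """Convert one access-log line to a tab-separated row, or None if it doesn't parse."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # Apache writes timestamps like 11/Apr/2012:14:03:27 -0700
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    # %b logs "-" when no body bytes were sent; store that as 0
    size = "0" if match["size"] == "-" else match["size"]
    return "\t".join([
        match["host"],
        when.strftime("%Y-%m-%d %H:%M:%S"),  # DATETIME-friendly, server-local time
        match["request"],
        match["status"],
        size,
        match["usec"],
    ])


sample = '10.0.0.5 - - [11/Apr/2012:14:03:27 -0700] "GET /search?q=mysql HTTP/1.1" 200 5120 183042'
print(to_tsv_row(sample))
```

Requests containing escaped quotes would need a more careful pattern. Once the rows are in a table, the slowest URLs are one GROUP BY away.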

What’s New In MySQL 5.5 and 5.6 Replication
Giuseppe Maxia, Continuent, Inc.

5.6 Global Transaction IDs (GTIDs)

The following 4 options must be enabled on the master:

- log-bin
- log-slave-updates
- gtid-mode=on
- disable-gtid-unsafe-statements

Not allowed:

- CREATE TABLE ... SELECT
- MyISAM statements, including on mysql.*, except GRANT/REVOKE statements

Other notes:

- an automatic tool is now included with MySQL Workbench to check for safe statements
- MySQL 5.6 is alpha quality for these features.
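
A quick way to sanity-check a master's config against the four options listed above might look like the sketch below. The option names are the ones given in the talk for the 5.6 preview and may differ in later releases. The sample config and the function name are mine, and the check only looks for presence in the [mysqld] section, not for correct values.

```python
import configparser

# Option names as noted from the talk (MySQL 5.6 preview); MySQL option
# files accept hyphens or underscores, so both are normalized to hyphens.
REQUIRED_GTID_OPTIONS = (
    "log-bin",
    "log-slave-updates",
    "gtid-mode",
    "disable-gtid-unsafe-statements",
)


def missing_gtid_options(cnf_text):
    """Return the required options that are absent from the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # options such as a bare "log-bin" have no value
        strict=False,         # tolerate repeated options
        interpolation=None,   # do not treat % as special
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return list(REQUIRED_GTID_OPTIONS)
    present = {name.replace("_", "-") for name in parser.options("mysqld")}
    return [opt for opt in REQUIRED_GTID_OPTIONS if opt not in present]


# Hypothetical my.cnf fragment, for illustration only
SAMPLE_CNF = """
[mysqld]
log-bin = mysql-bin
log_slave_updates
gtid-mode = on
"""

print(missing_gtid_options(SAMPLE_CNF))  # ['disable-gtid-unsafe-statements']
```

Real my.cnf files with `!include` directives would trip up configparser, so treat this as a starting point only.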

Production War Stories: Lessons from the Frontlines

- zuora.com

- pt-upgrade
- pt-table-checksum

Peter’s comments:

- Good idea to QA a new MySQL release in case of a reserved keyword error, slow replication, etc.
- mysqldump and restore are still necessary for new features like Barracuda or innodb_file_per_table.

- sugarsync.com

tcpdump -i bond0 -s0 port 3306 ...
pt-query-digest
pt-log-player

Database was 2 TB using 32 GB RAM and lots of disks.

Using 4x memory with InnoDB compression, an 8 KB page size and slower disks resulted in a 4x improvement.

Lightning Talks

Schemadoc

- schemadoc can be used to create javadoc-style listings from MySQL schemas and comments

Continuent, Robert Hodges, CEO

- airline overbooking example
- Business processes are eventually consistent if you think about it

NuoDB, Robert Buck

Tokutek

- funny comedy sketch with Mr. Bill (in costume) as a database getting physically compressed, replicated, etc.

Wednesday BOFs

SSDs and MySQL
Vadim T, Percona

- about 30 people attended, including 4 SSD vendors
- best war story: a Softlayer customer lost his RAID 10 databases when Crucial M4 expired after 5000 hours
- one non-Facebook person was using FlashCache
- one person wondered about using virtualization with many databases on flash

Thursday Keynotes

MySQL: Still the Best Choice for Mission-Critical Data
Sam Ghods (Box)

What Comes Next
Mark Callaghan, Facebook

“If my SSD has 100,000 IOPS, and MySQL only uses 10,000 IOPS, can I get a discount from my flash vendor?”

Future Perfect: The Road Ahead for MySQL

To some extent this panel got hijacked by a security vendor. Certainly HIPAA users always want more auditing features, but I feel the existing MySQL GRANT system is fine for 99% of installations, and as the first Internet database, is more network-hardened than Oracle or SQL Server.

Thursday Lectures

Upgrading MySQL: Best Practices
Peter Zaitsev, Percona

- it’s a good idea to test a new MySQL version before upgrading in production
- some possible issues:
– new reserved keywords
– different replication performance

- upgrade time is the best time for doing things that require planning, like moving to innodb_file_per_table or innodb_file_format=Barracuda

- since upgrades don’t happen very often for most people, consider calling Percona.

MySQL and SSD: usage and tuning
Vadim Tkachenko, Percona

- benchmarks lie, but SSD benchmarks take lying to a new level: performance depends on controller state and how full the SSD is, so test over a 24 hour period
- excited about Fusion-io’s new Atomic Writes API. (InnoDB uses 16 KB pages, each spanning several 1 KB or 4 KB disk blocks, so a crash in the middle of a page write can leave a torn page. To guard against that, every page is written twice: first to the double-write buffer, then to its final location on the SSD. With this API, a modified InnoDB will not need the double-write buffer, halving write IO and doubling SSD lifetime.)
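
The double-write arithmetic above is easy to check on the back of an envelope. This sketch only counts data page flushes (redo log and other writes are ignored), and the flush count is a made-up number.

```python
PAGE_KB = 16  # InnoDB page size from the notes above


def blocks_per_page(block_kb):
    """How many device blocks a single InnoDB page spans."""
    return PAGE_KB // block_kb


def flushed_kb(pages, doublewrite=True):
    """KB written to the device when flushing `pages` dirty pages."""
    copies = 2 if doublewrite else 1  # double-write buffer copy + final copy
    return pages * PAGE_KB * copies


pages = 1_000_000  # hypothetical number of page flushes
with_dw = flushed_kb(pages, doublewrite=True)
without_dw = flushed_kb(pages, doublewrite=False)

print("4 KB blocks per page:", blocks_per_page(4))
print("KB written with double-write buffer:", with_dw)
print("KB written with atomic writes:", without_dw)
print("ratio:", without_dw / with_dw)
```

Half the page writes is where the "doubling SSD lifetime" claim comes from, assuming wear scales with bytes written.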

Verifying MySQL Replication Safely With pt-table-checksum 2.0
Daniel Nichter, Percona

- pt-table-checksum 2.x is gentler on databases, and less likely to abort for no clear reason

pt-table-checksum -h host -u root --ask-pass

Thursday Closing Talk

- draw for exhibit bingo card attendees with several nice prizes (3 different tablets, $2,000 free hosting from Joyent, headphones)
- show appreciation for organizers, sponsors and attendees

I got my second conference MongoDB mug, so now I can reliably store coffee. :)



Conference Slides Hosted on Box.com
Conference Videos
Percona Presentations for 2012
SphinxSearch Conference 2012
Global Transaction Identifiers Feature Preview
Oracle Blog: MySQL 5.5: What’s New in Replication
MySQL Manual 5.5: CHANGE MASTER TO Syntax

Posted in Business, Cloud, Conferences, Linux, MySQL, Open Source, Oracle, Perl, Storage, Tech, Toys

DIA Emergency Radio Call Confusion

Unfortunately, non-standard aviation radio phraseology resulted in a mishandled emergency at DIA on April 3 between United Express Flight 5912 and ATC.

The pilot excitedly reported smoke in the cockpit and requested “roll trucks please.” ATC couldn’t identify the plane and initially did not roll trucks.

I listened using headphones to an abbreviated MP3 release of the tape, and heard the following mistakes:

  • the emergency request transmission did not have a clear call sign
  • the emergency request sounded panicked, with a higher-pitched voice than normal and rushed speech (a “lid” in ATC slang)
  • the emergency request did not use “mayday mayday mayday”
  • the reply to ATC used an incomplete call sign (just “fifty-nine twelve”), which ATC read as “United 12”

The result was that ATC:

  • did not know who or what plane was calling
  • did not believe it was a legitimate radio transmission
  • did not know where to send rescue, even if it was a legitimate emergency call.

The landing went fine, but the FAA are investigating how an emergency call could get that bungled.

Note that anytime you declare an emergency, or ATC infers one, then ATC has a tendency to ask a lot of questions when you have the least amount of time to answer them.

Some lessons to be learned so far:

  • always use your full call sign and proper radio phraseology, especially in an emergency
  • maintain a professional voice tone.

Controller Dismisses Emergency Call (With Audio)
DIA air traffic controller fumbles pilot’s emergency call; FAA responds

Posted in Psychology, Tech

Google Chart Tools and Visualization API Notes

Google Chart Tools/Visualization API is a feature-rich graphing tool for programmers and UI designers to play with.

Some notes:

  • Chart Tools can be used to create interactive graphical charts and text tables with clickable and sortable elements
  • the results are very attractive with anti-aliased text and graphics
  • you can either write a JavaScript program, or pass parameters to a URL (which can be generated from Google Spreadsheet)
  • Data sources can be either remote (including a Google Spreadsheet), embedded in JavaScript, or generated in JavaScript
  • there is a handy online IDE tool called Code Playground
  • Chart Tools is updated often, so check back for more chart types and features.
  • as always, check Google’s licensing. You are not allowed to locally host Google’s API code, for example, on your intranet. That means you need to have a working Internet connection whenever you want to view charts.
  • 3 years backward compatibility is “guaranteed” by Google.

Take a look at my “RHEL, CentOS and Scientific Linux Release Announcements” interactive multiple time series charts. (You can click on the lines and legend labels to highlight that release, and mouseover data points to see tooltips of values.)

Posted in Cloud, Tech, Toys

Silicon Valley Perl Mongers: Continuous Integration using Jenkins

Joe McMahon of White Hat Security talked about “Continuous Integration using Jenkins” tonite.

Joe spent about half the time discussing Perl test harnesses (Test::More, Test::WWW::Selenium, Devel::Cover, etc.) and the other half on Jenkins CI.

Although Perl scripts are run by an interpreter and seldom need to be “built” like a C program, a CI tool can be used to:

  • run the test suite, log errors and notify the authors
  • test with multiple slaves using different versions of Perl or OSes
  • create a distro tarball, AMI, etc.
  • be used as a cron server for monitoring, etc.

Joe recommends that developers should be able to do a test run in under 5 minutes. There can be longer tests that analyze code coverage, third-party modules, etc.

Jenkins is easy to get started with:

  1. fetch a package from Jenkins CI
  2. run jenkins:
    $ nohup java -jar jenkins.war > $LOGFILE 2>&1
  3. configure it with the web-based admin tool on port 8080.

Thanks again to the Sunnyvale Plug & Play Tech Center for hosting the event.

wikipedia: Jenkins CI

Posted in Linux, Open Source, Oracle, Perl, Tech, User Groups