Database Permute Indexes Utility

I wrote a Perl program to autogenerate the list of index statement permutations for a MySQL, Oracle, Postgres or SQL Server query.

This is an example of the Monte Carlo method, applied to database applications. An interesting twist is that the samples are not random (they’re permutations), but the database optimizer behavior is treated as random – a reverse Monte Carlo method. :)

Some use cases:

  • accelerate existing Open Source (WordPress, Joomla) and commercial apps
  • test database optimizer results
  • stress-test databases with hundreds of index objects
  • verify DBA understanding of index behavior
  • as part of a continuous integration pipeline by diffing EXPLAIN output before and after
  • and, most interesting to me, improve performance of infrastructure tools that use metadata stores (OpenStack, CloudStack, DynamoDB, Bacula, etc.)


Autogenerate all valid index statements for EXPLAIN to optimize complex queries automatically.


$ vi (update user settings)

$ | tee permute_index.sql
   alter table t1 add index idx_jb_001 (c1,c3);
   alter table t1 add index idx_jb_002 (c1,c2,c3);
   alter table t1 add index idx_jb_003 (c1,c3,c2);
   alter table t2 add index idx_jb_007 (c4,c5);

# make a backup of your production database and restore in dev or qa

$ mysql -h dev -u root -p test <permute_index.sql

$ mysql -h dev -u root -p test
  mysql> explain select * from t1, t2 where c1=c4 and c1=? and c2=? 
            and c3=? and c5=?;
  Table | Key
  t1    | idx_jb_002
  t2    | idx_jb_007

(Now drop the indexes except idx_jb_002 and idx_jb_007.)
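The core of the generator is just enumerating column orderings. A minimal sketch of that step in Python (the original tool is Perl; the table and column names here are the ones from the example above):

```python
from itertools import permutations

def permute_indexes(table, columns, prefix="idx_jb", start=1):
    """Generate ALTER TABLE statements for every column-order permutation,
    for every subset size from 2 up to all columns."""
    stmts = []
    n = start
    for size in range(2, len(columns) + 1):
        for perm in permutations(columns, size):
            stmts.append(
                "alter table %s add index %s_%03d (%s);"
                % (table, prefix, n, ",".join(perm))
            )
            n += 1
    return stmts

# three columns yield 12 candidate indexes (6 two-column + 6 three-column)
for stmt in permute_indexes("t1", ["c1", "c2", "c3"]):
    print(stmt)
```

Pipe the output through tee as above, apply it in dev, then EXPLAIN your query.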

Now anybody can be a Peter Zaitsev! :)

Posted in Linux, MySQL, Open Source, Oracle, Perl

The Hubble Deep Field May Prove That We Aren’t in a Simulation

Hubble Deep Field
Hubble Deep Field: dark sky is not empty even at this magnification (angle) and exposure (time), and galaxies are statistically distributed the same

I happened across the Hubble Deep Field image again today, and a thought occurred to me.

Some scientists question whether we exist in reality (a natural universe, that may be infinite or expanding) or a simulation (artificial construct). Others question whether time is something that flows or is just an anthropomorphic (human) perception.

If we are living in a simulation, it would normally not be apparent to us. We would need to find discontinuities, or limits unexpected in reality, to detect the simulation.

There are microscopic observability limits according to our laws of particle physics that explain why we cannot keep “zooming in” more. The proposed experiments to prove or disprove simulation tend to be at this level, where our maximum perception would be at the possible simulation resolution. We would need to see beyond the resolution of the simulation at the Nyquist rate to verify if we are in a simulation, which by definition is not possible since that would be outside our frame of reference.

However, Hubble lets us see at the macroscopic level. The narrower the viewing angle or longer we expose the photo, the more detail (resolution) we see. Normally we interpret that as history (time for light to reach us at c), but if we live in a simulation, the detail means something different: it represents the relative effort in building the simulation and also resolution (grid spacing and/or timing.) For a long enough exposure time, we should be able to either see infinite detail (a complex reality), or not (a limited universe or a simulation.)

Briggs’ Cosmology Theory: by repeating the Hubble Deep Field imaging in the darkest area with even smaller angles and more time, and statistically analyzing galaxy clustering and comparing the results with Hubble Deep Field and other images, we can verify if our universe is a simulation.

Either way, win-win! :)

Additional Notes:

  • red-shift tends to confirm that we are living in an expanding universe, and would take more effort to simulate
  • but our success in understanding small forces (atomic, electrostatic) more so than gravity indicates we don’t know much about the “container” we exist in
  • Planck data, if it is high-enough resolution, could provide interesting statistical analysis to understand the structure of the universe. For studying simulation, it seems easier to do higher-resolution Deep Field imaging though.
  • an additional thought experiment is this: If there are lower limits on atomic scale, then we live in a grid anyway – one might think of even reality as a limited-resolution simulation.

Wikipedia: Hubble Deep Field, Simulation hypothesis
“the universe, statistically, looks the same in all directions.”
Saslaw: The Distribution of the Galaxies

Briggs’ Storage Law

Posted in Photography, Tech

Top Utility for Cassandra Clusters – cass_top

DataStax’s OpsCenter is pretty, but sometimes you don’t want to chop holes in your firewall for the server and agents.

So here’s cass_top. It works like top, but colorizes the output of nodetool status. It also lets you build nodetool commands using menus, then run them and log the output.

What’s especially nice is that it uses bash (no python required), and uses minimal screen real estate, so you can view all your clusters on one monitor using eterms.
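The colorizing itself is a small trick: map the node state letters at the start of each nodetool status line to an ANSI color. A rough sketch of the idea in Python (cass_top itself does this in bash; the color choices here are guesses, not the tool's actual palette):

```python
# ANSI colors keyed by nodetool Status letter: U(p) green, D(own) red, ?(unknown) yellow
COLORS = {"U": "\033[32m", "D": "\033[31m", "?": "\033[33m"}
RESET = "\033[0m"

def colorize(line):
    """Color a nodetool status node line by its Status/State prefix,
    e.g. 'UN' (Up/Normal) or 'DN' (Down/Normal)."""
    code = line[:1]
    # second letter is State: Normal/Leaving/Joining/Moving
    if code in COLORS and len(line) > 1 and line[1] in "NLJM":
        return COLORS[code] + line + RESET
    return line  # header and legend lines pass through unchanged

sample = "UN  71.87 KB   256     66.8%  8302c6d5-4c88-4695-bbf4-762bc7f24544  rack1"
print(colorize(sample))
```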

$ cass_top

cass_top Screenshot
cass_top Help Screenshot

Please leave a comment with your suggestions.

github: Cassandra Top cass_top

Posted in Cassandra, Linux, Storage, Tech, Toys

Perl Clients for Cassandra

The Perl module Net::Async::CassandraCQL supports CQL3 and comes with a few sample programs and several tests.

To get the perldoc sample code to work:

  1. Perl 5.10 or higher is required. If you’re using Red Hat 5 or CentOS 5, use perlbrew or build from source to install a user-local version of perl (you should leave the vendor perl alone)
  2. install the Perl Cassandra client modules and dependencies with path-to-new-perl/bin/cpan Net::Async::CassandraCQL
  3. run the CREATE commands:
$ cqlsh
create keyspace if not exists "my_keyspace" with replication = { 
   'class' : 'SimpleStrategy', 'replication_factor' : 3 };
create table my_keyspace.numbers (v int primary key);
# Program:
# Note: includes bug fixes for Net::Async::CassandraCQL 0.11 version

use strict;
use warnings;
use 5.10.0;

use IO::Async::Loop;
use Net::Async::CassandraCQL;
use Protocol::CassandraCQL qw( CONSISTENCY_QUORUM );

 my $loop = IO::Async::Loop->new;

 my $cass = Net::Async::CassandraCQL->new(
    host => "localhost",
    keyspace => "my_keyspace", # changed dash to underscore in keyspace
    default_consistency => CONSISTENCY_QUORUM,
 );
 $loop->add( $cass );

 $cass->connect->get;  # connect before issuing queries

 my @f;
 for my $number (1 .. 100) {  # added brackets around VALUES
    push @f, $cass->query( "INSERT INTO numbers (v) VALUES ($number)" );
 }

 Future->needs_all( @f )->get;

 my $get_stmt = $cass->prepare( "SELECT v FROM numbers" )->get;

 my ( undef, $result ) = $get_stmt->execute( [] )->get;

 for my $row ( $result->rows_hash ) {
    say "We have a number " . $row->{v};
 }

To cleanup the test keyspace:

$ cqlsh
drop keyspace my_keyspace;

RT#97260: lost a sequence Future

Perl Support For Older Versions of Cassandra (Thrift-based)

SO: Using perl to connect to apache cassandra 2.0.1
Perlcassa – an Apache Cassandra Perl Client

Posted in API Programming, Cassandra, Open Source, Perl, Storage, Tech

Notes on “Getting Started with Time Series Data Modeling” in Cassandra

Patrick McFadin’s tutorial on Getting Started with Time Series Data Modeling aka “The Weather Model” is a very good introduction to a few important concepts in Cassandra:

  1. it’s the canonical time series use case in Cassandra. (Try to DESCRIBE the OpsCenter keyspace for how DataStax does time series rollups.)
  2. basic intro to CQL syntax.

Here’s some notes on getting more out of Patrick’s tutorial:

  • INSERT and UPDATE are the same in CQL and are known as UPSERT. That implies that if event_time were not part of the key, then the time and temp would just keep updating the same row each hour. It also implies that if we do a double-insert with the same timestamp, we will only see the last one to arrive, even if event_time is part of the key.
  • “PRIMARY KEY ((weatherstation_id, date), event_time)” and “PRIMARY KEY (weatherstation_id, event_time)) WITH CLUSTERING ORDER BY (event_time DESC)” both use event_time as a clustering (sorted) key, but only the latter syntax allows the descending ordering
  • try to do some SELECT statements without specifying the weather station id. What happens? Why?
  • the reason that Patrick is so excited about TTL is that with an RDBMS, you either have to DELETE FROM table WHERE event_time < now(), or set up partitions and DROP old ones and CREATE new ones. The first causes massive IO and the second increases administration overhead. Also, Cassandra's row-level TTL is much more granular than table-level schemes

  • there is no GROUP BY syntax, but denormalizing into PRIMARY KEY ((station_id), temperature) does something similar
  • although Patrick only inserted 4 test records, that’s still more than most database developers bother to test new features with, so do your own acceptance tests :)
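The UPSERT point above can be modeled with a toy in Python: a partition behaves like a map keyed by the full primary key, so a second write with the same key silently replaces the first (this is just the intuition, not real Cassandra internals):

```python
def upsert(table, key, value):
    """Cassandra-style write: the last write for a given primary key wins."""
    table[key] = value

rows = {}
# event_time in the primary key: each timestamp is a distinct row
upsert(rows, ("1234ABCD", "07:01"), "72F")
upsert(rows, ("1234ABCD", "07:02"), "73F")
# same full key twice: the second write silently replaces the first
upsert(rows, ("1234ABCD", "07:02"), "73.5F")
print(len(rows), rows[("1234ABCD", "07:02")])  # prints: 2 73.5F
```

Drop event_time from the key and every hourly write collapses onto one row, which is exactly the failure mode the tutorial's schema avoids.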

Some more general CQL tips are:

  • DataStax is promoting CQL to users more than Thrift
  • queries on the same node in the same partition are relatively fast, everything else being slower.
  • column names are stored as strings in each column, so keep them short if you want to save space. Remember that wasting space inflates memtables, SStables, snapshots, backups, AWS data transfer, etc.
  • SECONDARY INDEX is generally not recommended as it will usually do a scatter-gather across multiple nodes, thus either taking a long time or even timing out
  • prior to CQL3, you could use the cassandra-cli program to list the column families in more low-level detail to see how they were stored
  • to change the row key or clustering you need to drop the table (column family) and create it again
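The point about short column names can be turned into a back-of-envelope calculation, since Cassandra of this era stores the column name with every cell. A hypothetical sizing helper (row counts and names are illustrative):

```python
def name_overhead_bytes(rows, cells_per_row, long_name, short_name):
    """Extra bytes spent on a verbose column name instead of a short one,
    given that the name is repeated in every cell of every row."""
    return rows * cells_per_row * (len(long_name) - len(short_name))

# 100M rows, 5 cells each: renaming 'temperature' -> 't' saves ~5 GB
saved = name_overhead_bytes(100_000_000, 5, "temperature", "t")
print(saved)  # 5000000000
```

And that overhead multiplies again through memtables, SSTables, snapshots and backups.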

For those new to cqlsh, here’s the full list of commands to get the sample code working:

$ cqlsh

CREATE KEYSPACE IF NOT EXISTS weather_model WITH replication = {
   'class' : 'SimpleStrategy', 'replication_factor' : 3 };

SELECT * FROM system.schema_keyspaces;

USE weather_model;

CREATE TABLE temperature (
   weatherstation_id text,
   event_time timestamp,
   temperature text,
   PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature (weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature (weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature (weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature (weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

SELECT event_time,temperature FROM temperature
WHERE weatherstation_id='1234ABCD';

Display the read and write stats. (Use -h host if you get zero reads or writes.)

$ nodetool cfstats weather_model

Keyspace: weather_model
	Read Count: 2
	Read Latency: 0.1925 ms.
	Write Count: 4
	Write Latency: 0.141 ms.

Then clean up:

DROP TABLE temperature;

DROP KEYSPACE weather_model;


Cassandra Query Language (CQL) v3
Cassandra Data Modeling Best Practices

Posted in API Programming, Cassandra, Open Source, Tech

MariaDB Patch: CREATE [[NO] FORCE] VIEW Options

Below is my patch that implements the CREATE [[NO] FORCE] VIEW options against MySQL/MariaDB 10.1.0.

It adds two new options that look like this:

  1. CREATE NO FORCE VIEW v1 AS SELECT * FROM TABLE1; — base TABLE1 must exist, as before
  2. CREATE FORCE VIEW v1 AS SELECT * FROM TABLE1; — base TABLE1 doesn’t need to exist


  • these options follow the Oracle Enterprise options fairly closely. NO FORCE works like the old default – a user needs database, table, column access and CREATE VIEW grant to create a view (more or less). FORCE allows a user to create a view with only database access and CREATE VIEW grant and no underlying base table. At SELECT time, full access control and grant checking is performed, and an error will occur if those constraints are not met.
  • views are more complicated than one would expect, and can be composed of base tables, derived tables, INFORMATION_SCHEMA (IS), and other views. The only table object not allowed is a temporary table
  • CREATE FORCE VIEW is an important option when managing large sets of views when you don’t want to track the creation sequence, or when creating views via program. An example is mysqldump, which can be simplified by replacing the current temporary tables ordering workarounds with FORCE VIEW.
  • It’s a fairly solid patch. I think the best thing is to commit it to alpha and let it bake for a while.
  • One permutation that will need special handling is this: CREATE FORCE VIEW view1 AS SELECT * FROM table1; Since * is not resolved to column names by FORCE, currently “AS SELECT * AS” is generated, causing an error. So just use explicit column names, like CREATE FORCE VIEW view1 AS SELECT id, col1, col2 FROM table1; See this bug.
  • it passes t/view.test:
    # ./ view
    Logging: ./  view
    vardir: /usr/local/mariadb-10.1.0/mysql-test/var
    MariaDB Version 10.1.0-MariaDB-debug
    TEST                                  RESULT   TIME (ms) or COMMENT
    main.view                            [ pass ]   1896
    The servers were restarted 0 times
    Spent 1.896 of 7 seconds executing testcases
    Completed: All 1 tests were successful.
  • I wrote tests/ which runs 8,000+ test permutations. It passes. :)
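The 8,000+ permutations come from crossing grammar options. A driver for that style of combinatorial testing can be sketched in Python (these option lists are illustrative, not the actual test matrix):

```python
from itertools import product

def view_permutations():
    """Yield one CREATE VIEW statement per combination of option values."""
    or_replace = ["", "OR REPLACE "]
    force = ["", "NO FORCE ", "FORCE "]
    algorithm = ["", "ALGORITHM=MERGE ", "ALGORITHM=TEMPTABLE "]
    bodies = ["SELECT 1", "SELECT * FROM t1", "SELECT id, c1 FROM t1"]
    for r, f, a, b in product(or_replace, force, algorithm, bodies):
        yield "CREATE %s%s%sVIEW v1 AS %s;" % (r, a, f, b)

stmts = list(view_permutations())
print(len(stmts))  # 2 * 3 * 3 * 3 = 54 statements from this toy matrix
```

Each generated statement then gets run and its result (or error) diffed against the expected output, which is mechanical once the matrix is in place.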

$ cat create_force_view.patch

--- ../mariadb-10.1.0/sql/sql_view.h 2014-06-27 04:50:36.000000000 -0700
+++ sql/sql_view.h 2014-09-02 02:35:42.000000000 -0700
@@ -29,10 +29,10 @@
/* Function declarations */

bool create_view_precheck(THD *thd, TABLE_LIST *tables, TABLE_LIST *view,
- enum_view_create_mode mode);
+ enum_view_create_mode mode, enum_view_create_force force);

bool mysql_create_view(THD *thd, TABLE_LIST *view,
- enum_view_create_mode mode);
+ enum_view_create_mode mode, enum_view_create_force force);

bool mysql_make_view(THD *thd, File_parser *parser, TABLE_LIST *table,
uint flags);
--- ../mariadb-10.1.0/sql/sql_lex.h 2014-06-27 04:50:33.000000000 -0700
+++ sql/sql_lex.h 2014-09-02 01:21:10.000000000 -0700
@@ -170,6 +170,12 @@
VIEW_CREATE_OR_REPLACE // check only that there are not such table

+enum enum_view_create_force
+ VIEW_CREATE_NO_FORCE, // default - check that there are not such VIEW/table
+ VIEW_CREATE_FORCE, // check that there are not such VIEW/table, then ignore table object dependencies
enum enum_drop_mode
DROP_DEFAULT, // mode is not specified
@@ -2442,6 +2448,7 @@
enum enum_var_type option_type;
enum enum_view_create_mode create_view_mode;
+ enum enum_view_create_force create_view_force;
enum enum_drop_mode drop_mode;

uint profile_query_id;
--- ../mariadb-10.1.0/sql/ 2014-06-27 04:50:34.000000000 -0700
+++ sql/ 2014-09-02 02:34:31.000000000 -0700
@@ -4943,7 +4943,7 @@
Note: SQLCOM_CREATE_VIEW also handles 'ALTER VIEW' commands
as specified through the thd->lex->create_view_mode flag.
- res= mysql_create_view(thd, first_table, thd->lex->create_view_mode);
+ res= mysql_create_view(thd, first_table, thd->lex->create_view_mode, thd->lex->create_view_force);
--- ../mariadb-10.1.0/sql/sql_yacc.yy 2014-06-27 04:50:37.000000000 -0700
+++ sql/sql_yacc.yy 2014-09-05 17:19:29.000000000 -0700
@@ -1851,7 +1851,7 @@
statement sp_suid
sp_c_chistics sp_a_chistics sp_chistic sp_c_chistic xa
opt_field_or_var_spec fields_or_vars opt_load_data_set_spec
- view_algorithm view_or_trigger_or_sp_or_event
+ view_algorithm view_or_trigger_or_sp_or_event view_force_option
definer_tail no_definer_tail
view_suid view_tail view_list_opt view_list view_select
view_check_option trigger_tail sp_tail sf_tail udf_tail event_tail
@@ -2446,6 +2446,7 @@
Lex->create_view_algorithm= DTYPE_ALGORITHM_UNDEFINED;
Lex->create_view_suid= TRUE;
+ Lex->create_view_force= VIEW_CREATE_NO_FORCE; /* initialize just in case */
@@ -15887,6 +15888,15 @@
| event_tail

+ /* empty */ /* 411 - is there a cleaner way of initializing here? */
+ { Lex->create_view_force = VIEW_CREATE_NO_FORCE; }
+ { Lex->create_view_force = VIEW_CREATE_NO_FORCE; }
+ { Lex->create_view_force = VIEW_CREATE_FORCE; }
+ ;

DEFINER clause support.
@@ -15944,7 +15954,7 @@

- view_suid VIEW_SYM table_ident
+ view_suid view_force_option VIEW_SYM table_ident
LEX *lex= thd->lex;
lex->sql_command= SQLCOM_CREATE_VIEW;
--- ../mariadb-10.1.0/sql/ 2014-06-27 04:50:36.000000000 -0700
+++ sql/ 2014-09-05 19:33:58.000000000 -0700
@@ -248,7 +248,7 @@

bool create_view_precheck(THD *thd, TABLE_LIST *tables, TABLE_LIST *view,
- enum_view_create_mode mode)
+ enum_view_create_mode mode, enum_view_create_force force)
LEX *lex= thd->lex;
/* first table in list is target VIEW name => cut off it */
@@ -259,7 +259,7 @@

- Privilege check for view creation:
+ Privilege check for view creation with default (NO FORCE):
- user has CREATE VIEW privilege on view table
- user has DROP privilege in case of ALTER VIEW or CREATE OR REPLACE
@@ -272,6 +272,7 @@
checked that we have not more privileges on correspondent column of view
table (i.e. user will not get some privileges by view creation)
if ((check_access(thd, CREATE_VIEW_ACL, view->db,
@@ -285,6 +286,11 @@
check_grant(thd, DROP_ACL, view, FALSE, 1, FALSE))))
goto err;

+ if (force) {
+ res = false;
+ DBUG_RETURN(res || thd->is_error());
+ }
for (sl= select_lex; sl; sl= sl->next_select())
for (tbl= sl->get_table_list(); tbl; tbl= tbl->next_local)
@@ -369,7 +375,7 @@

bool create_view_precheck(THD *thd, TABLE_LIST *tables, TABLE_LIST *view,
- enum_view_create_mode mode)
+ enum_view_create_mode mode, enum_view_create_force force)
return FALSE;
@@ -391,7 +397,7 @@

bool mysql_create_view(THD *thd, TABLE_LIST *views,
- enum_view_create_mode mode)
+ enum_view_create_mode mode, enum_view_create_force force)
LEX *lex= thd->lex;
bool link_to_local;
@@ -425,14 +431,13 @@
goto err;

- if ((res= create_view_precheck(thd, tables, view, mode)))
+ if (res= create_view_precheck(thd, tables, view, mode, force))
goto err;

lex->link_first_table_back(view, link_to_local);
view->open_type= OT_BASE_ONLY;

- if (open_temporary_tables(thd, lex->query_tables) ||
- open_and_lock_tables(thd, lex->query_tables, TRUE, 0))
+ if (open_temporary_tables(thd, lex->query_tables) || (!force && open_and_lock_tables(thd, lex->query_tables, TRUE, 0)))
view= lex->unlink_first_table(&link_to_local);
res= TRUE;
@@ -513,6 +518,7 @@

+if (!force) {
/* prepare select to resolve all fields */
lex->context_analysis_only|= CONTEXT_ANALYSIS_ONLY_VIEW;
if (unit->prepare(thd, 0, 0))
@@ -612,6 +618,7 @@

res= mysql_register_view(thd, view, mode);

@@ -621,7 +628,7 @@
meta-data changes after ALTER VIEW.

- if (!res)
+ // if (!res)
+ if (!res && !force) /* 411 - solves segfault problems with CREATE FORCE VIEW option sometimes */
tdc_remove_table(thd, TDC_RT_REMOVE_ALL, view->db, view->table_name, false);

if (mysql_bin_log.is_open())
@@ -908,6 +915,8 @@
fn_format(path_buff, file.str, dir.str, "", MY_UNPACK_FILENAME);
path.length= strlen(path_buff);

if (ha_table_exists(thd, view->db, view->table_name, NULL))
if (mode == VIEW_CREATE_NEW)
--- ../mariadb-10.1.0/mysql-test/t/view.test 2014-06-27 04:50:30.000000000 -0700
+++ mysql-test/t/view.test 2014-09-06 00:23:32.000000000 -0700
@@ -5263,4 +5263,17 @@
--echo # -----------------------------------------------------------------
--echo # -- End of 10.0 tests.
--echo # -----------------------------------------------------------------
+create no force view v1 as select 1;
+drop view if exists v1;
+create force view v1 as select 1;
+drop view if exists v1;
+create force view v1 as select * from missing_base_table;
+drop view if exists v1;
+--echo # -----------------------------------------------------------------
+--echo # -- End of 10.1 tests.
+--echo # -----------------------------------------------------------------
SET optimizer_switch=@save_optimizer_switch;

Posted in API Programming, Linux, MySQL, Open Source, Oracle, Storage, Tech

Installing Datastax Cassandra and Python Driver on CentOS 5

Cassandra can run on CentOS 5.x, but there is no yum repo support.

If you can’t upgrade linux distros, here’s how to install Datastax Cassandra Community Edition and the python cassandra driver on CentOS 5.x.

It’s not difficult, but there’s several steps, including updating java.

(The following steps would make a complete chef or puppet recipe for a non-SSL install with vnodes.)

# setup environment
groupadd -g 602 cassandra
useradd -u 602 -g cassandra -m -s /sbin/nologin cassandra
mkdir /var/lib/cassandra /var/log/cassandra /var/run/cassandra
touch /var/log/cassandra/system.log
chown -R cassandra:cassandra /var/lib/cassandra /var/log/cassandra /var/run/cassandra
mkdir -p /opt && cd /opt

cat >> /etc/security/limits.conf <<EOD
cassandra soft memlock unlimited
cassandra hard memlock unlimited
cassandra soft nofile 8192
cassandra hard nofile 10240
EOD

# upgrade java
yum remove java
# download, then install JDK 7.x from
rpm -Uvh jdk-7u67-linux-x64.rpm
# download, then install recent jna.jar from
mv jna.jar /usr/share/java
ln -s /usr/share/java/jna.jar /opt/cassandra/lib/
# update environment variables
cat >> /etc/profile <<"EOD"
export JAVA_HOME=/usr/java/default
export JRE_HOME=/usr/java/default/jre
export CASSANDRA_HOME=/opt/cassandra
EOD

# get Datastax DCE
curl -L >dsc-cassandra-2.0.9.tar.gz
tar zxvf - < dsc-cassandra-2.0.9.tar.gz
ln -s /opt/dsc-cassandra-2.0.9 /opt/cassandra
chown -R root:root /opt/cassandra/
bash cassandra/switch_snappy 1.0.4

# open cassandra firewall ports if necessary (not needed if using internal interface on most servers)
vi /etc/sysconfig/iptables
-A INPUT -i eth0 -m state --state NEW -m multiport -p tcp --dport 7000,7199,9042,9160 -j ACCEPT
service iptables restart
# configure /opt/cassandra/conf/cassandra.yaml (at least listen_address, rpc_address, seeds and tokens before starting server. If you need a do-over, clean the cassandra data with # rm -fr /var/lib/cassandra/*)

# download startup script:
wget -O /etc/init.d/cassandra
chown root:root /etc/init.d/cassandra
chmod 755 /etc/init.d/cassandra
chkconfig --add cassandra

# start cassandra server (if it is standalone, or a seed server. otherwise start after the seed servers):
service cassandra start

# cat /etc/redhat-release 
CentOS release 5.10 (Final)

[root@www1 conf]# nodetool status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  71.87 KB   256     66.8%  8302c6d5-4c88-4695-bbf4-762bc7f24544  rack1
UN  136.63 KB  256     69.9%  eddb03b2-98d3-46ff-be63-95435414a883  rack1
UN  100.08 KB  256     63.3%  2a8dde5e-29b0-4a67-8204-40769376c44a  rack1

If you only see the node on localhost, then you have a problem:

  • read and fix any errors in /var/log/cassandra/system.log until there are zero errors. snappy-related errors are from /tmp being noexec or not running the switch_snappy 1.0.4 command above.
  • disable iptables firewall, test and reenable later
  • in, increase log4j.rootLogger to DEBUG
  • if you have multiple NICs, JMX (i.e. nodetool) can bind to the wrong interface. You likely need to configure the -Djava.rmi.server.hostname=[address] option in - to the address you want to listen on
  • public/private IP address problems in AWS EC2. You may need to set broadcast_address: [public_ec2_address]
  • normally rmiregistry is not needed unless you have some atypical firewalling or routing (NAT.)

Datastax Opscenter 5.0

You can install the binary from yum or tarball, but the important things to know are:

  • the monitoring agent will be installed on each cassandra node and uses port 61621. The init script is called datastax-agent.
  • the UI only needs to be installed once, but needs ports 61620, and 8888 for HTTP.
  • to allow Opscenter to remotely manage nodes with ssh, remove old ssh entries from .ssh/known_hosts first, connect manually to each node, then Opscenter should be happy
  • by default, Opscenter listens for agents on, phones home to each day, and does not require web authentication, so you likely want to change those.

Python also needs to be upgraded if you want to use cqlsh or the python client cassandra driver.

# install python 2.6 and dependencies
yum install gcc python26 python26-devel libev libev-devel

# install python's pip module
curl --silent --show-error --retry 5 | python26

# install cassandra driver for python
pip install cassandra-driver

# install blist
tar zxvf - < blist-1.3.6.tar.gz
cd blist-1.3.6
python26 install
cd ..

# - test installation

from cassandra.cluster import Cluster

cluster = Cluster([''])

def dump(obj):
   for attr in dir(obj):
       if hasattr( obj, attr ):
           print( "obj.%s = %s" % (attr, getattr(obj, attr)))

dump(cluster)

# python26

obj.__class__ = <class 'cassandra.cluster.Cluster'>

Troubleshooting connection problems in JConsole
Storing OpsCenter Data in a Separate Cluster

Posted in Cassandra, Cloud, Linux, Open Source, Tech

MySQL 5.6 Views and Stored Procedures Tips

I recently tuned an existing application that used dozens of views and hundreds of stored procedures using MySQL 5.6.

There seem to be three attitudes towards using views and stored procedures (SPs) with MySQL:

  1. don’t use them at all to increase portability
  2. just use SPs to reduce network traffic in large reporting queries (my choice)
  3. go crazy and use them everywhere like old-school Oracle Enterprise apps.

Here are some notes on using views:

  • before creating views, review your schema to ensure keys have matching types and charsets for good performance. It’s much easier to spot schema problems in a text listing than to guess why a view is slower than expected at execution time. (This is doubly true for MySQL Cluster.)
  • MySQL currently doesn’t have CREATE VIEW FORCE, although MariaDB 10.1.0 alpha has my patch. The FORCE option will greatly simplify view administration and also mysqldump output, which creates temporary tables to ensure views can be created regardless of table/view ordering issues
  • When looking at the MariaDB source code, it’s apparent that some view options were never actually implemented, like RESTRICT/CASCADE

And some notes on stored procedures (SPs):

  • if a SP makes a stateful session change, like set sql_log_bin=0, ensure that isn’t going to be a problem later if an exception condition doesn’t reset it
  • after running a SP, SHOW PROFILES will list all the queries executed with performance statistics
  • SPs that do non-essential SELECTs or INFORMATION SCHEMA queries probably need to be reviewed by a DBA for fundamental problems like non-atomic “reading before writing”
  • MySQL compiles SPs again for each thread.

Both views and SPs are relatively new MySQL features, so budget some extra development and testing time when using them, especially with replication.

[MDEV-6365] CREATE VIEW Ignores RESTRICT/CASCADE Options
Using MySQL triggers and views in Amazon RDS

Posted in MySQL, MySQL Cluster, Open Source, Oracle, Tech

SVLUG: Devops and Release Canaries with Linux, CloudStack and MySQL Cluster

I did a talk at the Silicon Valley Linux Users Group (SVLUG) tonite on “Devops and Release Canaries with Linux, CloudStack and MySQL Cluster.”

Thanks again to Symantec for hosting.

Posted in API Programming, Cloud, Linux, MySQL, MySQL Cluster, Open Source, Oracle, Tech

Velocity Conference Santa Clara 2014 Tips Game Cards

The O’Reilly Velocity Web Operations & Performance Conference is June 24-26 in Santa Clara.

Next to the messages/jobs board was a Web Ops & Performance Tips board:

– use source maps to debug compressed JS and CSS
– use ::before to optimize font rendering
– use local storage to persist markup and templates to reduce requests and payload
– avoid CSS block rendering in chrome by not using screen media type until after. Then put screen back to element
– use the Gatling stress tool for load generation/perf testing (Apache License 2.0)
– learn curl
– learn POSIX before recreating another tool that already exists. Bill Joy (?)
– “if you do it more than twice a week, automate”
– it takes no skills to do NoOps! :)

Posted in Cloud, Conferences, Open Source, Tech

AWS Pop-up Loft, San Francisco

Amazon Web Services pop-up loft (Ask an Architect area, lecture hall, kitchen/lounge)
Photo credit:

I happened to be in SF today, so I went to the Amazon Web Services pop-up loft on Market St.

Amazon rented an empty storefront for 4 weeks for lecture sessions upstairs, and a computer lab and an ‘Ask an Architect’ bar downstairs.

One of the hosts said the loft was a shell in May, and they had to build out everything: the kitchen area, 2 bathrooms and various partitions.

I asked the experts about new EBS and RDS features, and they had answers as well as a $100 AWS credit.

The weather was sunny and warm in SF.

Lots of street performers and hustlers, including a very smooth male R&B singer. A young rapper named Rap2K15 was selling hand-made CDs.

Update 2014 06 23: Apparently a drawing was held, and I was one of 3 winners of a free general pass to the AWS:Reinvent Conference :)

Update 2014 06 24:

AWS Bootcamp

Full-day AWS overview, including EC2, S3, RDS, VPC and IAM, with 2 labs.

“Provisioning and Managing AWS Infrastructure with Chef” with special guest George Miranda, Chef Technical Consultant, Chef

George talked about using Chef tools like chef metal, knife and chef zero and a minimal amount of ruby to make an AMI and provision a MySQL server and 5 Nginx web servers.


@gmiranda23, chef-ami-factory

Update 2014 06 26:

Dealing With Obstacles at Scale, Bob Hagemann, Twilio

To reduce pain:

– UTC timezone
– UTF8
– use thin AMI and chef/puppet instead of thick AMI
– wrote boxconfig a few years ago (like netflix asgard)
– remote admin mainly
– small teams 3-8
– services should run in 3 AZs
– monitoring with nagios, cron, pingdom
– haproxy on each host as proxy
– MySQL, MHA, LVM. Manual failover.
– global low latency with route53
– @bobzilla42
– Uses freeswitch plus own telcom sw
– billing system 100s QPS
– Ops team is about 8 people
– VPNs to HQ and carrier-approved colo
– three founders, one came from Amazon.

925 Market Street, SF
June 4 – 27, 2014 (likely closed on the 27th for dismantling)
Free registration, tshirts and lunch. Closes 5:30 pm, 6:00 pm or 8:00 pm daily.
Muni 30 and 45 return from Market St. and 5th to Caltrain.

@AWSstartups #AWSloft

AWS Loft Returning in Fall 2014

Posted in API Programming, Business, Cloud, Conferences, Linux, MySQL, Open Source, Oracle, San Jose Bay Area, Tech

Advanced Liquibase Techniques

I recently did some work with Liquibase. Here are some techniques for advanced users to work around limitations and calculate query cost.

Liquibase Introduction

Liquibase is an Open Source (Apache 2.0 License) Java utility and API for specifying and versioning schema changes (DDL) for several popular databases. It is commonly introduced to projects by programmers, rather than DBAs.

What liquibase can do:

  • allow “refactoring” of SQL schema changes to target multiple databases, using either a database-independent XML syntax or raw SQL, depending on your preference
  • allow conditional execution and rollback of SQL based on database type or environment.

What liquibase can’t do:

  • has no built-in provisions for operational concerns, like conditionally executing SQL based on time/cost. There’s an assumption that schema changes are online, often true on Oracle and SQL Server, less so on MySQL, especially prior to 5.6 (unless you do micro-sharding)
  • does not do intelligent merges to the same object across changesets, like adding multiple columns to the same table in one statement.

How liquibase works:

  • the programmer specifies schema changes in Java, XML or JSON and runs the liquibase command
  • liquibase creates 2 tables in your database to store version, user and patch name information and to lock out other simultaneous liquibase runs.

How to Make Liquibase Consider Cost for MySQL

After some experimentation, here are a couple of liquibase features you can use to do more advanced things:

  1. create a savepoint using the tag and rollback options:
    • liquibase tag rel0; liquibase update …; liquibase rollback rel0
  2. prepend and append logic to each changeset that uses information_schema profiling to measure the cost of the DDL statement; on failure, exit with 1 (see the XML example below)


<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">

    <changeSet id="1" author="james">
        <sql>create table if not exists `profiling` ( `connection_id` int(11) not null default 0, `query_id` int(11) not null default '0', `state` varchar(40) default '', KEY (query_id));</sql>
        <sql>truncate table profiling;</sql>
        <sql>set profiling=1;</sql>

        <sql>alter table department add column test2 int default null;</sql>
        <sql>insert into profiling (connection_id, query_id, state) select connection_id(), query_id, state from information_schema.profiling where query_id=2;</sql>
        <rollback>
            <sql>alter table department drop column test2</sql>
        </rollback>
    </changeSet>

    <changeSet id="1-post" author="james">
        <preConditions onFail="HALT">
            <sqlCheck expectedResult="0">SELECT count(*) from profiling where state='copy to tmp table'</sqlCheck>
        </preConditions>
    </changeSet>

</databaseChangeLog>

  1. the changeset DDL statement will still have run, even if the precondition HALTs – they’re separate changesets, after all
  2. the rollback in “1” will not be executed, even if “1-post” HALTs.

The workaround for those 2 issues is to combine the two techniques in a shell script:


liquibase tag rel0

liquibase update changeset.xml || {
    # fail the build pipeline to not propagate changeset to next stage
    # (ie. don't run in production)
    liquibase rollback rel0
    mysql -e 'alter table test.department drop column test2'
    exit 1
}

The above looks a little kludgy, but provides a stepping stone for the reader to customize in their particular environment. (The preConditions and bash script can be easily autogenerated with a Perl or Python script.)
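For example, a minimal Python sketch of that autogeneration idea (my own, not a liquibase feature) which emits the paired “-post” cost-check changeset for a given changeset id, reusing the profiling-state check from the XML example above:

```python
# Hypothetical generator for the "<id>-post" cost-check changesets
# described above. The template mirrors the hand-written XML example;
# the id/author values are placeholders.

POST_TEMPLATE = """\
<changeSet id="{id}-post" author="{author}">
  <preConditions onFail="HALT">
    <sqlCheck expectedResult="0">SELECT count(*) from profiling where state='copy to tmp table'</sqlCheck>
  </preConditions>
</changeSet>"""

def make_post_changeset(changeset_id, author="james"):
    """Return XML for the cost-check changeset paired with `changeset_id`."""
    return POST_TEMPLATE.format(id=changeset_id, author=author)

if __name__ == "__main__":
    print(make_post_changeset("1"))
```

From there, looping over the DDL changesets in a changelog and emitting one cost-check changeset per DDL statement is a one-liner.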

An alternative to XML is using the Java API to set everything up.

Please leave a comment if you have any suggestions or a Java API program.

Posted in API Programming, MySQL, MySQL Cluster, Open Source, Oracle, Tech | Leave a comment

Percona Live MySQL Conference Santa Clara 2014

The Percona Live MySQL Conference was held once again in Santa Clara from April 1-4, 2014.

Executive Summary:

  1. Percona hosted another excellent conference, with 1,150 attendees from 43 countries plus a vibrant exhibit hall.
  2. The overall themes that emerged this year were “What’s new in MySQL 5.6?” and “The rise of Galera Cluster.” Unfortunately, Oracle delivered the 5.6 features they promised, but didn’t bother to ask production DBAs what they really needed (ie. GTIDs require downtime to configure, and ALTER ONLINE doesn’t support throttling or background operation on slaves (SR 3-8856341908).)
  3. MySQL 5.7 is promising about double the performance of 5.6, but note that the 5.7 feature micro-benchmark effort hasn’t translated into a complete understanding of whole database performance yet.
  4. the current active branches are now: Oracle 5.6/5.7, MariaDB 10.0/10.1, Webscale SQL (Facebook, Google, LinkedIn, and Twitter), Facebook 5.6 with Deployable GTIDs, and Percona Server 5.6. (The version you want to migrate to is one based on MySQL 5.6.17 or later.)

Severalnines booth. They create and support cluster and cloud database solutions. Photo credit: Steve Barker.


Wed. Keynotes

Percona Live 2014 opening keynote with Percona CEO Peter Zaitsev
Robert Hodges – Getting Serious about MySQL and Hadoop at Continuent
(Continuent needs to pivot into another market as MySQL’s new built-in features displace their replication products.)
‘Raising the MySQL Bar’ with Oracle’s Tomas Ulin, VP of Engineering for MySQL, Oracle
Adventures in MySQL at Dropbox, Renjish Abraham

Wed. Talks

Online schema changes for maximizing uptime, David Turner, Dropbox, Ben Black, Tango

– MySQL 5.6 has online schema change capability, however there’s no way to throttle IO consumed during the operation and the single-threaded slave will lag
– David has tested the ALTER ONLINE in MySQL 5.6.17 and will use it when ported to Percona Server
– for now uses Percona Online Schema Change utility for its throttling feature.

Be the hero of the day with the InnoDB Data recovery tool, Marco “The Grinch” Tusa and Aleksandr Kuzminsky, Percona Services

– tools have been created by Percona to recover Innodb data if you don’t have backups and you’re out of business otherwise. Call them! :)

Galera Cluster New Features, Seppo Jaakola, Codership

– reviewed features in Galera Cluster versions 3 and 4
– looking good.

MySQL Cluster Performance Tuning, Johan Andersson

- Disable NUMA
- echo 0 > /proc/sys/vm/swappiness
- bind data node threads to CPUs
- cat /proc/interrupts
- LDM (local data manager) threads = cores/2
- TC (transaction coordinator) threads = LDM/4
- Tune redo log

Practical sysbench, Peter Boros, Percona

– prefers “latency” graph style with transparent dots vs. line charts
– uses R and ggplot2 for graphing
– attendees tried to guess SSD performance on Peter’s notebook for different block sizes, most were proven totally wrong by sysbench

Birds of a Feather (BoF) Sessions

“Meet MySQL Team (at Oracle)” BoF

– discussion again this year about parallel query execution (same as at MariaDB BoF last year), with Peter Zaitsev also bringing it up again
– discussion about raw partitions (belief is that they will be 20% more space-efficient and 30% faster, and avoid Linux endless limitations and bugs)
– internal “development roadmap” only extends about 12 months at a time, subject to customer demands
– I griped about FK panic/data loss issues in MySQL Cluster 7.3.3. Tomas Ulin, Vice President, MySQL Engineering, said that was news to him. (See SR 3-8717994851 and SR 3-87646727311)
– Mark Callaghan, Facebook, said he was working on MongoDB now, but requested named keys in flexible schema in MySQL.
– Peter Zaitsev, Percona, said several clients are using GTIDs and they seem to work.
– Oracle pleaded with users to drop MyISAM. I mentioned the main reason was that legacy systems used older compression methods, but InnoDB could be used since it has compression too
– The Oracle MySQL Fabric project is an attempt to counter MongoDB’s automatic slave promotion.


Thursday Keynotes

‘9 Things You Need to Know…’, Peter Zaitsev, Percona
The Evolution of MySQL in the All-Flash Datacenter, Nisha Talagala, Fusion-IO
MySQL, Private Cloud Infrastructure and OpenStack, Sean Chighizola, Big Fish Games
Keynote Panel: The Future of Operating MySQL at Scale

Thu. Talks

Benchmarking Databases for Scale, Peter Boros and Kenny Gryp, Percona

Question: “What is Percona’s secret to professional benchmarks?”
Answer: “Benchmark absolutely everything multiple times, time permitting.”

MySQL 5.7: Performance & Scalability Benchmarks, Dimitri KRAVTCHUK

– comprehensive micro-benchmarking graphs of 5.7 to gain a deeper understanding of parts
– the challenge remains: how to tune the whole database to perform well?

Use Your MySQL Knowledge to Become an Instant Cassandra Guru, Robert Hodges, Continuent and Tim Callaghan, Tokutek

– good comparison of relational data modelling and C* data modelling, lots of similarities
– note that MariaDB has a Cassandra plugin

RDS for MYSQL, Tips, Patterns and Common Pitfalls, Laine Campbell, Blackbird (formerly PalominoDB)

Write Conflicts in Multi-Master Replication Topologies, Seppo Jaakola, Codership

– it’s good to see that Codership is paying attention to the details of replication

MySQL Community Awards

Shlomi has a comprehensive post on this year’s winners.

MySQL Lightning Talks (5 minutes each)

Truncating Sub Optimal DBA Verbal Responses Vectors, David Stokes (Oracle)

MySQL 5.6 Global Transaction IDs: Benefits and Limitations, Stephane Combaudon (Percona)


Zero database downtime using the Federated storage engine and Replication, Prasad Mani (BBC)

Scaling via adding a Table, Rick James (self)

Rick knows some clever ways to optimize solutions with MySQL. He’s doing consulting now, so contact him.

Extra Table Saves the Day: Slides

No es ‘ano’, es ‘año’! A take on encoding in your DB, Ignacio Nin (Vivid Cortex)

What Not to Say to the MySQL DBA, Gillian Gunson (Blackbird (formerly PalominoDB))
“I’ll code around it.”
“Stop micro-optimizing.”
“Use passive master for QA”
“MySQL is a toy database.”
“This conference is a support group.”

Hall of Shame, Shlomi Noach
Triple active-replication in gaming anecdote: don’t do that.

The bash slave-prefetch oneliner, Art van Scheppingen (Spil Games)

Unsung Relay Log, Vishnu Rao, FlipKart
Com_relaylog_dump for tungsten and mysql 5.5

Unique User Count — Rollup, Rick James (self)

Formula for user visit estimation by counting bits.
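I didn’t capture the formula itself, but the usual bit-counting approach to unique-visitor rollups looks something like this (my sketch, not the talk’s actual method): keep a bitmap per day with bit N set when user N visits, OR the bitmaps across the date range, and popcount the result so repeat visitors count once.

```python
# Illustrative sketch of unique-user counting with per-day bitmaps.

def daily_bitmap(user_ids):
    """Set bit `uid` for each visiting user id."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits

def unique_users(daily_bitmaps):
    """OR the daily bitmaps, then count set bits (the popcount)."""
    combined = 0
    for bm in daily_bitmaps:
        combined |= bm
    return bin(combined).count("1")

day1 = daily_bitmap([1, 2, 3])
day2 = daily_bitmap([2, 3, 4])
print(unique_users([day1, day2]))  # 4 distinct users across the two days
```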

Logical Backups in the Cloud, Bill Karwin, Percona
Backups for PHP designers
PHP class Mysql/Dump

How to Squat, Kyle Redinger (VividCortex, Inc)

Iron DBA Replication Challenge, Attunity


Friday Keynotes

Percona CMO Terry Erisman opens the 3rd and final day of Percona Live 2014

Keynote: OpenStack Co­Opetition, A View from Within, Boris Renski, Mirantis and OpenStack Boardmember

– one of the best conference keynotes ever, and a great primer on Open Source marketing … up there with the O’Reilly Open Source Conference keynote on the importance of Android – before it shipped.

Friday Talks

Global Transaction ID at Facebook, Evan Elias, Santosh Banda and Yoshinori Matsunobu, Facebook

– just write your own MySQL branch if a feature is too hard to deploy :)

R for MySQL DBAs, Ryan Lowe and Randy Wigginton, Percona

– R has about 1,000 interesting sample databases (demos included diamonds and cars)
– good interface for quick graphing, not so great for complex programs
– Percona uses R and the ggplot2 graphing module for most of the graphs you see now.

MariaDB for Developers, Colin Charles, Chief Evangelist, MariaDB

Closing Prize Drawing

About 30 high-end gifts were handed out.

Some nice prizes contributed by exhibitors, including Nexus 7 tablets, $250 AWS gift certificates, SQLyog and Monyog licenses, and a quad drone!


The exhibits are one of my favorite things at the conference each year because of how strong the MySQL third-party community is.

Some notable absences were Clustrix and Violin Memory, but those were offset by new exhibitors. Webyog was a sponsor but I didn’t see a booth. PalominoDB changed their name to Blackbird, and appear to be offering DevOps as well as DBA services.

And of course, as the organizers, Percona had a large, central spread. :)

Thanks to the sponsors and exhibitors for making a conference like this financially possible.

Facebook Debuts Web-Scale Variant Of MySQL

Facebook’s Yoshinori Matsunobu on MySQL, WebScaleSQL & Percona Live
Twitter’s Calvin Sun on WebScaleSQL, Percona Live
Tweets about PerconaLive
Percona Live MySQL Conference Highlights

Posted in Cassandra, Cloud, Conferences, Linux, MySQL, MySQL Cluster, Open Source, Oracle, Perl, San Jose Bay Area, Storage, Tech | Leave a comment

Cassandra Operations Checklist

Most of the Cassandra rollouts I’ve heard about at conferences have been “Devopsed” – written by Dev and productionized by Dev, with hand-off to Operations long afterwards.

That’s the opposite to how RDBMS projects are usually deployed in large companies.

As Cassandra becomes more mature, this hand-off will occur earlier, closer to the end of development.

Here is a checklist for handing off a Cassandra database to Operations (I only consider non-trivial rings of 3 or more nodes in production with a full data set):

(The last three columns show node impact.)

| Item | Comments | Performance | Space | Time/IOPs/BW |
|---|---|---|---|---|
| Cassandra Server Version | Should be exactly the same minor version across cluster except briefly during server updates | | | |
| Token or vnodes? | needs to be configured before first start of server | | | |
| Cassandra Client/Connector Version | Thrift or CQL? | | | |
| Snitch name? Why? | several choices | | | |
| Replication Factor (RF)? Why? | usually RF=3 for SoT* data, defined at keyspace level | | | |
| Compaction method? Why? | Size or Level, defined at CF level | | | |
| Read Consistency Level? Why? | Netflix recommends CL=ONE. ALL seldom makes sense. | | | |
| Write Consistency Level? Why? | ALL seldom makes sense. | | | |
| TTL? Why? | Defined at row level. | | | |
| Expected Average Query Latency | 10 ms is reasonable, 1 ms is tough. | | | |
| nodetool repair/scrub | needed weekly | yes | more space | more |
| Bootstrapping a new node | | yes | | yes |
| Java GC pause | stop the world | yes | | yes |
| Are there any wide columns? do they get wider over time? | pathological case for Cassandra | yes | more space | more |
| Backup | in case of application bug or a disaster. Opscenter, Priam, custom. | yes | slightly more for incremental backups, double for local cold copy | more |
| Restore | requires Cassandra node shutdown | yes | | |
| If a storage volume fills, how to fix it? | Especially a problem with multiple JBOD volumes, which fill unevenly. | yes | less space | less |
| If a storage volume fails, how to fix it? | | yes | less space | less |
| What is the total data size now? Projected in 12 months? | affects most operations | yes | yes | yes |
| What is the acceptable query latency? | affects network and hardware choices | | | |
| What is the best maintenance window time each week? | | | | |
| What are the business and practical SLAs? | | | | |
| What training is needed for your Operations team? | Datastax Admin and Data Modelling Classes (recommend most recent Cassandra version) | | | |
| What partitioner is used? | Opscenter only supports random partitioner or murmur 3 partitioner for rebalancing | | | |
| What procedures need to be written for your Operations team? | | | | |
| What monitoring tools? | DSE or DCE/OpsCenter; nodetool; Jconsole/jmxterm; Boundary; nagios/zabbix | | | |
| What bugs have been encountered? Which ones still apply? | | | | |
| What lessons can Devops share with the Operations team? | | | | |

SoT = Source of Truth
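One way to reason about the RF and consistency-level items above: a read is guaranteed to see the latest write only when the replica sets overlap, i.e. read replicas + write replicas > RF. A small sketch (numeric replica counts here are my simplification of Cassandra’s named levels, with ONE=1, QUORUM=RF//2+1, ALL=RF):

```python
# Overlap rule for tunable consistency: strong consistency requires
# read replicas + write replicas > replication factor.

def quorum(rf):
    return rf // 2 + 1

def is_strongly_consistent(rf, read_replicas, write_replicas):
    return read_replicas + write_replicas > rf

RF = 3
# CL=ONE reads with CL=ONE writes (the fast path): not strongly consistent.
print(is_strongly_consistent(RF, 1, 1))                    # False
# QUORUM reads + QUORUM writes: 2 + 2 > 3, strongly consistent.
print(is_strongly_consistent(RF, quorum(RF), quorum(RF)))  # True
```

This is why CL=ONE reads (as Netflix recommends) trade consistency for latency, and why ALL rarely makes sense: it buys nothing over QUORUM+QUORUM but fails if any single replica is down.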

About Data Consistency in Cassandra
ConstantContact techblog: Cassandra and Backups
Do I absolutely need a minimum of 3 nodes/servers for a Cassandra cluster or will 2 suffice?

Posted in Business, Cassandra, Cloud, Tech | Leave a comment

Howto Add a New Command to the MySQL Server

Adding a new statement or command to the MySQL server is not difficult.

First, decide if you want to modify the server source code, or if a User-Defined Function (UDF) will meet your needs.

Since I just added the SHUTDOWN server command, I thought it would be helpful to outline the steps needed to add a new command.


  1. some familiarity with C/C++ syntax and programming (like “The C Programming Language”, by Kernighan and Ritchie.)
  2. some familiarity with lex and yacc. (I read the Dragon Book a long time ago.)
  3. access to a Linux account with cmake, gcc, make and bison packages.
# CentOS
yum install cmake gcc make bison

# Ubuntu
apt-get update
apt-get install cmake gcc make bison

# unpack the MySQL source code:

tar zxvf - < mariadb-5.5.30.tar.gz

# most of the files you need to modify are in this directory:

cd mariadb-5.5.30/sql
  • sql_yacc.yy
  • sql_lex.h

# add the token(s) (commands and arguments you think you will need) and verify the syntax:

bison -v sql_yacc.yy

# if you get warnings, fix %expect in sql_yacc.yy

# cut-and-paste a code block from a command with similar syntax in sql_parse.cc to implement your new command, and build a test version of MySQL

# build your new server in a sandbox:


cd mariadb-5.5.30
cmake . -DCMAKE_INSTALL_PREFIX:PATH=/usr/local/mariadb-5.5.30 -DWITH_DEBUG=1
make
sudo make install

# test your new server with 3 terminal windows:


killall mysqld
/usr/local/mariadb-5.5.30/bin/mysqld_safe --user=mysql --debug &
tail -f  /tmp/mysqld.trace | grep Got &
tail -f /var/log/mysqld.log &
mysql -u root -p
# login, then test your new command while watching the log and trace

# read /var/log/mysqld.log and /tmp/mysqld.trace for errors and panics like this:

Version: '5.5.30-MariaDB-debug'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
mysqld: /home/james/mariadb-5.5.30/sql/ int mysql_execute_command(THD*): Assertion `0' failed.
130515 11:25:19 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

The above panic was caused by the SQLCOM_ switch falling through, because the new command was not defined yet.

# When you’re done, make a test

vi mysql-test/t/my_new_command.test

# Create a patch file:

mv mariadb-5.5.30 mariadb-5.5.30-new
tar zxvf - < mariadb-5.5.30.tar.gz

cd mariadb-5.5.30/sql
for i in sql_yacc.yy sql_lex.h; do
   echo $i
   diff -u $i ../../mariadb-5.5.30-new/sql/$i >> patch.txt
done
# don't forget mysql-test/t/my_new_command.test

# apply your patch file:

patch -b < patch.txt

# do a build and test your patch before distributing it.

Easy peasy, right? :)

Sergei Golubchik wrote on the MariaDB developers list: "Reserved words are keywords (listed in the sql/lex.h) that are
not listed in the 'keyword' rule of sql_yacc.yy (and 'keyword_sp' rule, that 'keyword' rule includes)."

How can I get the output of the DBUG_PRINT
How to find shift/reduce conflict in this yacc file?
MariaDB Contributor Agreement (MCA) Frequently Asked Questions
wikipedia: diff

MySQL Internals Manual XtraDB / InnoDB internals in drawing
Overloading Procedures
innodb_diagrams project
Understanding MySQL Internals By Sasha Pachev (O'Reilly)
DTrace can tell you what MySQL is doing
MySQL C Client API programming tutorial
MySQL 5.1 Class Index

  • IRC, #maria channel on Freenode
  • (ideas)
  • (search for unassigned tasks)

Keywords: MariaDB, MySQL server programming, tutorial, patch.

Posted in API Programming, Linux, MySQL, Open Source, Oracle, Tech, Toys | 3 Comments

Patch to Add Shutdown Statement to MySQL MariaDB

At the OSCON 2011 MariaDB Birds-of-a-Feather (BoF) session, I suggested adding a MySQL SHUTDOWN statement to Monty, which was written up as WL#232. Other databases have this feature, and it’s very handy when automating management of a cluster of MySQL servers.

And at the Percona Live MySQL Conference 2013, Monty suggested to MariaDB BOF attendees that a good way to get a new feature added is to write a patch that paves the way for a committer to start from.

Phase 1

So … I sat down last night and wrote the patch against MariaDB 5.5.30.

Basically it meant telling mysql’s lex/yacc files to parse “shutdown”, then calling the existing MySQL API shutdown kill_mysql() function.

This code is released under the Open Source BSD-new License, according to the MariaDB Contributor Agreement.

shutdown_0.1.patch.txt – MariaDB 5.5.30:

---	2013-03-11 03:29:13.000000000 -0700
+++ /home/james/mariadb-5.5.30-new/sql/	2013-05-15 13:17:05.000000000 -0700
@@ -1305,7 +1305,6 @@
@@ -1333,7 +1332,6 @@
     STATUS_VAR *current_global_status_var;      // Big; Don't allocate on stack
@@ -3736,6 +3734,31 @@
+  {
+    // jeb - This code block is copied from COM_SHUTDOWN above. Since kill_mysql(void) {} doesn't take a level argument, the level code is pointless.
+    // jeb - In fact, the level code should be removed and Oracle Database statements implemented: SHUTDOWN, SHUTDOWN IMMEDIATE and SHUTDOWN ABORT. See WL#232.
+    status_var_increment(thd->status_var.com_other);
+    if (check_global_access(thd,SHUTDOWN_ACL))
+      break; /* purecov: inspected */
+    enum mysql_enum_shutdown_level level= SHUTDOWN_DEFAULT;
+    if (level == SHUTDOWN_DEFAULT)
+      level= SHUTDOWN_WAIT_ALL_BUFFERS; // soon default will be configurable
+    else if (level != SHUTDOWN_WAIT_ALL_BUFFERS)
+    {
+      my_error(ER_NOT_SUPPORTED_YET, MYF(0), "this shutdown level");
+      break;
+    }
+    DBUG_PRINT("SQLCOM_SHUTDOWN",("Got shutdown command for level %u", level));
+    my_eof(thd);
+    kill_mysql();
+    res=TRUE;
+    break;
+  }
--- sql_yacc.yy	2013-03-11 03:29:19.000000000 -0700
+++ /home/james/mariadb-5.5.30-new/sql/sql_yacc.yy	2013-05-15 11:12:03.000000000 -0700
@@ -791,7 +791,7 @@
   Currently there are 174 shift/reduce conflicts.
   We should not introduce new conflicts any more.
-%expect 174
+%expect 196
    Comments for TOKENS.
@@ -1645,6 +1645,7 @@
         definer_opt no_definer definer
         parse_vcol_expr vcol_opt_specifier vcol_opt_attribute
         vcol_opt_attribute_list vcol_attribute
+        shutdown
 %type  call sp_proc_stmts sp_proc_stmts1 sp_proc_stmt
@@ -1796,6 +1797,7 @@
         | savepoint
         | select
         | set
+        | shutdown
         | signal_stmt
         | show
         | slave
@@ -13715,6 +13717,17 @@
+          SHUTDOWN
+          {
+            LEX *lex=Lex;
+            lex->value_list.empty();
+            lex->users_list.empty();
+            lex->sql_command= SQLCOM_SHUTDOWN;
+          }
+        ;
           expr { $$=$1; }
         | DEFAULT { $$=0; }
---	2013-03-11 03:29:11.000000000 -0700
+++ /home/james/mariadb-5.5.30-new/sql/	2013-05-15 03:07:00.000000000 -0700
@@ -2173,6 +2173,7 @@
   case SQLCOM_KILL:
---	2013-03-11 03:29:14.000000000 -0700
+++ /home/james/mariadb-5.5.30-new/sql/	2013-05-15 01:20:11.000000000 -0700
@@ -3333,6 +3333,7 @@
   {"savepoint",            (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SAVEPOINT]), SHOW_LONG_STATUS},
   {"select",               (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SELECT]), SHOW_LONG_STATUS},
   {"set_option",           (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SET_OPTION]), SHOW_LONG_STATUS},
+  {"shutdown",             (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SHUTDOWN]), SHOW_LONG_STATUS},
   {"signal",               (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SIGNAL]), SHOW_LONG_STATUS},
   {"show_authors",         (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SHOW_AUTHORS]), SHOW_LONG_STATUS},
   {"show_binlog_events",   (char*) offsetof(STATUS_VAR, com_stat[(uint) SQLCOM_SHOW_BINLOG_EVENTS]), SHOW_LONG_STATUS},
--- sql_lex.h	2013-03-11 03:29:13.000000000 -0700
+++ /home/james/mariadb-5.5.30-new/sql/sql_lex.h	2013-05-15 01:19:17.000000000 -0700
@@ -193,6 +193,7 @@
     When a command is added here, be sure it's also added in

To apply:

tar zxvf - < mariadb-5.5.30.tar.gz
cd mariadb-5.5.30/sql
patch -b < shutdown_0.1.patch.txt


cd mariadb-5.5.30
cmake . -DCMAKE_INSTALL_PREFIX:PATH=/usr/local/mariadb-5.5.30 -DWITH_DEBUG=1
make
sudo make install


killall mysqld
/usr/local/mariadb-5.5.30/bin/mysqld_safe --user=mysql --debug &
tail -f  /tmp/mysqld.trace | grep Got &
mysql -u root -p

mysql client (with mysqld.log and mysql.trace entries overlaid):

mysql> shutdown;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> 130515 13:20:38 mysqld_safe mysqld from pid file /var/run/mysqld/ ended


T@4    : | | | >parse_sql
T@4    : | | | <parse_sql
T@4    : | | | >LEX::set_trg_event_type_for_tables
T@4    : | | | <LEX::set_trg_event_type_for_tables
T@4    : | | | >mysql_execute_command
T@4    : | | | | >deny_updates_if_read_only_option
T@4    : | | | | <deny_updates_if_read_only_option
T@4    : | | | | >stmt_causes_implicit_commit
T@4    : | | | | <stmt_causes_implicit_commit
T@4    : | | | | SQLCOM_SHUTDOWN: Got shutdown command for level 16
T@4    : | | | | >set_eof_status
T@4    : | | | | <set_eof_status
T@4    : | | | | >kill_mysql
T@4    : | | | | | quit: After pthread_kill
T@4    : | | | | <kill_mysql
T@4    : | | | | proc_info: /home/james/mariadb-5.5.30/sql/  query end


130515 13:20:08 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130515 13:20:08 InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
130515 13:20:08 InnoDB: The InnoDB memory heap is disabled
130515 13:20:08 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130515 13:20:08 InnoDB: Compressed tables use zlib 1.2.3
130515 13:20:08 InnoDB: Initializing buffer pool, size = 128.0M
130515 13:20:08 InnoDB: Completed initialization of buffer pool
130515 13:20:08 InnoDB: highest supported file format is Barracuda.
130515 13:20:09  InnoDB: Waiting for the background threads to start
130515 13:20:10 Percona XtraDB ( 5.5.30-MariaDB-30.1 started; log sequence number 1597945
130515 13:20:10 [Note] Plugin 'FEEDBACK' is disabled.
130515 13:20:10 [Note] Event Scheduler: Loaded 0 events
130515 13:20:10 [Note] /usr/local/mariadb-5.5.30/bin/mysqld: ready for connections.
Version: '5.5.30-MariaDB-debug'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
130515 13:20:37 [Note] Got signal 15 to shutdown mysqld
130515 13:20:37 [Note] /usr/local/mariadb-5.5.30/bin/mysqld: Normal shutdown

130515 13:20:37 [Note] Event Scheduler: Purging the queue. 0 events
130515 13:20:37  InnoDB: Starting shutdown...
130515 13:20:38  InnoDB: Shutdown completed; log sequence number 1597945
130515 13:20:38 [Note] /usr/local/mariadb-5.5.30/bin/mysqld: Shutdown complete

130515 13:20:38 mysqld_safe mysqld from pid file /var/run/mysqld/ ended

A possible test would be like this, but it would interfere with operation of the test mysqld instance:



Phase 2

My above patch applies cleanly within the existing MySQL shutdown framework, which implements a feature like Oracle Database's SHUTDOWN IMMEDIATE command.

However, my patch is a Pyrrhic victory, since there's so much wrong with MySQL's existing shutdown framework that it will take an internals committer to sort it out.

The shutdown framework is badly designed, if it was designed at all, since it fails the "does this feel programmed on purpose?" test, and in fact doesn't work reliably:

  1. Conceptually, there should be 3 Oracle Database-style SHUTDOWN options: WAIT, IMMEDIATE and ABORT. Implementing SHUTDOWN WAIT would mean intrusive changes to the MySQL source code, while SHUTDOWN ABORT would be easier to program, but at the risk of data integrity.
  2. the following bug reports describe a race condition between mysqld threads and the shutdown thread:

I guess I'll have to pay myself the worklog bounty of $100. :)

This is actually my second MySQL patch contribution. In 1997 or 1998 I submitted a patch for the installer, which was one of the most troublesome components at that time. Monty rewrote it, but I liked my version better.

Update: Sergei Golubchik committed this patch to MariaDB 10.0.4 on 2013-06-25. Thanks, Sergei!

MySQL's Missing Shutdown Statement
Bug #63276: skip sleep in srv_master_thread when shutdown is in progress

Posted in Linux, MySQL, Open Source, Oracle, OSCON, Tech | 1 Comment

AWS Reinvent Conference 2014

I had a free ticket to the AWS Reinvent Conference, but I just got back from Las Vegas so decided to watch the streaming broadcast.

Wednesday Keynotes

– Medical use case
– 1 PB per month storage


– 100,000 writes per second, 5x reads
– 6 copies across 3 DCs, backups to S3
– immediate restart (no log replay)
– pre-warmed cache


– CodeDeploy
– internal developers like it, miss it

Hybrid Mode


– customer-focused, rather than competitor or product-focused
– saved $350 million
– customers were surprised initially
– how many vendors give back money?


– moved old DC to AWS first
– 2013 Partner of the Year
– worked with AWS on new security features
– moving everything else now

Johnson & Johnson

– lots of M&A, but divisions operate locally
– need analytics
– users asked for better, faster, cheaper, offering more of the same thing wasn’t enough
– AWS makes their engineers happy

AWS Security Keynote
Stephen Schmidt, GM & CISO, AWS

– security is not new to Amazon
– more visible than most on-premises
– one bar – same for everybody
– objects have records, logs and timestamps in Cloudtrail
– sign up for SMS on changes
– HSM launched in Europe first because of demand
– Cause of Error (COE) investigation for each problem
– security engineers are embedded in service teams, CISO reports to CEO. Security is job 1
– pen testing black and white
– collect ddos cannons and mitigate, often beforehand

Infrastructure as Code
Alex Corley and David Winter, AWS
Tom Wanielista, Simple

– Tom needed PCI. So had to start over with VPC, scan each node and binary, etc.

The Quest for the Last 9: Building Highly Available Services from the Ground Up
Khawaja Shams, Technical Advisor, AWS

– Cross-Region Replication


– ad server
– 1 million QPS on DynamoDB, 1 PB logs (compressed) per month
– rather scale engineering team for products, not infra
– 1% downtime is $1 million on $100 million in revenue, plus your customers
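Spelling out that downtime arithmetic (the dollar figures are the speaker’s round numbers; the rest is just availability math):

```python
# Downtime cost at a given availability level.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    return (1 - availability) * MINUTES_PER_YEAR

def revenue_at_risk(annual_revenue, availability):
    return (1 - availability) * annual_revenue

# 1% downtime ("two nines") on $100M in revenue:
print(round(downtime_minutes_per_year(0.99)))     # 5256 minutes/year
print(round(revenue_at_risk(100_000_000, 0.99)))  # 1000000 dollars
```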

Managing the Pace of Innovation: Behind the Scenes at AWS
Charlie Bell, SVP of Utility Computing Services, AWS
Jim Scharf, Technical Advisor, AWS

– “a team should be small enough to be fed on 2 pizzas”
– more than 40 price reductions in 8 years

James Hamilton, VP & Distinguished Engineer, AWS

– great talk, deep and thought-provoking at the same time, as always for JH

– 1 million active clients, storage doubled in 12 months, 5x size of combined competitors
– cost of networking at 8% and growing is anti-Moore’s Law – 2 reasons: indefensible vendor profits, servers getting fatter
– just the support cost was $10+ million, availability better with own gear because of better understanding
– tested on 3 megawatt DC with 8,000 servers. rented of course.
– team metrics must reflect customer complaints, or simply wrong
– 11 regions, over 28 data center buildings
– East-West replication at 74 ms can only be async with data loss on emergency cutover and a 1-week cleanup, so nobody wants to fail over
– reason for AZs is to limit blast-radius while having sync replication under 2 ms
– SR-IOV reduces avg. latency 2x, outliers down 10x
– built 2 power plants because utilities move too slow. Slightly cheaper, but very handy to have power engineers on staff now.
– Aurora is MySQL with a steroidal storage engine

Thursday Keynotes

Werner Vogels, AWS

– single button deploy to another region

– lego set for building a music service
– music industry could not be made more complicated if you tried
– with what I know now, I wouldn’t touch the music industry with a barge pole. I should have written a taxi app.

The Weather Company

– most distributed cable channel in USA (, weather underground)

Pristine Health Care

– NICU broadcast with Google Glass
– AWS and Docker
– joined AWS startup program

EC2 Container Services

– new
– demo with Redis

Ben Golub, CEO

– Docker just turned 18 months old
– AWS is respectful of Docker isms

Extreme Availability for Mission-Critical Applications
Raul Frias and Eduardo Horai, Solution Architects, AWS
Andre Fatala, CDO, Magazine Luiza

– challenges of doing HA IT in South America
– misc.

Building Mission Critical Database Applications: A Conversation with AWS Customers about Best Practices
Swami Sivasubramanian, General Manager, AWS NoSQL

Financial Times

– loves the Internet, audience is biggest ever, and more global audience
– UCS “pay as you go, without the go” :)
– 50x faster and 20x less. on Redshift. “some of the analysts thought it was broken”
– “we have machines older than you. we cut them up into bookends and give them to the IT staff.”
– easy to demo Redshift. a week to do setup, a few weeks to try it out. easier than months-long RFP process.
– MapReduce is a lot of conversion effort and retraining for staff compared to SQL
– getting HA vertically is exponentially more expensive per 9. We prefer to re-think availability around AWS.
– kafka
– microservices
– no pets: “the day that a server with 464 day uptime goes down would be a terrible day.”
– rebooting for shellshock wasn’t a concern since they reboot all the time
– SAN will just propagate block-level corruption to all apps, so not HA enough
– “data debt is the worst type of technical debt, because you have to go through the whole company to collect it”
– wish list: would like more inference and semantic database querying, few available are ready to scale now.


– lost 3 drives in disk array, had to wait 5 days for replacements. gave up on colo after that and moved to AWS
– 1 PB per month
– 100 ms deadline per request from request to price quote (pulled out of a hat, now industry standard)
– “we spend less on Dynamodb than on snacks” (operating redis in-house cost 2 FTE plus 32 machines)
– they pay half the cost of a published Cassandra ring doing 1 million ops
– need consistent latency below 5 ms, scalable, cheap
– initial capacity planning is a guess
– for every 10 ad companies that die per day, 100 start up
– moving to Route 53 => 4xELBs => stateless EC2 with health monitoring daemon that dies after 5 min, ships logs, terminates. ASG.
– good backup and restore makes experimenting cheaper.
– wish list: would like in-memory Redshift to be 100x faster and PB-scale.


– “we don’t have that many engineers working on Dynamo, but I can’t tell you how many”

State of the Union: Amazon Compute Services
Peter De Santis, Vice President, Amazon Compute Services, AWS
Matt Garman, Vice President, Amazon EC2, AWS

– Nov. 2011: a 1,064-node EC2 cluster ranked #42 on the Top 500, the only one on Ethernet
– enhanced networking instances have high pps
– HGST does hard disk drive design on c3: a million permutations in 8 hours on 70,000 cores for $5,593.94, vs. a $15 million in-house cluster
– C4 instance has dedicated network paths for EBS, network, etc. now at no extra cost

Develop an Enterprise-wide Cloud Adoption Strategy
Blake Chism, Senior Consultant, AWS Professional Services
Miha Kralj, Principal Consultant, AWS Professional Services

– Professional Services helps document enterprise architecture
– for EA “it’s easier to slice the elephant into smaller pieces”
– interesting modeling slides
– download the whitepaper, first of a series of 8
– interesting blue bike outfit.


– big system – 21,000 miles of rail
– “business process, people, technology” are 3 components
– mainframe, software that needs to be re-architected, ready to move

Notre Dame

– started in EC2 classic, likes auto-scaling, moving to VPC
– committed to moving 80% of IT services to cloud in next 3 years
– Google apps user
– hosting external auth in AWS for traveling users
– AWS Professional Services monitors big games

Brendan Gregg, Netflix: Performance Tuning EC2 Instances

Posted in API Programming, Business, Cloud, Conferences, Linux, MySQL, Open Source, Oracle, Storage, Tech | Leave a comment

Kudos to WordPress for Using the Browser Cache

WordPress has found a great use for the 5 MB “local storage” per origin in recent browsers:

“Connection lost. Saving has been disabled until you’re reconnected. We’re backing up this post in your browser, just in case.”

“Local storage” is also known as “DOM storage” or “browser storage” and is considered to be an HTML5 feature.
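
The pattern is simple: serialize the draft and keep it in local storage until the connection comes back. Here is a minimal sketch of the idea (my illustration, not WordPress’s actual code — `DRAFT_KEY` and the `StorageLike` interface are made up; in a real browser you would pass `window.localStorage` and hook the `offline` event):

```typescript
// Back up a post draft to a Storage-like object and restore it later.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const DRAFT_KEY = "post-draft-backup"; // hypothetical key name

function backUpDraft(storage: StorageLike, content: string, savedAt: number): void {
  // localStorage persists per origin (typically ~5 MB) across page reloads
  storage.setItem(DRAFT_KEY, JSON.stringify({ content, savedAt }));
}

function restoreDraft(storage: StorageLike): string | null {
  const raw = storage.getItem(DRAFT_KEY);
  if (raw === null) return null;
  return (JSON.parse(raw) as { content: string }).content;
}

// A Map-backed stand-in shows the round trip outside a browser.
const mem = new Map<string, string>();
const fake: StorageLike = {
  getItem: (k) => mem.get(k) ?? null,
  setItem: (k, v) => {
    mem.set(k, v);
  },
};
backUpDraft(fake, "Hello, draft", Date.now());
console.log(restoreDraft(fake)); // prints "Hello, draft"
```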

Of course, the question remains why my connection is so bad in Silicon Valley that I need that feature, but hey, at least M@ is taking care of me. :)

Posted in API Programming, Open Source, Storage, Tech | Leave a comment

The Illusion of Certainty

Most of the people I talk to are well-educated engineers in Silicon Valley, yet even they don’t understand what is true and false in our society.

Fallacy 1: “I want to see peer-reviewed results.”

Peer review just means that some people, usually experts, have read the paper, not that they have duplicated the results. In fact, they virtually never attempt to reproduce the results in the paper.

Going deeper, there are several problems with peer review:

  1. the only person with the grant money to write the paper is the author, so there’s usually no way to even finance reproducing the results
  2. paper authors seldom include the raw data with their paper, for two reasons: they can get more papers out of the same data set, and they get credit for each paper without helping competitors. Of course, without the underlying data, who knows if there are mistakes or malfeasance?
  3. When experiments are duplicated, the results seldom match. Recently some drug companies have attempted to reproduce important papers to create known foundations for their own programs; they virtually always fail to see the same results. (Mendel himself likely published fudged results, as statistical analysis shows his ratios are too good to be true.)
  4. peer reviewers are subject to the same politics and biases as any other human endeavor
  5. In the dismal case of “medicine science”, results published in the New England Journal of Medicine are generally considered to be wrong after 10 years
  6. In the case of physics, how do you verify even a single paper on string theory? :)
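
The “ratios too good” observation in point 3 is due to R. A. Fisher, who analyzed Gregor Mendel’s pea data. It can be sketched with a chi-square goodness-of-fit statistic; the counts below are Mendel’s published seed-shape numbers (5,474 round vs. 1,850 wrinkled) tested against the expected 3:1 ratio:

```typescript
// Chi-square goodness-of-fit: sum of (observed - expected)^2 / expected.
function chiSquare(observed: number[], expected: number[]): number {
  return observed.reduce(
    (sum, o, i) => sum + Math.pow(o - expected[i], 2) / expected[i],
    0
  );
}

const total = 5474 + 1850;
const expected = [total * 0.75, total * 0.25]; // 3:1 Mendelian ratio
const chi2 = chiSquare([5474, 1850], expected);
console.log(chi2.toFixed(3)); // "0.263" — far below the 3.84 critical value (df = 1)
```

One suspiciously good fit proves nothing; Fisher’s argument was that across all of Mendel’s experiments the fits were consistently this close to expectation, which is itself statistically improbable.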

Fallacy 2: “It’s good/safe if it is FDA-approved.”

FDA approval just means that a new drug is better than a placebo in trials, not that it is better than existing drugs. Also, the drug may work well in one ethnic or gender group and not in another group. The fact that drugs have unintended effects when used off-label proves that the manufacturer doesn’t even know what all the effects of the drug are.

Fallacy 3: “I want data-driven results.”

Data-driven analysis is all the rage in Silicon Valley today, but few people I talk to even know what that means, or how futile it is in inventing new products.

Prerequisites would be:

  1. enough relevant data
  2. enough statistical and domain knowledge to model the data, design tests and interpret the results
  3. enough resources to do the data analysis without reducing the ability to do product development or customer support. Google and Yahoo! can afford legitimate A/B testing. Most startups simply can’t.
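
The statistics prerequisite is concrete: even the simplest A/B test rests on something like a two-proportion z-test, and the traffic needed for significance is exactly what most startups lack. A back-of-the-envelope sketch (my illustration; the visitor and conversion numbers are invented):

```typescript
// Two-proportion z-test: is the difference between two conversion rates
// larger than sampling noise would explain?
function zTest(conv1: number, n1: number, conv2: number, n2: number): number {
  const p1 = conv1 / n1;
  const p2 = conv2 / n2;
  const pooled = (conv1 + conv2) / (n1 + n2); // pooled conversion rate
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  return (p1 - p2) / se; // |z| > 1.96 ~ significant at the 5% level
}

// 5.0% vs 5.5% conversion: a startup with 2,000 visitors per arm sees noise...
console.log(Math.abs(zTest(100, 2000, 110, 2000)) > 1.96); // false
// ...while a big site with 200,000 per arm can resolve the same 0.5% lift.
console.log(Math.abs(zTest(10000, 200000, 11000, 200000)) > 1.96); // true
```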

“The best way to predict the future is to invent it.” Apple’s recent consumer-product success comes from delivering and marketing products that nobody was asking for yet, not from analytics.

Fallacy 4: “The economy is this or that. Unemployment is this or that.”

Economists are captive to their employers, whether government or private. Either they parrot what their employer wants, and what their “economic school” dictates, or they’re soon unemployed.

Regarding unemployment statistics, they’re generally quietly restated a couple of years after a recession. In the case of Silicon Valley, during and immediately after a recession the newspapers publish unemployment rates of 10%, then restate them as 25% to 30% two years later.

Posted in Business, Tech | Leave a comment