Archive for the ‘BSD’ Category

Some ZFS News

Friday, August 27th, 2010

Phoronix has a really well-written article on ZFS, including news on a company planning to release a CDDL-licensed linux kernel module.

ZFS is the holy grail of filesystems. Many Database Administrators have switched from Linux to Solaris because ZFS has much better snapshot support than LVM, as well as good SSD support.

phoronix.com: Native ZFS Is Coming To Linux Next Month (Aug. 27, 2010)
phoronix.com: Btrfs, EXT4 & ZFS On A Solid-State Drive (Aug. 9, 2010)
phoronix.com: Benchmarking ZFS On FreeBSD vs. EXT4 & Btrfs On Linux (July 27, 2010)
phoronix.com: Running ZFS With CAM-based ATA On FreeBSD 8.1 (July 26, 2010)
github: Native ZFS for Linux
FreeBSD Wiki: ZFS

O’Reilly Open Source Conference 2010, Portland

Friday, July 23rd, 2010

Once again, the O’Reilly Open Source Conference (OSCON) was held in Portland, Oregon.

It was a good conference, and we had beautiful weather all week long.

Executive Summary

The themes promoted by the conference organizers were Cloud Computing, NoSQL, Emerging Languages (Scala, Erlang, Parrot, Go) and Android phone development.

The @oscon twitter channel was heavily used to coordinate amongst organizers and attendees. I used the TwiXtreme twitter client program on my BlackBerry.

Plug Computers were very popular in the Expo area. They are 5 watt ARM-based computers running Debian Linux that fit into a power brick-sized case and cost $99 to $129 depending on features. The Marvell booth had a few models on display, from GlobalScale (GuruPlug) and Ionics. High-end models have dual gigabit NICs, multiple USB ports, a WiFi access point and other expansion ports.

There was also continuing buzz regarding Facebook’s Flashcache SSD module (GPL v2) for linux, and also ZFS snapshots.

Tutorials

I went to the Gearman Cookbook tutorial, the first half of the Chef tutorial and some of the Cloud Summit talks.

The Gearman Cookbook tutorial was excellent. After a detailed overview of the Gearman architecture and implementations in Perl and C, a number of use cases were explored in detail, including before and after code samples. The talk was easy to listen to as an overall survey, and it also provided immediately useful info for those wanting to deploy it.

The Chef tutorial was very detailed, perhaps too much so. I went to the first half only, since I am not planning to implement Chef soon (I use PXE and anaconda/kickstart with CentOS), and did not need that level of detail at this time. cfengine, puppet and chef are ops tools for configuring servers. Chef uses Ruby data structures for its configuration files, and has include files and other useful syntax. Basically, users can “code” server configuration, as if it were a traditional app.

I went to some of the Cloud Summit talks and BOFs, but found that anybody who had done a simple project using EC2 knew as much as or more than the speakers, some of whom I would call blowhards.

Marten Mickos, president of Eucalyptus, is refreshing in that he is always clear about being in it for the money, while also promoting Open Source.

Sessions

Some of the most memorable sessions to me were:

Introduction to MongoDB, Kristina Chodorow (MongoDB)

Kristina is the maintainer of the Perl and PHP drivers for MongoDB. She gave an overview of MongoDB, a NoSQL document store, and its command-line interface, which uses JavaScript.

Some day she will release a sharding tool for MongoDB.

Scaling SourceForge with MongoDB, Nosh Petigara (10gen), Rick Copeland (SourceForge.net / GeekNet)

Nosh and Rick gave an excellent review of incorporating MongoDB into the SourceForge site.

- SF query load is mostly read-only
- ops team benchmarked a few NoSQL candidates, and MongoDB won on performance
- original MySQL servers had 64 GB RAM. After migration to MongoDB, same server machines but only 8 GB RAM
- backup dumps are verified to be bitwise the same as masters
- have to be careful not to dump all documents in your database to the network or it will max out switches
- SF relies on first-class data centers and replication slaves, less worried about MongoDB mmap (not crash-safe)
- I personally looked at their performance numbers and site graphs (on an iPad), and the end result was impressive.

Perl Lightning Talks

As always, the Perl Lightning Talks are a highpoint of the conference.

The “cartoon” of Vincent Pit’s remarkable CPAN module (VPIT) contributions was both informative and hilarious. Vincent is a French Ph.D. candidate in advanced geometry.

Cloud BOF (3 Hours)

The Cloud BOF was disorganized: it started 30 minutes late and, for some reason, was subdivided into 4 audience groups. Startups and vendors trying to make a cloud sales push led the BOF, including cloud and DNS service providers.

The Health Regulations subgroup came up with a couple of ways to make the Cloud palatable to regulators, chiefly encrypting all data because of the multi-tenancy issues with sharing public VMs.

I was in the NoSQL group, which discussed general issues and particular successes. Memcached was the clearest winner, while some people also had success with MongoDB and Redis.

My neighbor was an engineer at Postrank.com. He said that they were happy with HAProxy, but much less happy with the unpredictable IO available when running MySQL on EC2. He also said to carefully look at storage volumes available to your instance, as one is a useful tmpfs. They use AuthSMTP to get around EC2 being generally blacklisted for outbound email.

Database BOFs

MySQL BOF

The MySQL AB engineering staff has left Oracle. Monty Program AB (21 staff) has the core developers, and Percona Inc. (32 staff) has the consultants. Oracle still has some of the InnoDB programmers.

The business plan for Monty Program AB is 60% commercially-sponsored MySQL development, and 40% community-request development. Monty would like commercial users of MySQL to sponsor patches that would benefit them.

Mark mentioned that using Nehalem instructions for CRC was much faster, and that Facebook was using partitions for truncating tables instead of doing multi-record deletes. (See his blog for more details.)

One person mentioned using a commercial backup tool, R1Soft, that inserts a linux kernel module to allow filesystem snapshots. He said to carefully test backup and restore in your environment, especially for filesystems greater than 1 TB which may exceed certain block counter limits. Peter said that some of his clients had used it with varying success.

It worked for him in his environment, and the file browser allows selective file restore (he uses it to restore by priority where a system runs multiple applications.) It starts at $299 for the Standard Edition, and also has MySQL Add-on and Enterprise Editions.

PostgreSQL BOF

The PostgreSQL BOF talked about 30 or so changes that went into version 9.

One of the most exciting new features is native replication, called streaming replication (block-based.) The advantage over Slony-I replication is that Slony-I is trigger-based, so it has a variety of issues, including the inability to replicate DDL commands.

Some of the developers mimed replication events, which was rather amusing to watch. Yes, it was taped.

PostgreSQL is released under the PostgreSQL Licence, which is BSDish.

Peter Zaitsev, co-founder of Percona, organized 3 BOFs, including XtraDB, XtraBackup, Maatkit, Percona Server, Sphinx Search and Running Databases on Flash Storage.

Sphinx Search BOF

Andrew Aksyonoff, the original programmer of Sphinx Search (GPL v2), couldn’t make it to OSCON (the good excuse was that he was busy coding), so Richard Kelm (Sphinx sales/customer support honcho) and Peter filled in (Percona is a business partner with Sphinx, and many of Percona’s clients use it.)

Some of the attendees were existing users, like myself, and some from HP and other companies were looking for a large-scale search solution or alternative to Lucene.

Monty mentioned that the latest MySQL 5.1 should be used, as there have been a number of performance and reliability improvements. Full-text search is supposed to be 10x faster than 5.0, and replication is nearly bug-free by now.

Sphinx Search now has real-time index updates in version 1.1.0 beta. Another very nice feature is SQL+FS indexing.

Here is the full Sphinx 1.1.0 changelog.

Running Databases on Flash Storage BOF

The Running Databases on Flash Storage BOF had a combination of MySQL and Postgres users who have tested or used most of the SSD products: FusionIO, Violin, Intel, OCZ, etc. Everybody was happy with SSD IOPS performance, but less so with the cost and with the metadata RAM requirements of the add-in boards (FusionIO may require 4 GB RAM for metadata.)

Peter said that 20% to 30% of his clients are already using SSD – across the spectrum of vendors and models. Some are also trying “massive RAM” solutions, like Cisco servers with 384 GB RAM.

Some users had 1+ TB Postgres databases with very thorny backup and management issues. One solution was to start a snapshot, but not do the copy operation.

Expo Notes

I had an enjoyable talk with Austin Hook, who has operated the OpenBSD Store for many years. He lives near Calgary, the center of OpenBSD/OpenSSH/PF development. He mentioned that some perennial financial contributors had stopped because of the recession, so here’s the donations link.

I also talked to some reps from a Brazilian outsourcing firm, ActMinds. They currently have 400 employees across Brazil and a sales office in Philadelphia. Brazil is only 2 hours ahead of EST. They said the minimum project size is 2 developers, and developer turnover is a low 5% per annum. Their pricing is $35 to $45/hour.

And I had fun handling the plug computers on display at the Marvell booth. The Ionics boards are amazingly densely populated.

Discussions

I had the opportunity to talk to a long-time Portland resident who works as a computer consultant. He said that the Portland economy is not doing great, and really hasn’t done well since old-growth logging was stopped after 90% of the forests were cleared. And although hundreds of miles of fiber optic has been laid downtown, it’s not available for residential use. However, the Beaverton area does have ubiquitous FTTH.

I also talked to somebody who attended the Emerging Languages talks. He’s working on his M.Sc. in Computer Science, so he found those talks fascinating.

Twitter Humor

There were some humorous tweets:

- “my MongoDB and CouchDB mugs are fighting each other.”
- “I got one MongoDB mug, but need two to safely store coffee.”

Notes

Note to self: skip the nightly parties unless you have a date. The bars are too loud to talk to anybody.

Note to the O’Reilly conference organizers: use meetup.com for the BOFs like ApacheCon does. The average audience was about 10 people, and with meetup it would be 4x that.

OSCON 2010 Slides
Tim Bray: Desperate Perl Hacker
Youtube: OSCON 2010 videos
blip.tv: OSCON2010 videos
wikipedia: Plug Computer
Jeremy Zawodny: MongoDB Early Impressions

SVLUG Meeting: Not Your Father’s Assembly Language with Randall Hyde

Wednesday, July 7th, 2010

At the Silicon Valley Linux Users Group tonight, Randall Hyde talked about a more modern implementation of assembly language, HLA – the High Level Assembler.

He talked about his career as a programmer, college lecturer at UC Riverside, computer book author and developer of nuclear reactor control software.

It was interesting to hear first-hand that CS students during the dot com boom actually did enroll “just for the money”, regardless of interest in science or ability.

Originally his book on HLA was a download-only book, but No Starch Press was looking for content and actually contacted him for permission to publish it. It proved to be a popular book and another version is planned.

He said it takes about 2 years to learn the domain-specific knowledge about nuclear reactors, plus whatever time it takes to learn the programming languages or tools used for the project.

Using a debugger on nuclear reactor control software results in a scram, so planning ahead is a good idea.

Thanks again to Symantec for hosting the meeting.

ClusterIt dtop Command Ported to Linux

Tuesday, May 25th, 2010

The light-weight ClusterIt toolkit mostly worked on linux, but dtop (distributed top) still expected BSD-style top syntax.

Here’s a diff I wrote to make dtop work on recent versions of Linux (tested on CentOS 5.5 x86_64):

$ diff dtop.org.c dtop.c

311a312,322
> 	char buf2[31];	/* room for the 30 characters %30s may store, plus the NUL */
>
> 	if (strstr(c, "Swap:") != NULL) {
> 		sscanf(c, "Swap: %30s total, %*s used, %30s free", buf, buf2);
> 		nd->swap = dehumanize_number(buf);
> 		nd->swapfree = dehumanize_number(buf2);
> 		nd->inactmem = nd->wiredmem = nd->execmem = 0;
> 		return;
> 	}
>
>
344a356
> #if ! defined(__linux__)
364a377
> #endif
470a484,486
> #if defined(__linux__)
> 		case 11:
> #else
471a488
> #endif
517a535,539
> #if defined(__linux__)
> 		} else if (strstr(c, "Tasks:") != NULL) {
>                         sscanf(c, "Tasks: %d ",&nodedata[nn].procs);
> #else
519a542
> #endif

The output of dtop on linux looks like this:

HOSTNAME  PROCS  LOAD1  LOAD5 LOAD15 ACTIVE  INACT   FILE   FREE SWPFRE SWUSED
  g00-int     64   0.11   0.04   0.01	   0	  0	 0  7345M  2047M  0.00%
  g01-int     64   0.00   0.01   0.00	   0	  0	 0  7033M  2047M  0.00%
  g02-int     61   0.08   0.02   0.01	   0	  0	 0  6980M  2047M  0.00%
  g03-int     64   0.16   0.06   0.01	   0	  0	 0  7011M  2047M  0.00%
  g04-int     64   0.04   0.04   0.01	   0	  0	 0  6996M  2047M  0.00%
  g05-int     61   0.02   0.01   0.00	   0	  0	 0  7424M  2047M  0.00%
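
The Linux-specific hunks in the patch above key off the "Tasks:" and "Swap:" summary lines that top prints on linux. As a rough illustration of what is being extracted, here is a small shell sketch that pulls the same three values out of sample text. The sample lines are an assumption (written in the style of the procps top shipped with CentOS 5, with invented values), and parse_top_summary is a hypothetical helper, not part of ClusterIt:

```bash
#!/bin/bash
# Sample summary lines in the style of procps top on CentOS 5 (values invented).
sample='Tasks:  64 total,   1 running,  63 sleeping,   0 stopped,   0 zombie
Swap:  2096472k total,        0k used,  2096472k free,   812340k cached'

# Read top summary text on stdin and print "procs swap_total swap_free".
# Field positions assume the layout of the sample above.
parse_top_summary() {
  awk '
    /^Tasks:/ { procs = $2 }
    /^Swap:/  { total = $2; free = $6 }
    END       { print procs, total, free }
  '
}

printf '%s\n' "$sample" | parse_top_summary   # prints: 64 2096472k 2096472k
```

The values still carry their "k" suffix here; in the patch that conversion is left to dehumanize_number().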

Here is the final, hardened version of dtop.c that uses secure C programming techniques (strn API and double-free safe.)

My long-term preference would be to rewrite dtop in Perl since parsing text input in old-school C is brittle.

Also, dtop should be able to handle top results from heterogeneous systems, and the linux ifdefs contribute to preventing that.

And here are some of the debugging commands I used:

ulimit -S -c unlimited > /dev/null 2>&1
valgrind -v --leak-check=full --show-reachable=yes --track-origins=yes ./dtop
gdb ./dtop core

Dan Saks: Why size_t matters
Karpov: About size_t and ptrdiff_t

fsync Links

Saturday, May 8th, 2010

This is a placeholder post for links about fsync on linux and Perl.

tchrist: some good news && bad news on fsync
Don’t fear the fsync!
Delayed allocation and the zero-length file problem
libeatmydata
Firefox 3 & ‘fsync’ issue
Brad’s diskchecker.pl

Apple Xsan Links

Thursday, April 29th, 2010

Apple Xsan is a 64-bit cluster file system used with Fiber Channel Xserve RAID (discontinued) or Promise VTrak E-Class disk enclosures.

(When Apple discontinued the Xserve RAID they were cool enough to “unlock” the firmware to use any OEM disk drive instead of only Apple-branded drives, making upgrading and supporting Xserve RAIDs easier and less expensive.)

Xsan has been available since 2005, and is now at version 2.2.

The software costs $999/node (workstations plus 1 metadata (MD) node) plus optional AppleCare or AMP support. In addition, each node needs a Fiber Channel card, available in the Apple Store for $600 or $1000.

Additionally, you would need hardware consisting of 2 or more computer workstations with FC adapters, one FC switch for SAN traffic, one dedicated gigabit switch for metadata traffic, MD node, and reliable DNS or AD.

Video-editing companies have had a lot of success using Xsan to get the file-streaming bandwidth needed, typically approaching 1 Gbps for 3 editing workstations, to work smoothly with centralized video data.

Xsan is the cheapest non-DIY SAN solution that I can think of. (DIY alternatives include Datacore SANMelody for Windows or OpenSAN for Linux.)

List pricing for the Promise RAID enclosures is comparable to a Dell MD1000 or MD3000 DAS – the least expensive brand-name external mass storage that you can buy.

Xsan does not support the Mac Mini because it has no Fiber Channel ports and no space to add a card, and because Mac Minis are supported only as 32-bit servers at this time.

AFP548 XSAN Deployment Advice
Apple Store: XSAN
Promise 32TB VTrak E-Class 16x SATA RAID Subsystem

Xsanity
“The only 2TB 7200rpm 32MB cache enterprise drive out at the moment is the Hitachi Ultrastar HUA722020ALA330.”

Xserve, Xserve RAID: Apple Drive Module (ADM) compatibility
support.apple.com: Promise VTrak: Configuring for optimal performance

SSH Configuration Tips

Wednesday, March 10th, 2010

I came across a useful blog post with 20 SSH configuration tips.

I’ll have more to say later about why the tips are useful, but the title of “Top 20 OpenSSH Server Best Security Practices” is not really accurate.

Top 20 OpenSSH Server Best Security Practices

HAProxy Comments

Saturday, February 27th, 2010

Just trying out HAProxy in a new data center for http load balancing.

I’m not expecting a lot of site traffic initially, but using a load balancer from Day One lets you get all the data center servers assigned, and allows sysadmins to do maintenance whenever convenient.

I was looking around at similar Open Source software, and what caught my attention about HAProxy is that Willy “obsessed with reliability” Tarreau is the author.

HAProxy has several nice features, including speed (fast enough to fill 10 Gbps links, at up to 132,000 connections per second), and epoll, cookie, multicore, chroot support and much more.

There are ports available for most Unix systems, including linux, FreeBSD and Solaris.

Here is the build script I wrote for a Dell 1950 (after installing libpcre):

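#!/bin/bash
# Build and install HAProxy for a 64-bit Linux 2.6 kernel with PCRE support.
set -e    # stop at the first failing step so a broken build is never installed

make clean
make TARGET=linux26 USE_PCRE=1 ARCH=x86_64
# no make test
make install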

You can do a graceful restart of HAProxy running on multiple cores by adding this to your startup script (the tr is needed to handle when nbproc > 1):

graceful() {
  /usr/local/sbin/haproxy -c -q -f /etc/haproxy.cfg
  if [ $? -ne 0 ]; then
    echo "Errors found in configuration file, check it with 'haproxy check'."
    return 1
  fi
  /usr/local/sbin/haproxy -V -f /etc/haproxy.cfg -p /var/run/haproxy.pid \
    -sf `tr '\n' ' ' < /var/run/haproxy.pid`
}
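
To see what the tr step is doing, here is a small stand-alone sketch. It assumes the pid file holds one PID per line when nbproc > 1, uses a temporary file with made-up PIDs instead of the real /var/run/haproxy.pid, and only echoes the command rather than running haproxy. flatten_pids is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch: flatten a multi-line pid file into one argument list for "haproxy -sf".

# Print the PIDs stored one per line in file $1 as a single space-separated line.
flatten_pids() {
  tr '\n' ' ' < "$1" | sed 's/ *$//'   # sed trims the trailing space left by tr
}

pidfile=$(mktemp)
printf '%s\n' 4101 4102 4103 > "$pidfile"   # stand-in for a 3-process pid file

old_pids=$(flatten_pids "$pidfile")
# Echo only; a real startup script would run the command instead.
echo "haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $old_pids"
rm -f "$pidfile"
```

The old PIDs are read before the new haproxy starts, so the new processes can rewrite the pid file while the old ones finish their connections and exit.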

I talked to somebody using it on EC2, so HAProxy is 'cloud-ready.'

HAProxy Documentation
wht: HAproxy - Quick and Dirty HTTP Load balancing Tutorial on Redhat/Centos
Session Based Load Balancing with HAproxy
tito: Zero-Downtime Restarts with HAProxy
Building an easy and scalable load-balanced high-availability web-hosting solution. Part One : The front.
How To Tell Apache To Not Log Certain Requests In Its Access Log
Pricing for Zeus software on Amazon EC2
microsoft.com: Network Load Balancing Technical Overview
loadbalancer.org: FAQ
Tenereillo.com: Why DNS Based Global Server Load Balancing (GSLB) Doesn't Work (2005)
davew: Thoughts on Global Server Load Balancing
ksalchow: Shame on GSLB? Shame on Me?
Vegan Load Balancing Mailing List
HATop: Interactive ncurses client for HAProxy