MongoDB San Francisco 2013

MongoDB Logo
I attended MongoDB San Francisco 2013 in the beautiful Palace Hotel on Thursday and Friday. #MongoDBDays

Executive Summary:

  1. MongoDB 2.4 is a feature-rich document-oriented database with essential built-in Operational features like HA and sharding that are suitable for evaluation for production. It is single-master, single-record-atomic, but writes can be made to multiple shards.
  2. 10gen, the company behind MongoDB, has built a hard-core technical and training team, and is further addressing the needs of Operations with optimizations in the 2.6 roadmap (late 2013). See their TCO and Operations Best Practices whitepapers for more details.
  3. In particular, MongoDB’s 3-node replica set topology is ideal for HA deployment in the Cloud. (The main limitations are that voting takes a few seconds, and your apps will have to reconnect after a reconfiguration.) See Figure 1.
  4. MongoDB supports sharding with automatic shard balancing for scale-out, but 90% of installations do not need the additional performance, and sharding on top of replica sets for HA is complex to monitor and maintain – 13+ daemons (services) on as many servers – a lot of moving parts. See Figure 2.

Simple Replica Set Architecture for Automated HA
Simple Replica Set Architecture for Automated HA

Sharded Replica Set Architecture
Sharded Replica Set Architecture – Lots of Moving Parts

Thursday Lunch

Nice spread! Chinese-style jumbo shrimp, chicken and beef broccoli with chocolate fortune and pecan cookies.

Thursday Afternoon Workshops

RedHat OpenShift Overview
SteveCP (Steve0) and Grant, RedHat OpenShift Evangelists

OpenShift is a PaaS (like Heroku, Engineyard)
– everything is Open Source and in git
– uses AWS East large, selinux for access control, Linux cgroups for RAM, linux quota for disk space, to overcommit 4,000x! (maybe LXE in future, like a year)
– why AWS East? to be comparable with Heroku, etc. already in East
– 3 gears free forever (512 MB RAM, 1 GB disk)
– uses git to install apps, known as “cartridges”
– good integration with Eclipse
– automatically scalable (takes 15-20 seconds to rsync your data, depending on data size)
– OpenShift is trying to avoid Google’s “beta” pricing issues (cheap/free for beta, 3x for production)
– with 2×1 GB RAM gears, can run Tomcat in 1 gear and Solr in another!
– no root access, can write in /tmp with PAM namespaces (still ephemeral) and $OPENSHIFT_DATA_DIR (EBS, mirrored plus striped likely)
– hands-on lab: “picking up lines on your resume” (you deployed a python app in the cloud and made code modifications)
– env | grep OPENSHIFT
– mongodb sharding: can’t shard right now, OpenShift is handing off mongo cartridge maintenance to 10gen for sharding and replication

MongoDB: Operations Hands On
Asya Kamsky, Senior Solutions Architect, 10gen @asya999

Talk description: “Do you need to grow a replica set? Migrate servers to different hosts? Repair a deployment after hardware failures? If so, then this workshop is for you. Attendees will work through several model operational scenarios, covering both planned and unplanned maintenance tasks, backups and recovery processes, responding to database growth requirements, and more!”

– use mongodb 2.2.x or 2.4.x or higher for replica sets or sharding testing

# build a replica set
for i in `seq 1 3`; do
   mkdir -p $MPATH/rs$i
   mongod --dbpath $MPATH/rs$1 --port 3701$1 -replSet lab &

– replica sets are nice for HA, but secondaries usually not that useful as read-slaves
– can use arbiter node, which votes but does not store data
– max 12 nodes in a replica set to reduce mesh traffic
– max 7 nodes can vote to reduce voting traffic
– local database is not replicated
– even the primary node will go read-only if network-partition occurs
– oplog is a circular buffer, idempotent (increment operations => set operations)

db.adminCommand({setParameter:1, logLevel:0});

– use snapshot or file copy to put data on another secondary. It will recognize itself vs. copy and fix itself.
– tags, like dc or purpose: etl
– nearest is shortest ping
– rs.slaveOK() on a connection if you want to query collections on a slave instead of the master
– priority
– slaveDelay
– “maybe you’re in somewhere like Amazon where the disk write is over the network anyway”
– set write concern per connection (like w2 or wmajority), like password update.

Replica Set Exercises

  1. Set up a 3 node replica set
  2. Run the command to step down a primary: rs.stepDown() and ensure that a secondary is elected the new primary. Bring the set back up to 3 members again
  3. Set a priority on a node. Terminate the primary node and practice automated failover, see what happens
  4. Add a new node (so 4 members now) in your RS and kill 2, does a primary get elected? See what happens.


3 Rules for Effective Sharding (in the unlikely event you actually need sharding)

  1. pick a good shard key, depending on write and read loads, and next scale jump
  2. don’t wait to shard too long (or you will be overloaded during automatic resharding), but don’t do it at all if you don’t need sharding because of the additional complexity
  3. configdb very important (run 3 config servers in production)

– mongos (mongo shard router)
– replica sets are orthogonal to sharding in mongo
– hash or range partitioning
– automatic resharding using 64 MB chunks
– shard key and values immutable without redesign
– shard key must be indexed, limited to 512 bytes in size, used to route queries

Miscellaneous Production Notes

– MongoDB can handle 20,000 connections, if your driver has connection pooling, even more.
– MongoDB relies on OS (filesystem, filehandles, etc.) and a lot of algorithms are quadratic down there
– for multi-tenant, many users use separate databases because security/permissions is enforced at the db level

MongoDB Docs: Deploy a Sharded Cluster


An early Caltrain hit Eric Salvatierra, a PayPal VP, near Menlo Park, so my train was delayed 90 minutes and arrived in SF at 11 am.

MongoDB Capacity Planning
Shaun Verch, Software Engineer, 10gen

Based on Asya’s Slides


– Availability?
– Throughput?
– Responsiveness? The fuzziest one!


– speed
– cores
– avoid global write locks db.doceval ?
– 10gen is trying to reduce locks (or lock more local data structure level locks than global)

– non-indexed data is a memory hog
– sorting
– cores affect the amount of commands each connection can execute


– latency
– WriteConcern
– ReadPreference
– Batching
– Documents (and Collections)

– throughput
– update/write patterns
– reads/queries


– working set
– active data in memory in blocks/pages
– measured over periods

– sorting
– aggregation
– ?

– qps vs. page faults
– qps vs. disk util


– IOPs
– Size
– data and loading patterns


– Active
– Archival
– Loading patterns


– Application Metrics


– mongotop, mongostat
– linux tools

– in 2.4
workingSet option
– iostat
– mongoperf

– Load/Users
– Response Time/TTFB

System Performance
– Peak Usage

Basic Sharding in MongoDB
Brandon Black, 10gen (Ruby Driver Maintainer)

– fairly high-level overview, since only 40 minutes

Brandon’s Presentations

Indexing and Query Optimization
Max Schireson, CEO, 10gen

Executive Summary: If you know MySQL indexes well, then MongoDB index performance tuning is conceptually identical and nearly practically identical, which is a good thing.

– “I’m the CEO, but I still do this talk because it’s important. You can make your database 1000x faster.”
– B-trees
– in compound indexes, ascending or descending order matter
– index locality, like a log timestamp, is good for single mongod, likely bad for shards, which needs randomness to avoid hotspots
– can create index in background, but it still has to scan data, affecting caches
– dropdupes, sparse, geospatial options
– 64 indexes per collection, queries usually use only 1 index, except merge
– sort/select/limit => benchmark order
– index covered queries
– anchored regexes are indexed, but not case-insensitive (use lower-cased copy of it)
– nulls are indexed unless you ask for sparse, so doing sparse index is smaller if you expect a lot of nulls

Data Processing and Aggregation Options
Asya Kamsky, Senior Solutions Architect, 10gen

Aggregation Framework

– computed fields
– final result must fit in a single document (16 MB)
– pipeline operator memory limits (10% of total system RAM)


– original way of doing processing

Hash-based Sharding in MongoDB 2.4
Brandon Black, 10gen

– the ObjectID’s leading bytes are generated from the current timestamp, causing a hotspot when used as a shard key
– MongoDB now has hash-based indexing that uses MD5 to evenly distribute keys, but then you can no longer do efficient range lookups
– it’s still better to use a natural key like (customer_id,month) to preserve locality of range queries

Brandon’s Presentations

Advanced Replication Internals
Scott Hernandez, Software Engineer, 10gen

– know the oplog

Advanced Replication Features
Dwight Merriman, Chairman/Co-Founder, 10gen

– overview of replica set topologies and use cases
– DR replica set members generally are hidden or have a priority of 0 so they are not normally chosen as a primary

Best Practices

– odd number of set members
– read from the primary except for:
– geographical distribution
– analytics
– use logical names, not IP addresses in configs
– set WriteConcern appropriately for what you’re doing
– monitor secondaries for lag (alerts in MMS)
– use w:all or w:majority when bulk loading every Nth item to “flow control”

Closing Notes – MongoDB Roadmap
Dwight Merriman, Chairman/Co-Founder, 10gen

Dwight talked about the MongoDB roadmap and increasing 3rd-party community support.

Vendor Exhibits

It’s nice to see a vibrant third-party ecosystem developing around MongoDB.

mongolab (MongoDB-as-a-Service on AWS, Azure, Joyent, Rackspace, no sharding yet)
Redhat OpenShift (PaaS)
CumuLogic (MongoDB-as-a-Service on HP Cloud)
SoftLayer dedicated and cloud hosting, featuring (MongoDB-as-a-Service) (but not hands-on mgmt, no sharding yet) They have a beta feature to create a library of os images that can be deployed to bother dedicated and cloud hosts.
Microsoft Open Technologies – Brian Benz is a Senior Technical Evangelist 425 706 5793
NetEnrich Inc. (Infrastructure Mgmt.) Narayan D. Raju 408 436 5900 x7103

10gen links: Education, Presentations, Events, Whitepapers

[mongodb-user] How does slaveDelay work in replica sets (are queries aggregated)?
Stop worrying about Time To First Byte (TTFB)

This entry was posted in API Programming, Business, Cloud, Conferences, Linux, MySQL, Open Source, Perl, San Jose Bay Area, Storage, Tech, Toys. Bookmark the permalink.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.