- MongoDB 2.4 is a feature-rich document-oriented database with essential built-in operational features such as HA and sharding, making it suitable to evaluate for production. It is single-master with single-document atomicity, but writes can be spread across multiple shards.
- 10gen, the company behind MongoDB, has built a hard-core technical and training team, and is further addressing the needs of Operations with optimizations in the 2.6 roadmap (late 2013). See their TCO and Operations Best Practices whitepapers for more details.
- In particular, MongoDB’s 3-node replica set topology is ideal for HA deployment in the Cloud. (The main limitations are that voting takes a few seconds, and your apps will have to reconnect after a reconfiguration.) See Figure 1.
- MongoDB supports sharding with automatic shard balancing for scale-out, but 90% of installations do not need the additional performance, and sharding on top of replica sets for HA is complex to monitor and maintain – 13+ daemons (services) on as many servers – a lot of moving parts. See Figure 2.
Figure 1: Simple Replica Set Architecture for Automated HA
Figure 2: Sharded Replica Set Architecture – Lots of Moving Parts
Nice spread! Chinese-style jumbo shrimp, chicken and beef broccoli with chocolate fortune and pecan cookies.
RedHat OpenShift Overview
SteveCP (Steve0) and Grant, RedHat OpenShift Evangelists
– OpenShift is a PaaS (like Heroku, Engine Yard)
– everything is Open Source and in git
– runs on AWS East (large instances); uses SELinux for access control, Linux cgroups for RAM limits, and Linux quota for disk space, allowing roughly 4,000x overcommit (maybe LXC in the future, perhaps a year out)
– why AWS East? to be comparable with Heroku, etc. already in East
– 3 gears free forever (512 MB RAM, 1 GB disk)
– uses git to deploy apps; runtime components are known as “cartridges”
– good integration with Eclipse
– automatically scalable (takes 15-20 seconds to rsync your data, depending on data size)
– OpenShift is trying to avoid Google’s “beta” pricing issues (cheap/free for beta, 3x for production)
– with 2×1 GB RAM gears, can run Tomcat in 1 gear and Solr in another!
– no root access, can write in /tmp with PAM namespaces (still ephemeral) and $OPENSHIFT_DATA_DIR (EBS, mirrored plus striped likely)
– hands-on lab: “picking up lines on your resume” (you deployed a python app in the cloud and made code modifications)
– env | grep OPENSHIFT
– mongodb sharding: can’t shard right now; OpenShift is handing off mongo cartridge maintenance to 10gen to add sharding and replication
MongoDB: Operations Hands On
Asya Kamsky, Senior Solutions Architect, 10gen @asya999
Talk description: “Do you need to grow a replica set? Migrate servers to different hosts? Repair a deployment after hardware failures? If so, then this workshop is for you. Attendees will work through several model operational scenarios, covering both planned and unplanned maintenance tasks, backups and recovery processes, responding to database growth requirements, and more!”
– use mongodb 2.2.x or 2.4.x or higher for replica sets or sharding testing
# build a replica set
MPATH=/var/lib/mongo
for i in `seq 1 3`; do
  mkdir -p $MPATH/rs$i
  mongod --dbpath $MPATH/rs$i --port 3701$i --replSet lab &
done
– replica sets are nice for HA, but secondaries usually not that useful as read-slaves
– can use arbiter node, which votes but does not store data
– max 12 nodes in a replica set to reduce mesh traffic
– max 7 nodes can vote to reduce voting traffic
– local database is not replicated
– even the primary node will go read-only if a network partition occurs
– the oplog is a circular buffer; entries are idempotent (increment operations are rewritten as set operations)
– use a snapshot or file copy to seed data onto a new secondary; on startup it will recognize the difference between itself and the copy and catch up from the oplog
– members can be tagged, e.g. dc: east or purpose: etl
– the “nearest” read preference picks the member with the shortest ping time
– call rs.slaveOk() on a connection if you want to query collections on a secondary instead of the primary
– “maybe you’re in somewhere like Amazon where the disk write is over the network anyway”
– set write concern per connection (like w:2 or w:majority) for important writes, e.g. a password update
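The “increment operations => set operations” point above can be sketched in plain Python (a simulation of the idea, not MongoDB internals): by logging the resulting value rather than the increment, an oplog entry can be replayed any number of times with the same result.

```python
# Sketch: why the oplog rewrites increments as sets to stay idempotent.
# Plain-Python simulation, not MongoDB internals.

def apply_inc(doc, field, amount):
    # A raw $inc is NOT idempotent: applying it twice double-counts.
    doc[field] = doc.get(field, 0) + amount
    return doc

def to_oplog_entry(doc, field, amount):
    # The oplog records the *resulting value* as a set-style operation.
    return {"op": "set", "field": field, "value": doc.get(field, 0) + amount}

def apply_oplog(doc, entry):
    # Applying a set-style entry any number of times gives the same state.
    doc[entry["field"]] = entry["value"]
    return doc

doc = {"counter": 5}
entry = to_oplog_entry(doc, "counter", 1)   # records value 6, not "+1"
apply_oplog(doc, entry)
apply_oplog(doc, entry)                     # replaying is safe
print(doc)                                  # {'counter': 6}
```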
Replica Set Exercises
- Set up a 3 node replica set
- Run rs.stepDown() to step down the primary and verify that a secondary is elected as the new primary. Then bring the set back up to 3 members
- Set a priority on a node. Terminate the primary node and practice automated failover, see what happens
- Add a new node (4 members now) to your replica set, then kill 2. Does a primary get elected? See what happens.
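The last exercise comes down to simple majority arithmetic, sketched here in plain Python (not MongoDB code): a primary is only electable while a strict majority of the configured members are reachable.

```python
# Sketch of replica set election arithmetic (not MongoDB code):
# a primary can only be elected while a strict majority of the
# configured members are up and able to vote.

def majority(n_members):
    return n_members // 2 + 1

def can_elect_primary(n_members, n_up):
    return n_up >= majority(n_members)

print(can_elect_primary(3, 2))  # True  - 2 of 3 is a majority
print(can_elect_primary(4, 2))  # False - 2 of 4 is NOT a majority
print(can_elect_primary(4, 3))  # True  - why odd member counts / arbiters matter
```

This is why a 4-member set that loses 2 members cannot elect a primary, while a 3-member set that loses 1 can.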
3 Rules for Effective Sharding (in the unlikely event you actually need sharding)
- pick a good shard key, depending on write and read loads, and next scale jump
- don’t wait too long to shard (or you will be overloaded during automatic chunk migration), but don’t shard at all if you don’t need it, because of the additional complexity
- configdb very important (run 3 config servers in production)
– mongos (mongo shard router)
– replica sets are orthogonal to sharding in mongo
– hash or range partitioning
– automatic chunk splitting and balancing (default chunk size: 64 MB)
– shard key and values immutable without redesign
– shard key must be indexed, limited to 512 bytes in size, used to route queries
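The hash-vs-range trade-off above can be sketched in plain Python (illustrative only; real chunk routing lives in mongos and the config servers, and the chunk boundaries here are made up):

```python
# Sketch: range vs. hash partitioning of a shard key (illustrative only;
# real chunk routing lives in mongos and the config servers).
import hashlib

N_SHARDS = 3
BOUNDARIES = [100, 200]  # made-up chunk boundaries for range partitioning

def range_shard(key):
    # Range partitioning: consecutive keys land on the same shard,
    # so a range query is served by few shards.
    for shard, upper in enumerate(BOUNDARIES):
        if key < upper:
            return shard
    return len(BOUNDARIES)

def hash_shard(key):
    # Hash partitioning: spreads keys evenly, but a range query
    # now scatters across shards.
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % N_SHARDS

# A range query over keys 120..140:
query_keys = range(120, 141)
print({range_shard(k) for k in query_keys})  # {1} - one shard serves the range
print({hash_shard(k) for k in query_keys})   # hashing scatters it across shards
```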
Miscellaneous Production Notes
– MongoDB can handle ~20,000 connections; with driver connection pooling, effectively even more
– MongoDB relies on OS (filesystem, filehandles, etc.) and a lot of algorithms are quadratic down there
– for multi-tenant, many users use separate databases because security/permissions is enforced at the db level
An early Caltrain hit Eric Salvatierra, a PayPal VP, near Menlo Park, so my train was delayed 90 minutes and arrived in SF at 11 am.
MongoDB Capacity Planning
Shaun Verch, Software Engineer, 10gen
Based on Asya’s Slides
– Responsiveness? The fuzziest one!
– avoid operations that take the global write lock (e.g. db.eval())
– 10gen is trying to reduce locking (moving from global locks toward more granular, data-structure-level locks)
– non-indexed data is a memory hog
– the number of cores affects how many commands each connection can execute
– Documents (and Collections)
  – update/write patterns
– working set
  – active data in memory, in blocks/pages
  – measured over periods
  – qps vs. page faults
  – qps vs. disk utilization
– data and loading patterns
– Application Metrics
  – mongotop, mongostat
  – linux tools
  – in 2.4
– Response Time / TTFB
– Peak Usage
Basic Sharding in MongoDB
Brandon Black, 10gen (Ruby Driver Maintainer)
– fairly high-level overview, since only 40 minutes
Indexing and Query Optimization
Max Schireson, CEO, 10gen
Executive Summary: If you know MySQL indexes well, then MongoDB index performance tuning is conceptually identical and nearly identical in practice, which is a good thing.
– “I’m the CEO, but I still do this talk because it’s important. You can make your database 1000x faster.”
– in compound indexes, ascending vs. descending order matters
– index locality, like a log timestamp, is good for single mongod, likely bad for shards, which needs randomness to avoid hotspots
– can create index in background, but it still has to scan data, affecting caches
– dropDups, sparse, and geospatial index options
– up to 64 indexes per collection; queries usually use only 1 index (except $or, which can merge results from several)
– sort/select/limit => benchmark order
– index covered queries
– anchored regexes (e.g. /^foo/) can use an index, but not case-insensitively (keep a lower-cased copy of the field and query that)
– nulls are indexed unless you make the index sparse, so a sparse index is smaller if you expect a lot of nulls
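The lower-cased-copy trick for case-insensitive prefix search can be sketched in plain Python (a simulation; the field name name_lower is made up, and in MongoDB the anchored regex on the lowercase field is what can use the index):

```python
# Sketch of the lower-cased-copy trick for case-insensitive prefix
# search: store a normalized copy of the field and do an anchored
# (prefix) match against it. Plain-Python simulation; in MongoDB the
# anchored query on the lowercase field can use that field's index.

docs = [
    {"name": "Smith"},
    {"name": "SMITHSON"},
    {"name": "Jones"},
]

# On write, maintain a lower-cased copy (hypothetical name_lower field).
for d in docs:
    d["name_lower"] = d["name"].lower()

def prefix_search(docs, prefix):
    # Roughly equivalent to find({"name_lower": {"$regex": "^" + prefix.lower()}})
    p = prefix.lower()
    return [d["name"] for d in docs if d["name_lower"].startswith(p)]

print(prefix_search(docs, "Smi"))  # ['Smith', 'SMITHSON']
```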
Data Processing and Aggregation Options
Asya Kamsky, Senior Solutions Architect, 10gen
– computed fields
– final result must fit in a single document (16 MB)
– pipeline operator memory limits (10% of total system RAM)
– map/reduce was the original way of doing this kind of processing
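What “computed fields” in the aggregation pipeline mean can be sketched as two stages in plain Python (a simulation with made-up order documents; the real pipeline runs inside mongod):

```python
# Plain-Python simulation of a two-stage aggregation pipeline:
# a $project-style computed field followed by a $group-style sum.
# Illustrative only; the real pipeline runs inside mongod.
from collections import defaultdict

orders = [
    {"cust": "a", "qty": 2, "price": 10.0},
    {"cust": "a", "qty": 1, "price": 5.0},
    {"cust": "b", "qty": 3, "price": 4.0},
]

# Stage 1 ($project): compute a total field per document.
projected = [{"cust": o["cust"], "total": o["qty"] * o["price"]} for o in orders]

# Stage 2 ($group): sum totals per customer.
grouped = defaultdict(float)
for d in projected:
    grouped[d["cust"]] += d["total"]

print(dict(grouped))  # {'a': 25.0, 'b': 12.0}
```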
Hash-based Sharding in MongoDB 2.4
Brandon Black, 10gen
– the ObjectID’s leading bytes are generated from the current timestamp, causing a hotspot when used as a shard key
– MongoDB now has hash-based indexing that uses MD5 to evenly distribute keys, but then you can no longer do efficient range lookups
– it’s still better to use a natural key like (customer_id,month) to preserve locality of range queries
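The ObjectID hotspot can be demonstrated with a plain-Python sketch (made-up shard boundaries and key values; MongoDB's hashed indexes do use MD5 internally, but the rest is a simulation):

```python
# Sketch of why timestamp-prefixed keys (like ObjectID) hotspot one
# shard under range partitioning, while hashing spreads them.
# Boundaries and key values are made up for illustration.
import hashlib

N_SHARDS = 4
BOUNDARIES = [1_000_000, 2_000_000, 3_000_000]  # made-up range splits

def range_shard(key):
    for shard, upper in enumerate(BOUNDARIES):
        if key < upper:
            return shard
    return len(BOUNDARIES)

def hash_shard(key):
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % N_SHARDS

# 1000 "new" keys, all generated around the same timestamp:
new_keys = [3_500_000 + i for i in range(1000)]

range_hits = {s: sum(1 for k in new_keys if range_shard(k) == s) for s in range(N_SHARDS)}
hash_hits = {s: sum(1 for k in new_keys if hash_shard(k) == s) for s in range(N_SHARDS)}

print(range_hits)  # all 1000 inserts land on the last shard
print(hash_hits)   # spread roughly evenly across the shards
```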
Advanced Replication Internals
Scott Hernandez, Software Engineer, 10gen
– know the oplog
Advanced Replication Features
Dwight Merriman, Chairman/Co-Founder, 10gen
– overview of replica set topologies and use cases
– DR replica set members generally are hidden or have a priority of 0 so they are not normally chosen as a primary
– odd number of set members
– read from the primary except for:
– geographical distribution
– use logical names, not IP addresses in configs
– set WriteConcern appropriately for what you’re doing
– monitor secondaries for lag (alerts in MMS)
– when bulk loading, use w:all or w:majority on every Nth item as a form of flow control
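The “every Nth item” flow-control pattern can be sketched as a loop with a periodic acknowledged write (plain Python with stubbed insert/ack functions; in real code the Nth write would carry w:majority):

```python
# Sketch of flow-controlled bulk loading: issue cheap fire-and-forget
# inserts, but every Nth document wait for majority acknowledgement so
# the secondaries can keep up. insert()/wait_for_majority() are stubs;
# in real code the Nth write would use write concern w:majority.

FLUSH_EVERY = 100
inserted, majority_waits = [], [0]

def insert(doc):
    inserted.append(doc)          # stub for a fire-and-forget insert

def wait_for_majority():
    majority_waits[0] += 1        # stub for a majority-acknowledged write

def bulk_load(docs):
    for i, doc in enumerate(docs, start=1):
        insert(doc)
        if i % FLUSH_EVERY == 0:
            wait_for_majority()   # throttle: let replication catch up

bulk_load({"n": i} for i in range(1000))
print(len(inserted), majority_waits[0])  # 1000 10
```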
Closing Notes – MongoDB Roadmap
Dwight Merriman, Chairman/Co-Founder, 10gen
Dwight talked about the MongoDB roadmap and increasing 3rd-party community support.
It’s nice to see a vibrant third-party ecosystem developing around MongoDB.
mongolab (MongoDB-as-a-Service on AWS, Azure, Joyent, Rackspace, no sharding yet)
Redhat OpenShift (PaaS)
CumuLogic (MongoDB-as-a-Service on HP Cloud)
SoftLayer (dedicated and cloud hosting, featuring MongoDB-as-a-Service, but no hands-on management and no sharding yet). They have a beta feature to create a library of OS images that can be deployed to both dedicated and cloud hosts.
Microsoft Open Technologies – Brian Benz is a Senior Technical Evangelist msopentech.com 425 706 5793
NetEnrich Inc. (Infrastructure Mgmt.) Narayan D. Raju 408 436 5900 x7103