How to benchmark MongoDB with YCSB?


When talking about system performance characteristics, most DBaaS providers limit themselves to providing information about the hardware their systems are provisioned on. Indeed, it is hard to talk accurately about the actual throughput and latency characteristics of a cloud-based deployment, given the number of variables in such a system: virtualized environments, unpredictable workloads, network latencies and different geographies are only some of the considerations.

However, it is a good idea to have a fair understanding of the actual performance of your MongoDB deployment. It lets you provision accurately for your application's needs, and it lets you compare DBaaS providers to make sure you are getting the most “bang for the buck”.

This blog post is a primer on running some basic performance benchmarks on your MongoDB cluster. It goes into the details of how to configure and run YCSB benchmark tests and how to interpret the results. The inspiration for it came from the recent MongoDB blog post about performance improvements in MongoDB 3.0.

YCSB (Yahoo! Cloud Serving Benchmark) is a popular open-source Java specification and program suite developed at Yahoo! to compare the relative performance of various NoSQL databases. Its workloads are used in various comparative studies of NoSQL databases.

Setting up YCSB

This and later sections will guide you through a step-by-step process to set up, configure and run YCSB tests on your favorite DBaaS provider's system.
In order to run workload tests, you will need a client machine, preferably in the same geographic location as your MongoDB cluster, to avoid latencies over the Internet. Select a configuration with enough juice to run multiple threads so that it can load your MongoDB cluster appropriately. The machine needs to have recent versions of Java, Maven and git installed.


  • If Java, Maven or git is not already installed on your system, install them; refer to the documentation for your specific OS. Make sure that you install a Maven version compatible with your Java version, and test that all dependencies are working correctly. For example:
$ javac -version
javac 1.8.0_25
$ mvn -version
Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-14T01:40:27+05:30)
Maven home: /usr/local/Cellar/maven/3.3.1/libexec
Java version: 1.8.0_25, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.2", arch: "x86_64", family: "mac"
$ git --version
git version 1.9.5 (Apple Git-50.3)
  • As suggested by the GitHub page of YCSB, you could wget the tar archive of YCSB, but we recommend building it from source. The steps are documented in the MongoDB README of YCSB. Building from source will help us enable MongoDB authentication for your cloud provider later.
git clone git://
mvn clean package


The fastest MongoDB on Azure!

Everybody claims to be fast – but our fast is faster! Over the past few weeks, our team has been busy benchmarking our systems on Azure, and the results have been fantastic.

Earlier this year, before we ported our existing infrastructure from AWS to Azure, we spent a lot of time understanding the structure of the Azure cloud and optimizing for the best performance. The reality is that Azure is fairly different from AWS, and a performance strategy that works on one cloud probably will not work on the other. Our development team did a lot of custom work on the disk architecture that we use in our clusters; the goal was to provide the best disk performance on Azure.


How do you test your MongoDB application upgrades?

You have chosen MongoDB as your application database, and you probably have a lot of production data in your database already. Now you need to make a major change to your application. How do you go about testing to make sure the new version of your application behaves well with your production data?

Production data is always infinitely more varied than your test data and exercises more edge cases, consequently leading to more bugs. Exporting production data into your test environment is not recommended due to policy, privacy and security issues. On the other hand, it is fairly difficult and expensive to identify and test bugs in production. So how do you go about ensuring that the new version of your application works well with production data? Here is what we recommend at MongoDirector:


MongoDB on Azure: How to choose the right instance type?

Azure is now a popular platform to deploy and manage MongoDB servers. Once you have chosen Azure as the platform for MongoDB, one of the first decisions you need to make is which instance type to deploy. In this matter Azure is fortunately much simpler than AWS. Azure basically offers three types of instances:

1. A series
A series offers general-purpose instances that fit most workloads. They are available in various sizes ranging from 0.75 GB to 56 GB of RAM. Within the A series you are offered two options: ‘Basic’ and ‘Standard’. The ‘Basic’ version costs less but does not offer load balancing, auto-scaling, etc. From a database perspective, the most important difference is that with ‘Basic’ instances your Azure disks (page blobs) are limited to 300 IOPS/disk, whereas with ‘Standard’ instances you can go up to 500 IOPS/disk. This can make a big difference, especially on larger instances where you can RAID the disks. Our recommendation is to use ‘Standard’ machines whenever possible to leverage the enhanced I/O. The number of disks that can be attached to a VM depends on the size of the VM; you can go up to 16 disks on an A7 machine. More details can be found here – Virtual machine sizes for Azure.
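To see why the per-disk limit matters once you RAID, here is a back-of-the-envelope sketch in JavaScript. It assumes a RAID 0 stripe whose IOPS scale linearly with the number of disks, which is an idealization; `raidIopsCeiling` is a hypothetical helper, not an Azure API.

```javascript
// Hypothetical helper: rough ceiling on aggregate IOPS when several identical
// Azure data disks are striped together (RAID 0). It assumes IOPS simply add
// up across disks and ignores any VM-level limits, so it is an upper bound only.
function raidIopsCeiling(diskCount, iopsPerDisk) {
  if (!Number.isInteger(diskCount) || diskCount < 1) {
    throw new RangeError("diskCount must be a positive integer");
  }
  return diskCount * iopsPerDisk;
}

// Per-disk limits quoted above for the two A series tiers.
const BASIC_IOPS_PER_DISK = 300;
const STANDARD_IOPS_PER_DISK = 500;

// An illustrative 4-disk stripe on each tier.
console.log(raidIopsCeiling(4, BASIC_IOPS_PER_DISK));    // 1200
console.log(raidIopsCeiling(4, STANDARD_IOPS_PER_DISK)); // 2000

// A Standard A7 with its maximum of 16 data disks.
console.log(raidIopsCeiling(16, STANDARD_IOPS_PER_DISK)); // 8000
```

The gap between the tiers grows by 200 IOPS with every disk you add to the stripe, which is why the difference shows up most on larger instances.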


Three simple steps to improve the security of your MongoDB installation

MongoDB security has been in the news this week for all the wrong reasons. All the talk has been about the 40,000 or so databases that were found exposed by a group of students based in Germany. Some of the databases even contained production data. It’s egregious on several levels: not only do you have production data on an unauthenticated database, but it is also left open to the Internet. The only surprising thing is that it took this long to get exposed. If you don’t want your MongoDB servers to be in the news, here are three simple steps to improve the security of your MongoDB installation:


High performance MongoDB clusters on Amazon EC2

Performance is an important consideration when deploying MongoDB on the EC2 platform. From a hardware perspective, MongoDB performance on EC2 is gated primarily by two factors: RAM and disk speed. Typically (there are always exceptions), CPU should not be an issue. Memory is no longer an issue either: there are plenty of size options (R3, I2, C3/C4) offering a large amount of RAM. For more details on how to choose the right instance type, check my other blog post, “How to choose the right EC2 Instance type”.


Fast paging with MongoDB

Paging through your data is one of the most common operations with MongoDB. A typical scenario is the need to display your results in chunks in your UI. If you are batch processing your data, it is also important to get your paging strategy right so that your data processing can scale.

Let's walk through an example to see the different ways of paging through data in MongoDB. In this example, we have a CRM database of user data that we need to page through, displaying 10 users at a time, so in effect our page size is 10. Here is the structure of our user document:


Approach 1: Using skip() and limit()

MongoDB natively supports paging using the skip() and limit() cursor methods. The skip(n) directive tells MongoDB to skip ‘n’ results, and the limit(n) directive tells MongoDB to limit the result length to ‘n’ results. Typically you will call skip() and limit() on your cursor, but to illustrate the scenario we provide the equivalent console commands. For brevity, bounds-checking code is also excluded.

//Page 1
db.users.find().limit(10)
//Page 2
db.users.find().skip(10).limit(10)
//Page 3
db.users.find().skip(20).limit(10)

You get the idea. In general, to retrieve page n the code looks like this:

db.users.find().skip(pagesize*(n-1)).limit(pagesize)

However, as the size of your data increases, this approach has serious performance problems. Every time the query is executed, the server has to walk from the beginning of the result set to the specified offset before it can return anything, so the further you page, the slower each query gets. This process also does not make efficient use of indexes: the skipped entries still have to be walked past one by one. So the skip() and limit() approach is typically useful only for small data sets. If you are working with large data sets, you need to consider other approaches.
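To make the cost of skip() concrete, here is a small JavaScript model that pages through an in-memory array the way a skip/limit query walks through its results. It is a simplification under stated assumptions: `pageWithSkip` and its `scanned` counter are hypothetical, and the work a real server does also depends on indexes and document size.

```javascript
// Hypothetical model of skip()/limit() paging over an in-memory array.
// "scanned" counts every document walked past or returned, which is the
// work that grows as the offset grows. This is not the MongoDB driver.
function pageWithSkip(docs, pageNumber, pageSize) {
  const skip = pageSize * (pageNumber - 1); // same offset as skip(pagesize*(n-1))
  let scanned = 0;
  const page = [];
  for (let i = 0; i < docs.length && page.length < pageSize; i++) {
    scanned++;
    if (i >= skip) page.push(docs[i]); // only documents past the offset are kept
  }
  return { page, scanned };
}

// 100,000 illustrative user documents with sequential numeric ids.
const users = Array.from({ length: 100000 }, (_, i) => ({ _id: i + 1 }));

console.log(pageWithSkip(users, 1, 10).scanned);    // 10
console.log(pageWithSkip(users, 1000, 10).scanned); // 10000
```

The work per page grows linearly with the page number, even though each page still returns only 10 documents.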

Approach 2: Using find() and limit()

The reason the previous approach does not scale very well is the skip() command, so the goal in this section is to implement paging without using skip(). For this we are going to leverage a natural order in the stored data, such as a timestamp or an id stored in the document. In this example we are going to use the ‘_id’ stored in each document. ‘_id’ is a MongoDB ObjectId, a 12-byte value containing a timestamp, machine id, process id and counter. The overall idea is as follows:
1. Retrieve the _id of the last document in the current page
2. Retrieve documents greater than this “_id” in the next page

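//Page 1: sort on _id so that page boundaries are well defined
users = db.users.find().sort({'_id': 1}).limit(10).toArray();
//Find the id of the last document in this page
last_id = users[users.length - 1]['_id'];

//Page 2: only documents whose _id is greater than last_id
users = db.users.find({'_id': {'$gt': last_id}}).sort({'_id': 1}).limit(10).toArray();
//Update the last id with the id of the last document in this page
last_id = users[users.length - 1]['_id'];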
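For comparison, here is the same kind of in-memory model for the _id-based approach. Assumptions: the documents are already sorted by _id (in MongoDB the _id index provides this order), numeric ids stand in for ObjectIds, and `pageAfter` is a hypothetical helper whose binary search plays the role of the index seek behind a `{'_id': {'$gt': last_id}}` query.

```javascript
// Hypothetical helper modelling range-based paging: return the next pageSize
// documents whose _id is greater than lastId (null means "first page").
// sortedDocs must already be sorted by ascending _id.
function pageAfter(sortedDocs, lastId, pageSize) {
  // Binary search for the first document with _id > lastId; this stands in
  // for the index seek, so earlier documents are never walked past.
  let lo = 0;
  let hi = sortedDocs.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (lastId !== null && sortedDocs[mid]._id <= lastId) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return sortedDocs.slice(lo, lo + pageSize);
}

// 35 illustrative users with numeric ids 1..35.
const users = Array.from({ length: 35 }, (_, i) => ({ _id: i + 1 }));

let lastId = null;
let pages = 0;
for (;;) {
  const page = pageAfter(users, lastId, 10);
  if (page.length === 0) break; // no documents left after lastId
  pages++;
  lastId = page[page.length - 1]._id; // remember where this page ended
}
console.log(pages); // 4
```

Each page costs one seek plus pageSize documents no matter how deep you are, which is why this approach scales where skip() does not. The trade-off is that you move from page to page in order rather than jumping straight to page n.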


Enabling two factor authentication for

Enabling two-factor authentication is an important upgrade to the security of your MongoDirector account. If your password is compromised, an attacker will still be unable to gain access to your account without the authentication device initialized with your account's two-factor secret.

You can enable two-factor authentication in three easy steps:

1. Log in to your account at and navigate to the Settings tab -> Select the ‘Two factor’ auth tab and check “Enable Two factor auth”.


Geographically distributed MongoDB clusters on AWS in the EU region

Amazon recently announced the public availability of its EU Central (Frankfurt) region. With this new datacenter, AWS now has two datacenters in the EU: Ireland and Frankfurt. Having two datacenters enables you to improve the geo-redundancy of your MongoDB replicas.

Here are the steps to set up a geo-redundant MongoDB cluster in the EU on AWS:

1. Cluster details

Enter the cluster details – name, version & size to get started

Cluster details to deploy mongod cluster in AWS EU regions

2. Select the region for each replica set

We place the primary in EU-West (Ireland) and the secondary in EU-Central (Frankfurt). For 100% geo-redundancy, you need to place the arbiter in a different region. If you place the arbiter in one of the EU regions and that region goes down, only one of the three voting members remains, so your MongoDB cluster will not have a quorum (a majority of voting members) and will degrade to read-only mode. The arbiter is a voting node and does not hold any data, so irrespective of where you place the arbiter, all the production data and backups are stored in the EU region.
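The quorum argument above is the replica-set majority rule: a primary can only be elected while more than half of the voting members are reachable. The JavaScript sketch below checks that rule for the layout described; `canElectPrimary` is a hypothetical helper, and the third region (`us-east-1`) is only an illustrative choice for the arbiter.

```javascript
// Hypothetical check of the majority rule: after one region fails, can the
// surviving voting members still elect a primary? Every member here has
// votes: 1, so counting members with votes > 0 counts the votes.
function canElectPrimary(members, failedRegion) {
  const voters = members.filter((m) => m.votes > 0);
  const alive = voters.filter((m) => m.region !== failedRegion);
  return alive.length > voters.length / 2; // strict majority required
}

// Arbiter placed in one of the EU regions (Ireland).
const arbiterInIreland = [
  { role: "primary", region: "eu-west-1", votes: 1 },
  { role: "secondary", region: "eu-central-1", votes: 1 },
  { role: "arbiter", region: "eu-west-1", votes: 1 },
];

// Arbiter placed in an illustrative third region outside the EU.
const arbiterElsewhere = [
  { role: "primary", region: "eu-west-1", votes: 1 },
  { role: "secondary", region: "eu-central-1", votes: 1 },
  { role: "arbiter", region: "us-east-1", votes: 1 },
];

console.log(canElectPrimary(arbiterInIreland, "eu-west-1")); // false
console.log(canElectPrimary(arbiterElsewhere, "eu-west-1")); // true
```

With the arbiter in Ireland, losing Ireland leaves one vote out of three, so no primary can be elected; with the arbiter in a third region, either EU region can fail and two of three votes survive.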


The role of the DBA in NoSQL

What is the role of the DBA in the rapidly evolving world of NoSQL? The majority of early NoSQL adoption is in the fast-growing world of small and medium companies based on public clouds. In most of these companies the DBA role does not exist, and this has led a lot of people to proclaim the end of the DBA. Is the DBA going the way of the dinosaur? I think the answer is more nuanced than that. First, let's examine a few trends we are seeing in the marketplace that are going to have a great downstream impact on the technology workplace.
