MongoDB on Digital Ocean

Digital Ocean is a New York-based hosting provider specializing in SSD-based virtual machines. A majority of our customers choose to deploy and manage databases on Amazon AWS. However, running large-scale, write-intensive databases on AWS is a fairly difficult and time-consuming operation. If you are interested in the reasons, you can read the details in my blog post “What I would like to see in Amazon EC2…”. We have been using Digital Ocean for several months now and have learnt a lot about the system.

What do we like about Digital Ocean?

1. SSDs are amazing – It feels so right to run a database on SSDs. The disk throughput is excellent. You can clearly see the benefits when you build an index or repair a database – operations that used to take hours on Amazon EBS finish in a few minutes. If you are not currently running your database on an SSD, you should definitely reconsider your decision.

2. Simplicity – The API and the UI are simple and elegant. It is very easy to achieve what you want with just a few clicks or a few lines of code. I hope they preserve the simplicity as they continue to add new features.

3. Pricing – The pricing is great: simple and low. You don’t need to worry about on-demand vs. reserved instances.

4. Low latency from Amazon – The latency between Amazon AWS US-East and the Digital Ocean NY datacenter is about 5-8 ms. This makes it possible for customers to continue to use AWS for their front and mid tiers and deploy their MongoDB clusters on Digital Ocean.

What would we like to see improved on Digital Ocean?

1. Availability framework
We would love to see an availability framework similar to EC2 availability zones. Currently, when we need to be doubly sure of uptime, we run across different datacenters: NY1, NY2, SFO and Amsterdam. However, in the long run it would be good to see the ‘availability zone’ construct inside a single datacenter.

2. Online snapshots
Digital Ocean does not have online snapshots – you have to shut down your machine to take a snapshot. This makes taking backups or snapshots ridiculously hard. mongodump/mongorestore is not really an option for large deployments. We have had to do a lot of engineering work to build a reliable and quick backup solution on Digital Ocean.

3. Support for multiple disks & LVM
Currently you can only have one disk attached to your virtual machine. For high performance clusters we would like to place different portions of the database (log, db, etc.) on different disks. Hence we would love to see the ability to attach/detach multiple disks. LVM is also not supported currently – it would be great if it were, since it is very useful for taking snapshots of your system.

4. Dynamic disk resizing
If you fill up your current disk you don’t really have many options today. You need to create a new, bigger machine and migrate your data over. In the long run it is vital to be able to resize your existing disk instead of needing to migrate to a new machine when your disk is full.

Overall we like the system and are rooting for them to succeed! We hope to deploy and manage many more MongoDB servers on Digital Ocean this year. As always if you have any other questions please contact us at

Encrypting MongoDB data at rest

MongoDB encryption

MongoDB is now the de facto database for a wide variety of applications, some storing sensitive data. When you store sensitive information in your MongoDB database it is important to encrypt the contents of your data disk. This gives you an extra layer of protection if your data disks, snapshots or backups are lost or stolen. E.g. if an attacker gets access to your snapshots or backups, all the data is still encrypted and they still cannot access your raw application data. In some scenarios encryption at rest is also compulsory due to compliance requirements. MongoDirector makes it extremely simple to encrypt your MongoDB data volumes at rest. In the creation wizard, when creating a new MongoDB cluster, select the option to “Encrypt your disk” – that’s it! Our software will then take care of all the details of encryption including setting up the volumes for encryption, setting up keys, backup, restore etc.

Encrypt mongodb on disk

Underneath the covers we use block level encryption to ensure that the entire contents of your data disk are encrypted. We feel that this is the simpler, cleaner option in the long term. Here are a few other options we considered:
1. File system encryption – File system encryption makes sense when you only want to encrypt a few files. In our case we encrypt the entire MongoDB data volume.
2. Application level encryption – This is not an option we would recommend. Getting cryptography right and securing keys at the application level is a non-trivial task and is best left to the platform.

Backup & Restore
Once you choose to encrypt your disks, your backups are automatically encrypted as well – no further action is needed on your part. Due to the encryption the backups can now only be recovered on the specific cluster on which they were taken.

Encrypting data in motion
Encrypting your data in motion is essential when your data is traversing unsecured networks like the internet. MongoDirector makes it trivial to encrypt your data in motion: select the “Enable SSL” option in the creation wizard, and SSL will be enabled on your MongoDB servers. If you would also like to bring your own custom SSL certificate please contact our support team. For more details refer to the post on Setting up SSL.

If you have more questions about the encryption setup please email us at

When to use GridFS?

GridFS is a simple file system abstraction on top of MongoDB. If you are familiar with Amazon S3, GridFS is a very similar abstraction. Now why does a document-oriented database like MongoDB provide a file layer abstraction? It turns out there are some very good reasons:

1. Storing user generated file content
A large number of web applications allow users to upload files. Historically, when working with relational databases, these user generated files were stored on the file system, separate from the database. This creates a number of problems. How do you replicate the files to all the needed servers? How do you delete all the copies when the file is deleted? How do you back up the files for safety and disaster recovery? GridFS solves these problems by storing the files along with the database. You can leverage your database backup to back up your files. Also, due to MongoDB replication, a copy of your files is stored in each replica. Deleting the file is as easy as deleting an object in the DB.

2. Accessing portions of file content
When a file is uploaded to GridFS, the file is split into 256 KB chunks and each chunk is stored separately. So when you need to read only a certain range of bytes of the file, only the chunks covering that range are brought into memory and not the whole file. This is extremely useful when dealing with large media content that needs to be selectively read or edited.

3. Storing documents greater than 16MB in MongoDB
MongoDB document size is capped at 16MB. So if you have documents that are larger than 16MB you can store them using GridFS.

4. Overcoming file system limitations
If you are storing a large number of files you will need to consider file system limitations like the maximum number of files/directory etc. With GridFS you don’t need to worry about the file system limits. Also with GridFS and MongoDB sharding you can distribute your files across different servers without significantly increasing the operational complexity.

Underneath the covers
GridFS uses two collections to store the data

> show collections;

The fs.files collection contains metadata about the files and the fs.chunks collection stores the actual 256 KB chunks. If you have a sharded collection the chunks are distributed across different servers and you might get better performance than a filesystem!

> db.fs.files.findOne();
{
    "_id" : ObjectId("530cf1bf96038f5cb6df5f39"),
    "filename" : "./conn.log",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-02-25T19:40:47.321Z"),
    "md5" : "6515e95f8bb161f6435b130a0e587ccd",
    "length" : 1644981
}
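
To make the chunking concrete, here is a minimal Python sketch of the arithmetic only (it does not talk to MongoDB). It uses the chunkSize and length values from the fs.files document above; the function names are hypothetical and exist only for this illustration.

```python
CHUNK_SIZE = 262144    # "chunkSize" from the fs.files document above (256 KB)
FILE_LENGTH = 1644981  # "length" from the fs.files document above


def chunk_count(length, chunk_size=CHUNK_SIZE):
    """Number of fs.chunks documents needed to hold `length` bytes."""
    # Integer ceiling division: a partially filled last chunk still counts.
    return (length + chunk_size - 1) // chunk_size


def chunks_for_range(start, end, chunk_size=CHUNK_SIZE):
    """Chunk numbers (the "n" field) that cover bytes start..end-1."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return list(range(first, last + 1))


if __name__ == "__main__":
    # conn.log needs 7 chunks: 6 full ones plus a final partial chunk.
    print(chunk_count(FILE_LENGTH))
    # Reading bytes 300000..599999 only touches chunks 1 and 2.
    print(chunks_for_range(300000, 600000))
```

A reader for a byte range would then only need the fs.chunks documents with the matching files_id and those values of n, which is what makes partial reads cheap.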

MongoDB also creates a compound index on files_id and the chunk number (n) to help quickly access the chunks:

> db.fs.chunks.getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "files_id" : 1,
            "n" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "files_id_1_n_1"
    }
]

MongoDB has a built in utility called “mongofiles” to help exercise the GridFS scenarios. Please refer to your driver documentation on how to use GridFS with your driver.

# mongofiles -h  -u  -p  --db files put /conn.log
connected to:
added file: { _id: ObjectId('530cf1009710ca8fd47d7d5d'), filename: "./conn.log", chunkSize: 262144, uploadDate: new Date(1393357057021), md5: "6515e95f8bb161f6435b130a0e587ccd", length: 1644981 }

# mongofiles -h  -u  -p  --db files get /conn.log
connected to:
done write to: ./conn.log

# mongofiles -h  -u  -p  list
connected to:
/conn.log 1644981

# mongofiles -h  -u  -p  --db files delete /conn.log
connected to:


If you would like to serve the file data stored in MongoDB directly from your web server or file system there are several GridFS plugin modules available:

  • GridFS-Fuse – Plug GridFS into the filesystem
  • GridFS-Nginx – Plugin to serve GridFS files directly from Nginx


Limitations

  • Working Set: Serving files along with your database content can significantly churn your memory working set. If you would rather not disturb your working set it might be best to serve your files from a different MongoDB server.
  • Performance: File serving will be slower than serving the file natively from your web server and filesystem. However the added management benefits might be worth the slowdown.
  • Atomic update: GridFS does not provide a way to do an atomic update of a file. If you need this, you will have to maintain multiple versions of your files and pick the right version.

Visual statistics for your MongoDB server

We are happy to announce the availability of visual statistics for your MongoDB databases and collections. Click on the ‘stats’ button to bring up the statistics for your cluster.

Document Count

Examine the number of documents stored in each collection. In one view you can quickly compare the number of documents stored in all your collections.

MongoDB hosting statistics: Document count

Collection Size

Examine the size of each of your collections and compare them to each other. You can see both the data size and the storage size.

MongoDB hosting statistics: Collection size

Index count

Examine the number of indexes created on your collections. Eliminate unused indexes to improve the efficiency of the system.

MongoDB hosting statistics: Index count

Index size

Examine the size of your indexes. For best performance you want your frequently used indexes to be completely stored in memory. Eliminate unwanted or unused indexes since they waste memory.

MongoDB hosting statistics: Index size

The raw sizes for each of your collections are also available in our stats table below the graphs. If there are other interesting statistics that you would like to see in the console please reach out to us at

Which is the best MongoDB GUI?

A good UI is an important part of the development experience. The mongo shell works great for administrative actions but when working with larger amounts of data, the UI becomes fairly important. There are a number of options when it comes to MongoDB GUIs, some good and some not so good. Our customers regularly ask us which UI we recommend – below are some of the options we have considered. Our main scenarios are data visualization, presentation and editing. As always your mileage might vary depending on your scenario and preferences.

1) MongoVUE – MongoVUE is a desktop GUI for the Windows platform. It has a simple, clean UI and the basic features are free. Data can be presented in text view, tree view or table view. You can also save your find queries for later usage – we find this rather convenient. The more advanced features require you to purchase a license. Overall the software appears stable and well maintained.

MongoDB GUI - MongoVUE

2) MongoHub is a native Mac GUI for MongoDB. It provides an option to connect to your MongoDB server through an SSH tunnel, which is fairly convenient from a security perspective. The “StatMonitor” feature gives you a live display of stats, similar to mongostat. The query interface is a little more limited in that it only seems to support a tree view. Also there is no way to save a find query for later. In our experience the software works fairly well but it doesn’t seem to be maintained – so use at your own risk.

MongoHub: MongoDB GUI

3) RockMongo is an HTML-based MongoDB GUI. The GUI is authored in PHP and is open source. The downside of the HTML-based approach is that you need a PHP server to run this GUI, although you can choose to run the PHP server on your local box. The UI is no-frills, fairly easy to use and supports all the common options for working with collections, stats etc. The find interface only presents data in a tabular/text model – so it could be an issue when you are working with multilevel documents. Also there doesn’t seem to be a lot of check-in activity – so we suspect the project is inactive.

Rockmongo:MongoDB GUI

4) RoboMongo is a shell-centric MongoDB GUI that supports the Windows, MacOS and Linux platforms. It’s still early days for RoboMongo, with the latest version being 0.8.4. It is also one of the few GUIs that supports SSL connections to your MongoDB server. There is also support for connecting through an SSH tunnel. The query interface displays data in tree view, table view and text view. You can also save your queries for later usage. One of the coolest features is that it embeds the shell – so you can continue to use the shell commands that you are comfortable with. There are some quirks in the UI which I think will get worked out over time. If actively maintained I think this could be the best cross platform GUI for MongoDB.

Robomongo-MongoDB GUI

So which UI do we prefer? Our developers on Windows prefer to use MongoVUE but the ones on Mac prefer RoboMongo. As always if you have any questions please reach out to us at

10 tips to improve your MongoDB security

MongoDB provides a number of constructs to improve the security of your data. The security of your data in MongoDB is paramount – so it is important to leverage these constructs to reduce your attack surface. Here are 10 tips you can use to improve the security of your MongoDB servers on premise and in the cloud.

1. Enable auth – Even if you have deployed your MongoDB servers in a trusted network it is good security practice to enable auth. It provides you “defense in depth” if your network is compromised. Edit your mongod configuration file to enable auth:

auth = true

2. Don’t expose your production db to the internet – Restricting network access to your database is an important aspect of security. If it is not necessary, do not expose your production database to the internet. In case of a compromise, if an attacker cannot connect to your MongoDB server, your data is that much more secure. If you are on AWS you can place your databases in a VPC private subnet. Read the blog post Deploying MongoDB in a VPC for more information.

3. Use firewalls – Use firewalls to restrict which other entities are allowed to connect to your MongoDB server. Best practice is to only allow your application servers access to the database. If you are hosted on AWS use ‘Security groups’ to restrict access. If you are hosted on a provider that does not support firewall constructs you can easily configure it yourself using ‘iptables’. Refer to the MongoDB documentation to configure iptables for your scenario.

4. Use key files to set up the replica set – Specify a shared key file to enable communication between your MongoDB instances in a replica set. To enable this add the keyFile parameter to the config file as below. The contents of the file need to be the same on all the machines.

keyFile = /srv/mongodb/keyfile

5. Disable HTTP status interface
By default MongoDB provides an HTTP interface on port 28017 which serves the “home” status page. This interface is not recommended for production use and is best disabled. Use the “nohttpinterface” configuration setting to disable the HTTP interface.

nohttpinterface = true

6. Disable the REST interface
The MongoDB REST interface is not recommended for production. It does not support any authentication. It is turned off by default. If you have turned it on using the “rest” configuration option you should turn it off for production systems.

rest = false

7. Configure bind_ip
If your system has multiple network interfaces you can use the “bind_ip” option to restrict your MongoDB server to listen only on the interfaces that are relevant. By default MongoDB will bind to all the interfaces.

bind_ip =,

8. Enable SSL – If you don’t use SSL your data is traveling between your Mongo client and Mongo server unencrypted and is susceptible to eavesdropping, tampering and “man in the middle” attacks. This is especially important if you are connecting to your MongoDB server over unsecured networks like the internet.

9. Role based authorization – MongoDB supports role based authorization to give you fine grained control over the actions that can be performed by each user. Use role based constructs to restrict access instead of making all your users admins. Refer to the roles documentation for more details.

10. Enterprise MongoDB & Kerberos
Enterprise MongoDB integrates with Kerberos for authentication. Refer to the MongoDB documentation for more details. Username/password systems are inherently less secure – use Kerberos based authentication if possible.
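
As a rough illustration of how the configuration-related tips (1, 4, 5, 6 and 7) could be checked together, here is a minimal Python sketch that parses the older “key = value” mongod config format used in this post and lists settings that look unsafe. The function names, the sample file contents and the 10.0.0.5 address are assumptions made up for this example; it is not a MongoDB tool and it does not cover firewalls, SSL or roles.

```python
def parse_mongod_conf(text):
    """Parse 'key = value' lines from an ini-style mongod config file."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def audit(settings):
    """Return warnings for the tips that map directly to config settings."""
    warnings = []
    if settings.get("auth") != "true":
        warnings.append("auth is not enabled (tip 1)")
    if not settings.get("keyFile"):
        # Only relevant when the server is a member of a replica set.
        warnings.append("no keyFile set for replica set members (tip 4)")
    if settings.get("nohttpinterface") != "true":
        warnings.append("HTTP status interface is not disabled (tip 5)")
    if settings.get("rest") == "true":
        warnings.append("REST interface is enabled (tip 6)")
    if not settings.get("bind_ip"):
        warnings.append("bind_ip is not set, so mongod listens on all interfaces (tip 7)")
    return warnings


# Hypothetical hardened config; the bind_ip address is a placeholder.
HARDENED = """
auth = true
keyFile = /srv/mongodb/keyfile
nohttpinterface = true
rest = false
bind_ip = 10.0.0.5
"""

if __name__ == "__main__":
    for warning in audit(parse_mongod_conf("rest = true")):
        print("WARNING:", warning)
    print(audit(parse_mongod_conf(HARDENED)))
```

The check treats a missing setting as unsafe wherever the post says the default is open (for example bind_ip), and as safe where the default is already off (rest).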

At MongoDirector we strive to support best practice security configurations by default for all our deployments. We enable you to use SSL and also not expose your database to the internet. If you have any other questions please email us at

MongoDB-as-a-service in your own Amazon AWS account

MongoDirector provides a MongoDB-as-a-service experience in your own AWS account. Experience the benefits of a hosted MongoDB-as-a-service solution without giving up the control of running your own MongoDB instances.


Benefits of using your own AWS account

1. Security
Don’t expose your production database to the internet. I am a firm believer that databases should not be exposed to the internet unless absolutely needed. Restricting network access to your database provides greater “defense in depth”. Lock down access to your database using Amazon security groups.
2. Cost
Now that the instances are running in your account you can buy reserved instances for them. On average this should save you about 30-50% over running with other hosted providers. If you have AWS credits you can use them to essentially eliminate your MongoDB hosting costs.
3. Amazon VPC support
Bringing your own account enables you to run your MongoDB instances in an Amazon Virtual Private Cloud (VPC). This has all the security advantages outlined in 1 and also enables you to connect your servers to your on premise network using VPN.
4. Backups in S3
Your backups get stored in your own S3 account. This provides for easy and secure storage.

In the ‘Bring your own account’ model you get billed separately for the machines and for MongoDB management. Amazon AWS bills you for the machines and MongoDirector bills you for your MongoDB management.

The example below is a cost breakdown of running a Medium 2+1 replica set in your own AWS account. A Medium 2+1 replica set uses 3 instances - 2 AWS medium instances + 1 Micro (arbiter). The cost calculation includes machine cost & the EBS provisioned IOPS volume cost across all the three machines. This cost calculation also assumes reserved instances. 

Number of instances: 3
AWS machine cost: $156/month
MongoDB management cost: $122/month
Total cost: $278.24/month

This is a cost saving of almost 30% compared to other mongodb hosting providers!

Step by Step instructions
1) Log into the console – Log into the console. Click on the Machine pools tab at the top right of the console. In the Machine pools tab click on the “create” button in the action bar.

2) AWS account – The first step in the Create Machine pool wizard is to enter your API and Secret keys from your AWS account.

3) Region – Select the AWS region for the Machine pool.

4) Access policy – This is a very important selection for the security of your MongoDB instances. It controls who can access your instances. There are two possible options:
a) Internet – This exposes your mongo instances to the internet
b) Security groups – You can pick the security groups in your account that have access to your mongo instances.
For your production instances you would only give the security group containing your front end servers access to the mongo databases. You can also change this setting later after you create the machine pool.

5) Name – Enter a name for the machine pool.

6) Once the machine pool is created it can be used as a target for the deployment of new instances.

Secure your Mongo clusters with SSL

MongoDirector now supports enabling SSL for your MongoDB servers. SSL is extremely important to maintain the privacy and validity of your data over untrusted networks. If you are deploying a production database cluster on the internet SSL is definitely something you should consider.

Enabling SSL is now as easy as checking a box in the creation wizard.

MongoDB with SSL

So why use SSL with MongoDB?

1. Privacy – If you are connecting to your MongoDB server over unsecured networks your data is traveling unencrypted and is susceptible to eavesdropping and tampering. SSL encrypts the data so that only the two endpoints have access to the unencrypted data.
2. Authentication – Use PKI (public key infrastructure) to ensure that only clients with certificates from an appropriate CA can connect to the MongoDB server. This is an additional step and you can choose not to use your own certificates or CA – you will still have the benefits of privacy due to end to end encryption.


Downsides of using SSL

1. Performance overhead – There is definitely some performance overhead for using SSL, although we have yet to run comprehensive tests to quantify it.
2. Lack of MongoDB UI – Most of the popular MongoDB UIs don’t support SSL out of the box. So you might need to go for the paid version or use the mongo console.

Connecting to your SSL enabled MongoDB server
If you are connecting to a server with SSL enabled there are several differences in the mongo connection code. Please refer to the documentation of your driver for more details.

1. Mongo shell
The default mongo client does not support connections to an SSL enabled server – you need the SSL enabled build of mongo. You can SSH into the SSL enabled server and then use the mongo client on the server to connect. Here is the syntax to connect using the admin user provided by MongoDirector.

mongo --ssl -u admin -p <pass> servername/admin

If you would like to download the SSL enabled mongo client to your own server please contact

2. Code:
You will need to append the “ssl=true” property to your MongoDB connection string. Also certain platforms (e.g. the JDK) will require you to add the public key of the SSL certificate to the trusted path before you can connect to the server. By default a self-signed certificate is generated for every server. Download the certificate from /etc/ssl on the server. For more instructions on how you can ssh into the instance refer to the “VM Credentials” section in this blog post. Once you download the public key you will need to add it to your trusted keystore.

keytool -import -alias "MongoDB-cert" -file "/etc/ssl/mongodb-cert.crt" \
    -keystore "/usr/java/default/jre/lib/security/cacerts" \
    -noprompt -storepass "changeit"

The default password for the cacerts store is “changeit”. For security reasons you should change this password to your own. Once you have added the certificate, enumerate the certs in the keystore to confirm that the certificate got added:

keytool -list -keystore cacerts -storepass changeit

3. Mongo UI : Robomongo
RoboMongo is one of the few mongo UIs that support connecting with SSL. When creating a connection to your MongoDB server select the SSL option. For the certificate use the .pem file that has both the public key and the private key. This file is located at /etc/ssl on your MongoDB server.
Connect to Mongodb with ssl using Robomongo
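
For the “ssl=true” connection string property mentioned in the Code section above, here is a minimal Python sketch that adds the option to an existing connection string using only the standard library. The helper name with_ssl and the hostnames are hypothetical; your driver may offer its own way to set this option, so treat this as string handling only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_ssl(uri):
    """Return the MongoDB connection string with ssl=true in its options."""
    parts = urlsplit(uri)
    # Keep existing options, dropping any previous ssl setting.
    options = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() != "ssl"]
    options.append(("ssl", "true"))
    # Options must follow a "/" even when no database name is given.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(options), parts.fragment)
    )


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    print(with_ssl("mongodb://admin:secret@db.example.com:27017/admin"))
```

Other options already present in the connection string are preserved, and a previous ssl=false is replaced rather than duplicated.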

As always if you have any questions please reach out to us at

What I would like to see in Amazon EC2…

Amazon EC2 is a fabulous cloud computing platform. A large part of the internet runs on Amazon AWS – when users refer to “cloud computing” they are often implicitly talking about Amazon AWS. My company has been running and managing databases on AWS for a couple of years now and we have learnt a lot from our experiences. While AWS is an easy platform to get up and running on, it is extremely difficult to run large disk intensive workloads on AWS. I’m not saying it cannot be done – however the time and expertise it requires is beyond most users. Here are a few things that I would like to see in Amazon to make it easier to run databases there.

1. Non ephemeral local disks – Network based EBS is convenient for most workloads but performance is abysmal for write heavy workloads. The introduction of provisioned IOPS eases this problem a little. However provisioned IOPS is fairly expensive and the costs add up, especially when you are running a big cluster with 10-20 machines. As an alternative it would be great if disk heavy workloads like databases could run off the local disk. It’s not an option today because the local disks are “ephemeral”. If you stop & restart your machine, it might move to a different host and you lose your local data. This is not an acceptable risk even when there are multiple copies of data.

2. Low cost SSD – It would be great if Amazon could take a leaf from Digital Ocean’s book and introduce low cost SSDs for its servers. Server side computing is slowly moving to SSD and in a few years SSDs will be the de facto storage for server workloads. Amazon does offer SSDs today but they are fairly expensive and not an option for most workloads. Also the SSD offering has the same “ephemeral” problem as local disks.

3. Cross region security groups – Geo distributed clusters are a reality of our times. A number of customers need to deploy servers across regions for multiple reasons ranging from availability to partitioning. The only way to secure these deployments today is by using an IP whitelist which is extremely difficult to maintain. Cross region security groups will greatly alleviate the burden for customers deploying across multiple regions. Amazon today has very little functionality that works across regions. Recently they introduced the ability to copy templates across regions which is very useful. I hope they continue to add more features that are cross-region.

4. Synchronized snapshots across multiple volumes – In some of our larger database clusters we need to back up multiple servers simultaneously. E.g. in a sharded MongoDB cluster you need to back up a consistent copy of all the shards. While there are techniques to do this today they are all fairly hairy and vulnerable to failure. An ideal way to back up these servers is to kick off a synchronized snapshot across several volumes. This will ensure a consistent snapshot across all the volumes.

5. Better VPC management – I personally don’t like the idea of exposing production databases to the internet. Hence I am a big fan of Virtual Private Cloud (VPC). The technology is great but the management interface is fairly tedious. VPC and classic EC2 are very similar until they are not. You end up switching back and forth between the EC2 console and the VPC console. Once you are managing 10+ servers the current management paradigm places a lot of burden on the user. I think there is room to simplify the concepts and make it easier to manage.

As always if you have questions please feel free to reach out to us

Geo distributed MongoDB replica sets for 100% uptime

Availability of databases is one of the most important aspects of application architecture. Datacenter downtime is a given; it is going to happen to everybody. Even the best run datacenters are going to go down completely every now and then, e.g. the Amazon outages of 8/26 and 9/13. The important question to ask is whether this is acceptable for your application. Most applications can tolerate some downtime every now and then. However certain applications require close to 100% uptime and the database architecture of these applications requires a more deliberate design approach. Latencies between the datacenters tend to be fairly large, so careful thought has to be put into the design.


Goals

1) The database should be up and writable even if a complete datacenter goes down
2) Database failover should be automatic in case of server/datacenter failure
3) A single server failure should not cause the primary to switch to a different datacenter


In order to satisfy our goals we came up with a three datacenter design using a 4+1 replica set:

1) Datacenter 1: Primary (Priority 10), Secondary 0 (Priority 9)
2) Datacenter 2: Secondary 1 (Priority 8), Secondary 2 (Priority 7)
3) Datacenter 3: Arbiter

We place two full members in each of the first two datacenters and an arbiter in the third datacenter. We also configure the priority for each server so that we can control which member becomes primary in case of server failure.
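
The layout above can be sketched as a replica set configuration document, together with a small simulation of the three goals. This is a simplified Python model, not MongoDB's election algorithm: it assumes the highest priority surviving data member becomes primary whenever a majority of members is still reachable. The replica set name, the hostnames and the helper names are hypothetical.

```python
# Replica set config following the priorities described above.
# "rs0" and the example.com hostnames are placeholders.
REPLICA_SET_CONFIG = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1-a.example.com:27017", "priority": 10},
        {"_id": 1, "host": "dc1-b.example.com:27017", "priority": 9},
        {"_id": 2, "host": "dc2-a.example.com:27017", "priority": 8},
        {"_id": 3, "host": "dc2-b.example.com:27017", "priority": 7},
        {"_id": 4, "host": "dc3-a.example.com:27017", "arbiterOnly": True},
    ],
}

# Which datacenter each member _id lives in.
MEMBER_DATACENTER = {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2", 4: "dc3"}


def members_in(datacenter):
    """Member ids located in the given datacenter."""
    return {mid for mid, dc in MEMBER_DATACENTER.items() if dc == datacenter}


def expected_primary(config, failed_ids):
    """Host expected to be primary after `failed_ids` go down, else None.

    Simplification: a primary exists only if a strict majority of members
    is alive, and the highest priority surviving data member wins.
    """
    members = config["members"]
    alive = [m for m in members if m["_id"] not in failed_ids]
    if len(alive) * 2 <= len(members):
        return None  # no majority, so no primary can be elected
    candidates = [m for m in alive if not m.get("arbiterOnly")]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["priority"])["host"]


if __name__ == "__main__":
    # Goal 3: a single server failure keeps the primary in datacenter 1.
    print(expected_primary(REPLICA_SET_CONFIG, {0}))
    # Goals 1 and 2: losing all of datacenter 1 fails over to datacenter 2.
    print(expected_primary(REPLICA_SET_CONFIG, members_in("dc1")))
```

The arbiter in the third datacenter is what keeps a majority (3 of 5) alive when either of the first two datacenters is lost completely.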

100% uptime architecture for MongoDB

Geographically distributed MongoDB

There are a couple of downsides to this geodistributed architecture:

1. If you have a write heavy application, the secondaries in a different datacenter will always lag behind due to the larger latency. If some data is crucial you might want to use a write concern of “majority” to make sure that a majority of the members acknowledge the write. With two data members per datacenter, a majority of three must include at least one member in the other datacenter.

2. The MongoDB community builds do not have SSL enabled. You might want to make a build with SSL enabled or use the Enterprise version of Mongo so that data flowing across regions is encrypted.

Amazon AWS / EC2 availability

If you are deploying on Amazon AWS each datacenter in this picture corresponds to an Amazon region and not to an availability zone. Amazon does not provide availability guarantees for a single availability zone; SLAs are for the entire region. If you deploy across availability zones your SLA is 99.95%, which is still a great SLA – however if an entire region goes down your database will go down. Also certain AWS regions have only two availability zones, so special attention has to be given to placing the third node in a different region so that a single region going down does not bring the entire database down.
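
To put the 99.95% figure in perspective, here is a small Python sketch that converts an availability percentage into permitted downtime. The function name is hypothetical and the calculation is plain arithmetic, not a statement of how Amazon measures or credits its SLA.

```python
from decimal import Decimal


def allowed_downtime_minutes(sla_percent, days):
    """Minutes of downtime permitted by an availability SLA over `days` days."""
    total_minutes = Decimal(days) * 24 * 60
    unavailable_fraction = (Decimal(100) - Decimal(sla_percent)) / Decimal(100)
    return total_minutes * unavailable_fraction


if __name__ == "__main__":
    # A 99.95% SLA allows roughly 21.6 minutes of downtime in a 30 day month.
    print(allowed_downtime_minutes("99.95", 30))
    # Over a 365 day year that is roughly 262.8 minutes, a little over 4 hours.
    print(allowed_downtime_minutes("99.95", 365))
```

Passing the percentage as a string keeps the Decimal arithmetic exact, which avoids floating point noise in such small fractions.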

Lower cost availability across geographies

A simpler version of the same architecture uses only three servers, with one replica in each datacenter. The downside of this approach is that a single server failure will cause the primary to move across datacenters. However this architecture costs less than the first one. Depending on your scenario it might work for you.

100% uptime MongoDB with multiple datacenters

There are many ways of achieving high uptime with Mongo and this is the way that works for our needs. If you have other interesting architectures please email us. We would love to hear your thoughts!