Fast paging with MongoDB

Paging through your data is one of the most common operations with MongoDB. A typical scenario is the need to display your results in chunks in your UI. If you are batch processing your data it is also important to get your paging strategy correct so that your data processing can scale. 

Let's walk through an example to see the different ways of paging through data in MongoDB. In this example we have a CRM database of user data that we need to page through, displaying 10 users at a time, so our page size is 10. Here is the structure of our user document:

{
    _id,
    name,
    company,
    state
}

Approach 1: Using skip() and limit()

MongoDB natively supports paging through the skip() and limit() cursor methods. skip(n) tells MongoDB to skip the first ‘n’ results, and limit(n) tells MongoDB to return at most ‘n’ results. Typically you will use skip() and limit() on a cursor in your application code, but to illustrate the scenario we provide console commands that achieve the same results. For brevity, the limits checking code is excluded.

//Page 1
db.users.find().limit(10)
//Page 2
db.users.find().skip(10).limit(10)
//Page 3
db.users.find().skip(20).limit(10)
//...and so on

You get the idea. In general, to retrieve page n the code looks like this:

db.users.find().skip(pagesize*(n-1)).limit(pagesize)
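
The limits checking that these examples leave out could look something like the sketch below. It is only an illustration: `pageWindow` and its parameter names are hypothetical and not part of MongoDB, and it assumes you already know the total number of documents (for example from `db.users.count()`).

```javascript
// Hypothetical helper (not a MongoDB API): turns a 1-based page number
// into skip/limit values for the query above, with bounds checks.
function pageWindow(page, pageSize, totalCount) {
  if (typeof pageSize !== 'number' || pageSize < 1 || pageSize % 1 !== 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  // An empty collection still has one (empty) page.
  var lastPage = Math.max(1, Math.ceil(totalCount / pageSize));
  // Clamp out-of-range page numbers instead of skipping past the data.
  var safePage = Math.min(Math.max(1, Math.floor(page) || 1), lastPage);
  return { page: safePage, skip: pageSize * (safePage - 1), limit: pageSize };
}

// In the mongo shell the result would be used like this:
//   var w = pageWindow(3, 10, db.users.count());
//   db.users.find().skip(w.skip).limit(w.limit)
```

Clamping is one possible policy; an application could just as well return an error for a page number that is out of range.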

However, as the size of your data increases, this approach has serious performance problems. Every time the query is executed, the server has to walk from the beginning of the collection (or index) to the specified offset before it can return any results, so the query gets slower and slower as the offset increases. The skipped documents are scanned and then discarded, so this approach does not make efficient use of indexes. The ‘skip()’ and ‘limit()’ approach is therefore useful mainly when you have small data sets. If you are working with large data sets you need to consider other approaches.

Approach 2: Using find() and limit()

The reason the previous approach does not scale well is the skip() command, so the goal in this section is to implement paging without it. For this we are going to leverage a natural order in the stored data, such as a timestamp or an id stored in the document. In this example we are going to use the ‘_id’ stored in each document. ‘_id’ is a MongoDB ObjectID, a 12 byte structure containing a timestamp, machine id, process id, counter etc. The overall idea is as follows:
1. Retrieve the _id of the last document in the current page
2. Retrieve the documents whose ‘_id’ is greater than this ‘_id’ for the next page

//Page 1 (sort by _id so that the page order is well defined)
users = db.users.find().sort({'_id': 1}).limit(pageSize);
//Find the id of the last document in this page
last_id = ...

//Page 2
users = db.users.find({'_id': {$gt: last_id}}).sort({'_id': 1}).limit(pageSize);
//Update the last id with the id of the last document in this page
last_id = ...
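
To make the two steps concrete, here is a minimal sketch in plain JavaScript that mimics, on an in-memory array, what a query of the form `find({_id: {$gt: last_id}}).sort({_id: 1}).limit(n)` returns. The function names are hypothetical, and numeric `_id` values stand in for ObjectIDs purely for readability.

```javascript
// In-memory stand-in for
//   db.users.find({_id: {$gt: lastId}}).sort({_id: 1}).limit(pageSize)
// fetchPageAfter is a hypothetical name; numeric _ids stand in for ObjectIDs.
function fetchPageAfter(docs, lastId, pageSize) {
  return docs
    .filter(function (d) { return lastId === null || d._id > lastId; })
    .sort(function (a, b) { return a._id - b._id; })
    .slice(0, pageSize);
}

// Walks the whole collection one page at a time, carrying last_id forward.
function allPages(docs, pageSize) {
  var pages = [];
  var lastId = null; // null means "start from the first page"
  while (true) {
    var page = fetchPageAfter(docs, lastId, pageSize);
    if (page.length === 0) break;        // no more documents
    pages.push(page);
    lastId = page[page.length - 1]._id;  // id of the last document in this page
  }
  return pages;
}

// 25 fake users paged 10 at a time.
var users = [];
for (var i = 1; i <= 25; i++) users.push({ _id: i, name: 'user' + i });
var pages = allPages(users, 10);
console.log(pages.map(function (p) { return p.length; }));
```

On a real collection the `_id` field is always indexed, so the server can seek straight to `last_id` instead of walking over the documents from earlier pages.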

Continue reading

Enabling two factor authentication for MongoDirector.com

Enabling two factor authentication is an important upgrade to the security of your MongoDirector account. If your password is compromised, an attacker will still be unable to gain access to your account unless they also have access to the authentication device initialized with the two factor secret of your account.

You can enable two factor authentication in three easy steps:

1. Log in to your account at https://console.mongodirector.com, navigate to the Settings tab, select the ‘Two factor’ auth tab and check “Enable Two factor auth”.

Continue reading

Geographically distributed MongoDB clusters on AWS in the EU region

Amazon recently announced the public availability of its EU Central (Frankfurt) region. With this new datacenter AWS now has two datacenters in the EU: Ireland and Frankfurt. The availability of two datacenters enables you to improve the georedundancy of your MongoDB replicas.

Here are the steps to set up a georedundant MongoDB cluster in the EU region on AWS:

1. Cluster details

Enter the cluster details – name, version & size to get started

[Image: Cluster details to deploy mongod cluster in AWS EU regions]

2. Select the region for each replica set

We place the primary in EU-West (Ireland) and the secondary in EU-Central (Frankfurt). For 100% georedundancy you need to place the arbiter in a third region. If you place the arbiter in one of the two EU regions and that region goes down, the single remaining member cannot form a majority, so your MongoDB cluster will not have a quorum and will degrade to read only mode. The arbiter is a voting node and does not hold any data. Hence, irrespective of where you place the arbiter, all the production data and backups are stored in the EU region.
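
The quorum argument can be checked with a little arithmetic: a primary can only be elected while a strict majority of the voting members is reachable. The sketch below illustrates this for the three member layout described here. The region labels are informal, ‘us-east’ is only an example of a third region and does not come from the setup above, and `survivesRegionLoss` is a hypothetical helper.

```javascript
// Each voting member is tagged with the region it runs in.
// 'us-east' is only an illustrative third region.
var arbiterInEU = [
  { role: 'primary', region: 'eu-west' },
  { role: 'secondary', region: 'eu-central' },
  { role: 'arbiter', region: 'eu-central' }
];
var arbiterElsewhere = [
  { role: 'primary', region: 'eu-west' },
  { role: 'secondary', region: 'eu-central' },
  { role: 'arbiter', region: 'us-east' }
];

// True if the members outside the failed region still form a strict majority.
function survivesRegionLoss(members, failedRegion) {
  var up = members.filter(function (m) { return m.region !== failedRegion; }).length;
  return up > members.length / 2;
}

console.log(survivesRegionLoss(arbiterInEU, 'eu-central'));      // 1 of 3 votes left
console.log(survivesRegionLoss(arbiterElsewhere, 'eu-central')); // 2 of 3 votes left
```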

Continue reading

The role of the DBA in NoSQL

What is the role of the DBA in the rapidly evolving world of NoSQL? A majority of the early NoSQL adoption is in the fast growing world of small and medium companies based on public clouds. In most of these companies the DBA role does not exist, and this has led a lot of people to proclaim the end of the DBA. Is the DBA going the way of the dinosaur? I think the answer is more nuanced than that. First, let's examine a few trends we are seeing in the marketplace that are going to have a great downstream impact on the technology workplace.

Continue reading

Getting started with user management in MongoDB

One of the first tasks after getting your MongoDB database server up and running is to get your users and databases configured. In this blog post we will go over some of the common scenarios for creating and configuring users in MongoDB. MongoDB user management has improved significantly over the previous two releases and is now a capable and functional user management model. Users can be assigned various roles, and roles have privileges. There are several built in user roles, or you can create your own custom roles.

The examples in this post use a 2.6.4 client and a 2.6.4 server. Considerable changes were made to the user management model from 2.4 to 2.6, so if you are using a 2.4 client a lot of the examples in this blog post are not going to work. You can check the version of your MongoDB client using the following command:

mongo --version

Adding a user to a database

The first step is to create your application database:

use applicationdb;

After creating this database we want to create the user that the application will use to write to it. We want this user to have read and write privileges on the database:

db.createUser({'user':'appuser', 'pwd':'<pass>', roles:['readWrite']});

Sometimes we also want to add users who have read only access to the db. For example, we might want to add an analytics user who can only read from the db:

db.createUser({'user':'analyticsuser', 'pwd':'<pass>', roles:['read']});

Now that the users are created, let's try to connect as ‘appuser’ from the MongoDB console:

mongo -u 'appuser' -p  <servername>/applicationdb
MongoDB shell version: 2.6.4
connecting to: <servername>/applicationdb
>

So we were able to connect successfully. Note that the “/applicationdb” at the end of the command tells MongoDB to authenticate ‘appuser’ against the ‘applicationdb’ database.

Adding a user to multiple databases

In many scenarios we need to create multiple databases on the server. For example, in this scenario we might need to create another database, ‘analyticsdb’, to store the results of the analytics. The ‘analyticsuser’ now needs ‘read’ access on the ‘applicationdb’ and ‘readWrite’ permissions on the ‘analyticsdb’.

So how do we achieve this? Should we add the ‘analyticsuser’ to each database? This creates a management nightmare over the long term as many users and databases are added. Fortunately there is a simple solution. We can centralize the role assignments for a user and store them in a single database. In this scenario I prefer to store these assignments in the ‘admin’ db since it is the hub of central administration in the server, but you can also store it in a separate db.

use admin
db.createUser({user:'analyticsuser', pwd:'<pass>', roles:[{'role':'read', 'db':'applicationdb'}, { 'role':'readWrite', 'db':'analyticsdb'}]});
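
If you manage many users this way, the roles array can be generated from a simple mapping of database name to role. The helper below is a hypothetical convenience, not part of MongoDB: it only builds the document passed to db.createUser() and never talks to the server.

```javascript
// Hypothetical helper: builds the argument for db.createUser() from a
// { databaseName: roleName } mapping. It does not contact the server.
function buildUserSpec(user, pwd, rolesByDb) {
  var roles = Object.keys(rolesByDb).map(function (dbName) {
    return { role: rolesByDb[dbName], db: dbName };
  });
  return { user: user, pwd: pwd, roles: roles };
}

var spec = buildUserSpec('analyticsuser', '<pass>', {
  applicationdb: 'read',
  analyticsdb: 'readWrite'
});
console.log(JSON.stringify(spec.roles));

// In the mongo shell, after `use admin`:
//   db.createUser(spec);
```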

Once the user is added you can use ‘show users’ to show the details of your users. Here is what my admin db looks like:

use admin
> show users
{
"_id" : "admin.admin",
"user" : "admin",
"db" : "admin",
"roles" : [{ "role" : "root","db" : "admin"},{"role" : "restore","db" : "admin"}]
}
{
"_id" : "admin.analyticsuser",
"user" : "analyticsuser",
"db" : "admin",
"roles" : [{"role" : "read","db" : "applicationdb"},{"role" : "readWrite","db" : "analyticsdb"}]
}
>

Continue reading

MongoDB Seattle 2014

Hope to see everybody at MongoDB Seattle, an annual one-day conference for developers, architects and operations professionals to deepen their knowledge and expertise of MongoDB.

MongoDB Seattle will take place on September 16th at the Bell Harbor Conference Centre. This highly productive day of learning and fun will feature advanced technical talks, partner sessions, and one-on-one time with MongoDB experts.

Stop by our booth and register to win the free Amazon Kindle Fire that we are giving away!

Continue reading

MongoDB analytics series: Slamdata – Run SQL and build reports directly on MongoDB

This is a guest post by John A. De Goes. John is the CTO & co-founder of SlamData. When not working on tricky compilation issues for SlamData, you can find John speaking at conferences, blogging, spending time with his family, and being active in the foothills of the Rocky Mountains. Contact John at john@slamdata.com

MongoDB has been hugely successful in the developer community, partially because it allows developers to store data structures directly in a fast, scalable, modern database.

There's no need to map those data structures to rigid, predefined, and flat tables that have to be reassembled at runtime through lots of intermediate tables. (Described that way, the relational model sounds kind of old fashioned, doesn't it?)

Unfortunately, the world's analytics and reporting software can't make sense of post-relational data. If it isn't flat, if it isn't all uniform, you can't do anything with it inside legacy analytics and reporting solutions!

Continue reading

Understanding durability & write safety in MongoDB

Durability is the “D” in the “ACID” properties popularized by traditional RDBMS. Durability is the guarantee that written data has been saved and will survive permanently. NoSQL databases like MongoDB give developers fine grained control over the durability of their write calls. This enables developers to choose different durability, safety and performance models for different classes of data. However, this also places the burden on the developer to discern and understand the nuances of the different write safety options. In this blog post we will look at the different options for write safety provided in the Java driver. In MongoDB parlance this is called “Write Concern”. Write concerns vary from “weak” to “strong”. Weak write concerns can lead to higher throughput but provide less data safety, while strong write concerns provide more data safety at the cost of throughput.
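
This post focuses on the Java driver, but the idea of picking a write concern per class of data can be illustrated with the write concern document itself. The sketch below is an illustration only: `w`, `j` and `wtimeout` are standard write concern fields, while the data class names and the `writeConcernFor` helper are made up for this example.

```javascript
// Write concern documents, from weak to strong.
// w, j and wtimeout are standard write concern fields; the class names
// ('log', 'session', 'order') are made up for illustration.
var WRITE_CONCERNS = {
  log:     { w: 0 },                                   // fire and forget: fastest, no acknowledgement
  session: { w: 1 },                                   // acknowledged by the primary
  order:   { w: 'majority', j: true, wtimeout: 5000 }  // majority acknowledgement, journaling requested
};

// Hypothetical helper: unknown classes fall back to the strongest setting.
function writeConcernFor(dataClass) {
  return WRITE_CONCERNS[dataClass] || WRITE_CONCERNS.order;
}

console.log(JSON.stringify(writeConcernFor('log')));

// In a 2.6 mongo shell this would be passed as, for example:
//   db.orders.insert(doc, { writeConcern: writeConcernFor('order') });
```

Falling back to the strongest setting for unknown classes is a deliberately conservative choice: a mistyped class name costs throughput rather than data safety.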

Continue reading

How to find a needle in a haystack?

The poster child scenario for big data: you need to sift through a large amount of data to extract a tiny “nugget” of information. You also need to do it in as short an amount of time as possible, because your business depends on it. Historically, using traditional RDBMS technology, this sort of scenario has required a large team and a large investment of time and money. Most traditional RDBMSs only scale vertically, so you have to keep buying larger and larger machines to reduce your turnaround time. The advent of public clouds and NoSQL databases like MongoDB has completely disrupted how teams are thinking about this scenario.

Continue reading

Implementing pagination with MongoDB, Express.js & Slush

MongoDB accepts and provides access to data in the JavaScript Object Notation (JSON) format. This makes MongoDB a perfect fit when dealing with JavaScript based REST services. In this post, we will take a look at pagination using MongoDB. We will scaffold a simple Express/Mongojs application using slush-mongo. Then we will use skip() and limit() to fetch the required records from a set of data.

Pagination is one of the simplest ways to improve UX when dealing with average to huge data sets. We split the entire data into x records per page, which gives us (total records/x) pages, rounded up, and then we show a pagination control with the page numbers. As the user clicks on a page number, we seek and fetch the set of records for that particular view only.
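
As a quick check of the arithmetic: the page count is the record count divided by the page size, rounded up, so that a partial last page still gets its own page number. A minimal sketch follows, with hypothetical function names, using the 999 test records and an assumed page size of 10:

```javascript
// Number of pages needed to show `total` records, `perPage` at a time.
function pageCount(total, perPage) {
  return Math.ceil(total / perPage);
}

// skip/limit values for a 1-based page number, as used with skip() and limit().
function pageQuery(pageNumber, perPage) {
  return { skip: (pageNumber - 1) * perPage, limit: perPage };
}

var totalPages = pageCount(999, 10);       // 999 records, 10 per page
var lastPage = pageQuery(totalPages, 10);  // query window for the last page
console.log(totalPages, JSON.stringify(lastPage));
```

With these numbers the last page holds only the 9 records left after the first 990, which is why the division has to round up rather than down.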


You can find a live demo of the app here and the complete code for this app here.

Setup the Project

Create a new folder named mongoDBPagination. Open terminal/prompt here. Next, we will install gulp, slush and slush-mongo modules. Run


$ [sudo] npm i -g gulp slush slush-mongo

Once this is done, run


$ slush mongo

You will be asked a few questions, which you can answer as follows:


[?] Which MongoDB project would you like to generate? Mongojs/Express
[?] What is the name of your app? mongoDBPagination
[?] Database Name: myDb
[?] Database Host: localhost
[?] Database User:
[?] Database Password:
[?] Database Port: 27017
[?] Will you be using heroku? (Y/n) n

This will scaffold a simple Express/Mongojs app for us. Once the installation is done, run


$ gulp

Then open http://localhost:3000 in your favorite browser and you should see a table with the list of routes configured in the application. This confirms that you have installed everything correctly.

Setup Test DB

We will create a new collection named ‘testData’ and populate it with some test data. Then we will show this data in a paginated table. Open a new terminal/prompt and run


$ mongo

Then run


use myDb

to select our DB. Next copy the snippet below and paste it in the mongo shell and hit return.

// Insert 999 documents with random names, ages and strings
for (var i = 1; i <= 999; i++) {
    db.testData.insert({
        name: Math.random().toString(36).substring(7),
        age: Math.floor(Math.random() * 99),
        random: Math.random().toString(36).substring(7)
    });
}

Continue reading