The three A’s of MongoDB security – Authentication, Authorization & Auditing

MongoDB, Inc. has made impressive strides over the past 18 months. One of the areas of the product that has seen the most significant improvement is security, which is of paramount importance for a production database. Established relational databases provide a number of knobs and controls to help administrators manage the security of their databases, and MongoDB is getting to a similar place. In this post we will delve deeper into the security features in the areas of Authentication, Authorization & Auditing.

Continue reading

MongoDB shards and unbalanced aggregation loads

The aggregation framework is a vital cog in the MongoDB infrastructure. It helps you analyze, summarize and aggregate the data stored in MongoDB. Refer to this blog post for more details about the aggregation framework in MongoDB 2.6.

In the 2.6 release, MongoDB made a subtle but significant change in the way the underlying aggregation pipelines execute in a sharded environment. When working with sharded collections, MongoDB splits the pipeline into two stages. The first stage, or the “$match” phase, runs on each shard and selects the relevant documents. If the query planner determines that a shard is not relevant based on the shard key, this phase is not executed on that shard.
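Conceptually, the split looks like this. The sketch below is a plain-JavaScript simulation (not MongoDB code) of a per-shard “$match” phase followed by a merged group phase, using hypothetical shard data:

```javascript
// Plain-JavaScript simulation of the two-stage split (illustrative only).
var shards = [
  [{ user: "a", qty: 5 }, { user: "b", qty: 0 }],   // shard 1 (hypothetical data)
  [{ user: "a", qty: 2 }, { user: "c", qty: 7 }]    // shard 2
];

// Phase 1: the "$match" predicate runs independently on every relevant shard.
var matchedPerShard = shards.map(function (docs) {
  return docs.filter(function (d) { return d.qty > 0; });
});

// Phase 2: matched documents are merged and grouped (here: totals per user).
var totals = {};
matchedPerShard.forEach(function (docs) {
  docs.forEach(function (d) {
    totals[d.user] = (totals[d.user] || 0) + d.qty;
  });
});

console.log(totals); // { a: 7, c: 7 }
```

The point of the first phase is data locality: each shard filters its own documents, so only matching documents cross the network for the second phase.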

Continue reading

Yeoman, Mongoose and MongoDB

In our previous post we talked about getting started with Mongoose and MongoDB. In this post, we will see how to use Yeoman and scaffold a new Mongoose/Express project.

Yeoman is a scaffolding tool that scaffolds out projects using Grunt, Bower and Node. There are times when you end up cutting and pasting boilerplate code around just to create a new project. This is precisely what Yeoman does for you, with a single command and a few awesome generators.

Yeoman uses Grunt as the task runner to perform run/build/test tasks. If you would rather use Gulp, check out Slush. Slush is also a scaffolding tool but uses Gulp as the task runner.

Getting Started with Yeoman

To make our lives easy, we will be using a super awesome Yeoman generator named generator-mongoose, which will help us set up a new project as well as scaffold schemas.

This generator uses Express.js as the server, HTML for templating, and a tinge of Bootstrap CSS to make things look good.

Let’s create a new folder and name it yoMongoose. cd into the folder, open a terminal/prompt there, and run the following:
To install Yeoman

[sudo] npm install -g yo

To install generator-mongoose

[sudo] npm install -g generator-mongoose

and finally run

yo mongoose

to scaffold a new project. Fill in the questions like so:

[?] Database Name: (myDb) myTestDB
[?] Database Host: (localhost) localhost
[?] Database User: {hit return}
[?] Database Password: {hit return}
[?] Database Port: (27017) 27017
[?] Will you be using heroku? (Y/n)  n

And Yeoman will go off and scaffold a new project for you. Your folder structure should contain a node_modules folder and a public/bower_components folder. If you do not see either of them, please run npm install and bower install respectively.

To run the app, execute

grunt

This will start off the express server and launch the home page in your default browser. The default page you see is a list of routes configured in the application.

Back in the folder, let’s have a quick walkthrough of the app.

config/db.js – consists of the DB config and some options you can mess around with

models/post.js – an example schema for a blog post. All the other models, which we are going to scaffold with the sub-generator, will appear here.

public/ – consists of the JavaScript and CSS needed for the UI

routes/
index.js – consists of the default route, which dispatches index.html
post.js – consists of the 5 key endpoints you need to interact with the posts collection

test/ – consists of the tests for the Post route and its methods

views/ – consists of all the templates & views sent to the client.

I recommend taking a peek at the following in order

config/db.js
models/post.js
routes/post.js
app.js

to get a feel for where things go in a modular Express app. Once you are done, we will scaffold another model named article using the sub-generator.

Back to terminal/prompt and run

yo mongoose:schema "article|title:String,excerpt:String,content:String,published:Boolean,created:Date"

The above command will result in:

Your creating a schema for article
With the fields: title,excerpt,content,published,created
starting request to schematic for test mock data...
create routes/article.js
create models/article.js
create test/test-article.js

Continue reading

Getting started with MongoDB and Mongoose

What is Mongoose?

Mongoose is “elegant mongodb object modeling for node.js”. If you have used MongoDB before and tried basic database operations, you might have noticed that MongoDB is “schema-less”. When you want a more structured database while still leveraging the power of MongoDB, Mongoose is one of the ODM (Object Data Mapping) solutions.

To demonstrate quickly: say you run an insert command on a collection named users, like


db.users.insert({ name : 'Arvind', gender : 'male'});

And right after that you can run


db.users.insert({ name : 'Arvind', gender : 'male', password : '!@#$'});

and MongoDB will never complain about the variation in the number of fields (key-value pairs). This is very flexible. But when you want to keep your data more organized and structured, you need to maintain that structure in your server code, writing validation and making sure nothing irrelevant is stored in a collection. This is where Mongoose makes life easy.
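To see what Mongoose takes off your plate, here is a rough sketch of the kind of hand-rolled validation you would otherwise have to write and maintain yourself before every insert (the field rules here are hypothetical):

```javascript
// Hand-rolled validation you would otherwise maintain in server code.
// Illustrative sketch only; Mongoose lets you declare this on a schema instead.
function validateUser(doc) {
  var errors = [];
  if (typeof doc.name !== "string" || doc.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (doc.gender !== "male" && doc.gender !== "female") {
    errors.push("gender must be 'male' or 'female'");
  }
  return errors;
}

console.log(validateUser({ name: "Arvind", gender: "male" }));         // []
console.log(validateUser({ name: "", gender: "unknown", extra: 1 }));  // two errors
```

Multiply this by every collection and every endpoint, and the appeal of a declarative schema becomes obvious.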

“Mongoose provides a straight-forward, schema-based solution to modeling your application data and includes built-in type casting, validation, query building, business logic hooks and more, out of the box.”

Install Node.js & MongoDB

To use Mongoose, we need to have Node.js and MongoDB installed; you can find the info here.

Start Developing

Let us first create a small playground, where we can have fun. Create a new folder named myMongooseApp. And open terminal/prompt here and run

npm init

This initializes a new Node project. Fill it up as required. Next, we will install Mongoose as a dependency of our project. Run

npm install mongoose --save

then start the MongoDB service by running

mongod

Next, create a new file named index.js at the root of the folder and open it up in your favorite editor. Add the code below.

var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/myTestDB');

var db = mongoose.connection;

db.on('error', function (err) {
    console.log('connection error', err);
});
db.once('open', function () {
    console.log('connected.');
});

Here, we require the mongoose package and use it to connect to the DB. The name of our database is myTestDB.

Then run

node index.js

and you should see the connected message. You can also use a node package named nodemon for automatically restarting the node server on changes.
Now, our sandbox is ready to play!

Mongoose Schemas

Schemas are like skeletons: the bare bones of what your data collection will look like. If you are dealing with a collection of users, your schema would look something like this:

Name - String
Age - Number
Gender - String
Date of Birth - Date
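In code, that skeleton is just a map from field names to types. With Mongoose, you would pass a map like the one below straight to `new mongoose.Schema(...)` (a sketch; the field names are illustrative):

```javascript
// The user skeleton above as a field-to-type map.
// With Mongoose installed, you would turn it into a schema and model:
//   var userSchema = new mongoose.Schema(userFields);
//   var User = mongoose.model('User', userSchema);
var userFields = {
  name:        String,
  age:         Number,
  gender:      String,
  dateOfBirth: Date
};

console.log(Object.keys(userFields)); // [ 'name', 'age', 'gender', 'dateOfBirth' ]
```

Each value is a JavaScript type constructor; Mongoose uses these for the type casting and validation mentioned earlier.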

Continue reading

MongoDB on AWS: How to choose the right EC2 instance type for your MongoDB server?

Let’s face it: AWS has gotten incredibly complicated. A simple task like picking the right instance type for your MongoDB server requires a fair bit of research. How do you know which server type to choose from the alphabet soup of options? In this blog post we will break down the different instance types and how they apply to your MongoDB scenarios. To keep things simple we are not going to talk about disk types or sizes in this post – that’s the topic of our next post.

Continue reading

MongoDB 2.6 Aggregation framework improvements

This is a guest post by Vlad Mihalcea. Vlad is a software architect passionate about software integration, high scalability and concurrency challenges. Here is a link to the original post.

MongoDB is evolving rapidly. The 2.2 version introduced the aggregation framework as an alternative to the Map-Reduce query model. Generating aggregated reports is a recurrent requirement for enterprise systems and MongoDB shines in this regard. If you’re new to it you might want to check this aggregation framework introduction or the performance tuning and data modelling guides.

Let’s reuse the data model I first introduced while demonstrating the blazing fast MongoDB insert capabilities:

{
        "_id" : ObjectId("5298a5a03b3f4220588fe57c"),
        "created_on" : ISODate("2012-04-22T01:09:53Z"),
        "value" : 0.1647851116706831
}

MongoDB 2.6 Aggregation enhancements

In the 2.4 version, if I run the following aggregation query:

db.randomData.aggregate([
    {
        $match: {
            "created_on" : {
                $gte : new Date(Date.UTC(2012, 0, 1)),
                $lte : new Date(Date.UTC(2012, 0, 10))
            }
        }
    },
    {
        $group: {
            _id : {
                "minute" : {
                    $minute : "$created_on"
                }
            },
            "values": {
                $addToSet: "$value"
            }
        }
    }
]);

Continue reading

Configuring MongoDirector permissions on AWS using an IAM policy template

MongoDirector supports the ability to manage your MongoDB clusters in your AWS account. This model has several advantages, as outlined in this blog post. In order to manage MongoDB clusters in your own AWS account, MongoDirector requires certain permissions. Our recommendation is to restrict these permissions: give MongoDirector just enough to manage your MongoDB servers and nothing more. This can be done by configuring a custom Identity and Access Management (IAM) policy for the AWS keys that you enter into MongoDirector. MongoDirector provides two types of IAM policies.
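For readers unfamiliar with IAM, a restrictive policy is just a JSON document whitelisting specific actions for the key. The fragment below is a generic sketch of the shape such a policy takes; the actions listed are illustrative placeholders, not the actual set MongoDirector requires:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:DescribeInstances",
        "ec2:CreateVolume"
      ],
      "Resource": "*"
    }
  ]
}
```

Anything not explicitly allowed is denied, which is what confines the keys to managing your MongoDB servers and nothing more.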

Continue reading

MongoDB on Digital Ocean

Digital Ocean is a NY-based hosting provider specializing in SSD-backed virtual machines. A majority of our customers choose to deploy and manage databases on Amazon AWS. However, running large-scale, write-intensive databases on AWS is a fairly difficult and time-consuming operation. If you are interested in the reasons, you can read the details in my blog post – “What I would like to see in EC2..”. We have been using Digital Ocean for several months now and have learned a lot about the system.

Continue reading

Encrypting MongoDB data at rest

MongoDB encryption

MongoDB is now the de facto database for a wide variety of applications, some storing sensitive data. When you store sensitive information in your MongoDB database, it is important to encrypt the contents of your data disk. This gives you an extra layer of protection if your data disks, snapshots or backups are lost or stolen: even if an attacker gets access to your snapshots or backups, all the data is still encrypted and they cannot access your raw application data. In some scenarios encryption at rest is compulsory due to compliance requirements.

Continue reading

When to use GridFS?

GridFS is a simple file system abstraction on top of MongoDB. If you are familiar with Amazon S3, GridFS is a very similar abstraction. Now why does a document-oriented database like MongoDB provide a file-layer abstraction? It turns out there are some very good reasons:

1. Storing user-generated file content
A large number of web applications allow users to upload files. Historically, when working with relational databases, these user-generated files were stored on the file system, separate from the database. This creates a number of problems: How do you replicate the files to all the needed servers? How do you delete all the copies when the file is deleted? How do you back up the files for safety and disaster recovery? GridFS solves these problems by storing the files along with the database: you can leverage your database backups to back up your files, MongoDB replication keeps a copy of your files on each replica, and deleting a file is as easy as deleting an object in the DB.

2. Accessing portions of file content
When a file is uploaded to GridFS, it is split into chunks of 256k and stored separately. So when you need to read only a certain range of bytes of the file, only those chunks are brought into memory, not the whole file. This is extremely useful when dealing with large media content that needs to be selectively read or edited.
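The chunk arithmetic is straightforward. This sketch computes which chunks cover a requested byte range for the 256k (262144-byte) default chunk size:

```javascript
// Which GridFS chunks cover a given byte range? (illustrative sketch)
var CHUNK_SIZE = 262144; // the 256k default chunk size

function chunksForRange(startByte, endByte) {
  // Chunk n holds bytes [n * CHUNK_SIZE, (n + 1) * CHUNK_SIZE - 1]
  var first = Math.floor(startByte / CHUNK_SIZE);
  var last  = Math.floor(endByte / CHUNK_SIZE);
  return { first: first, last: last, count: last - first + 1 };
}

// Reading bytes 300000-600000 touches only chunks 1 and 2,
// regardless of how large the whole file is.
console.log(chunksForRange(300000, 600000)); // { first: 1, last: 2, count: 2 }
```

Only those chunks need to be fetched from the fs.chunks collection, which is why range reads of large media files stay cheap.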

3. Storing documents greater than 16MB in MongoDB
By default, MongoDB document size is capped at 16MB. If you have documents larger than 16MB, you can store them using GridFS.

4. Overcoming file system limitations
If you are storing a large number of files, you need to consider file system limitations like the maximum number of files per directory. With GridFS you don’t need to worry about file system limits. Also, with GridFS and MongoDB sharding you can distribute your files across different servers without significantly increasing operational complexity.

Underneath the covers
GridFS uses two collections to store the data:

> show collections;
fs.chunks
fs.files
system.indexes
>

The fs.files collection contains metadata about the files, and the fs.chunks collection stores the actual 256k chunks. For a sharded collection, the chunks are distributed across different servers, and you might get better performance than a filesystem!

> db.fs.files.findOne();
{
    "_id" : ObjectId("530cf1bf96038f5cb6df5f39"),
    "filename" : "./conn.log",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-02-25T19:40:47.321Z"),
    "md5" : "6515e95f8bb161f6435b130a0e587ccd",
    "length" : 1644981
}
>

MongoDB also creates a compound index on files_id and the chunk number, n, to help quickly access the chunks:

> db.fs.chunks.getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "files_id" : 1,
            "n" : 1
        },
        "ns" : "files.fs.chunks",
        "name" : "files_id_1_n_1"
    }
]
>

Continue reading