1. Indexing, Query Optimization, the Query
Optimizer — MongoSV
Richard M Kreuter
10gen Inc.
richard@10gen.com
December 3, 2010
MongoDB – Indexing and Query Optimiz(ation—er) — MongoSV
2. Indexing Basics
Indexes are tree-structured sets of references to your
documents.
The query planner can employ indexes to efficiently enumerate
and sort matching documents.
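To make the idea concrete, here is a toy model in plain JavaScript (run with Node.js, not the mongo shell). It assumes a single ascending index on one numeric field and represents it as a sorted array of [key, _id] pairs. MongoDB's real indexes are B-trees, so treat this only as an illustration of why sorted keys let a range query and a sort be answered without touching every document. The helper names are invented for the sketch.

```javascript
// Toy single-field ascending index: a sorted array of [key, _id] pairs.
// Assumes numeric keys; documents without a numeric value are skipped.
function buildIndex(docs, field) {
  const entries = docs
    .filter((d) => typeof d[field] === "number")
    .map((d) => [d[field], d._id]);
  entries.sort((p, q) => p[0] - q[0]);
  return entries;
}

// Binary search for the position of the first entry with key >= lo.
function lowerBound(entries, lo) {
  let left = 0;
  let right = entries.length;
  while (left < right) {
    const mid = (left + right) >> 1;
    if (entries[mid][0] < lo) left = mid + 1;
    else right = mid;
  }
  return left;
}

// Collect _ids with lo <= key < hi. Results come out in key order,
// and after the search only entries inside the range are visited.
function rangeScan(entries, lo, hi) {
  const ids = [];
  for (let i = lowerBound(entries, lo); i < entries.length && entries[i][0] < hi; i++) {
    ids.push(entries[i][1]);
  }
  return ids;
}

const sampleDocs = [
  { _id: "a", x: 7 },
  { _id: "b", x: 2 },
  { _id: "c", x: 9 },
  { _id: "d", x: 5 },
];
const xIndex = buildIndex(sampleDocs, "x");
console.log(rangeScan(xIndex, 5, 9)); // [ 'd', 'a' ]
```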
3. However, indexing strikes people as a gray art
As is the case with relational systems, schema design and
indexing go hand in hand...
... but you also need to know about your actual (not just
predicted) query patterns.
4. Some indexing generalities
A collection may have at most 64 indexes.
A query may only use 1 index (except that disjuncts in $or
queries can each use separate indexes).
Indexes entail additional work on inserts, updates, and deletes.
5. Creating Indexes
The _id attribute is always indexed. Additional indexes can be
created with ensureIndex():
// Create an index on the user attribute
db.collection.ensureIndex({ user : 1 })
// Create a compound index on
// the user and email attributes
db.collection.ensureIndex({ user : 1, email : 1 })
// Create an index on the favorites attribute;
// if its value is a list, every element is indexed
db.collection.ensureIndex({ favorites : 1 })
// Create a unique index on the user attribute
db.collection.ensureIndex({user:1}, {unique:true})
// Create an index in the background.
db.collection.ensureIndex({user:1}, {background:true})
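Each ensureIndex() call above also gives the index a name. When no name is supplied, MongoDB derives one by joining each field with its direction using underscores, which is where names such as x_1 in explain() output come from. The function below is a minimal plain-JavaScript sketch of that convention; defaultIndexName is a hypothetical helper, not a shell function, and it covers only the simple cases shown here.

```javascript
// Hypothetical helper: derive a default index name from a key pattern
// by joining each field and its direction with underscores.
// Relies on JavaScript preserving the insertion order of string keys,
// which matters because compound index field order is significant.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ user: 1 }));           // user_1
console.log(defaultIndexName({ user: 1, email: 1 })); // user_1_email_1
console.log(defaultIndexName({ favorites: 1 }));      // favorites_1
```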
6. Index maintenance
// Drops an index on x
db.collection.dropIndex({x:1})
// drops all indexes
db.collection.dropIndexes()
// Rebuild indexes (need for this reduced in 1.6)
db.collection.reIndex()
7. Indexes are smart about data types and structures
Indexes on attributes whose values are of different types in
different documents can speed up queries by skipping
documents where the relevant attribute isn’t of the
appropriate type.
Indexes on attributes whose values are lists will index each
element, speeding up queries that look into these attributes.
(You really want to do this for querying on tags.)
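The list case can be pictured with a small in-memory model (plain JavaScript for Node.js, not MongoDB's implementation): an array-valued attribute contributes one index entry per element, so looking up any single tag finds every document that contains it. The helper names are made up for this sketch, and the linear filter in lookupKey stands in for a tree search.

```javascript
// Toy multikey index: one [key, _id] entry per array element, or a
// single entry for a scalar value. Repeated elements within one
// document are collapsed to a single entry in this sketch.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue;
    const keys = Array.isArray(value) ? new Set(value) : [value];
    for (const key of keys) entries.push([key, doc._id]);
  }
  return entries;
}

// Linear filter standing in for a tree lookup on one key.
function lookupKey(entries, key) {
  return entries.filter(([k]) => k === key).map(([, id]) => id);
}

const posts = [
  { _id: 1, tags: ["mongodb", "indexing"] },
  { _id: 2, tags: ["indexing", "btree", "indexing"] },
  { _id: 3, tags: "mongodb" },
];
const tagIndex = multikeyEntries(posts, "tags");
console.log(lookupKey(tagIndex, "indexing")); // [ 1, 2 ]
console.log(tagIndex.length);                 // 5
```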
8. When can indexes be used?
In short, if you can envision how the index might get used, it
probably is. These will all use an index on x:
db.collection.find( { x: 1 } )
db.collection.find( { x :{ $in : [1,2,3] } } )
db.collection.find( { x : { $gt : 1 } } )
db.collection.find( { x : /^a/ } )
db.collection.count( { x : 2 } )
db.collection.distinct( "x" )
db.collection.find().sort( { x : 1 } )
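The /^a/ case works because an anchored prefix pins down a contiguous key range: every string that starts with the prefix sorts at or after the prefix and before the prefix with its last character incremented. A minimal plain-JavaScript sketch of that bound computation follows; prefixBounds is a hypothetical helper, and it assumes simple code-unit string ordering rather than MongoDB's exact key ordering.

```javascript
// Hypothetical helper: key range covered by an anchored prefix regex.
// Every string that starts with `prefix` sorts in [lower, upper).
function prefixBounds(prefix) {
  if (prefix.length === 0) throw new Error("an empty prefix matches every string");
  const last = prefix.charCodeAt(prefix.length - 1);
  // Simplification: assumes the last character is not U+FFFF.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

const bounds = prefixBounds("ab");
const sampleKeys = ["aa", "ab", "abc", "abz", "ac", "b"];
// Only keys inside the bounds need to be examined for /^ab/.
const inRange = sampleKeys.filter((k) => k >= bounds.lower && k < bounds.upper);
console.log(bounds);  // { lower: 'ab', upper: 'ac' }
console.log(inRange); // [ 'ab', 'abc', 'abz' ]
```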
9. Trickier cases where indexes can be used
db.collection.find({ x : 1 }).sort({ y : 1 })
will use an index on y for sorting, if there’s no index on x.
(For this sort of case, use a compound index on both x and y
in that order.)
db.collection.update( { x : 2 } , { x : 3 } )
will use an index on x (but older MongoDB versions didn’t
permit $inc and other modifiers on indexed fields).
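Why the compound index on x then y helps find({ x : 1 }).sort({ y : 1 }): its entries are ordered by x and then by y, so all entries with a given x sit next to each other and are already in y order. The toy model below (plain JavaScript, in-memory arrays standing in for a B-tree, helper names invented for the sketch) reads that run directly, with no separate sort step.

```javascript
// Toy compound index on { x : 1, y : 1 }: entries sorted by x, then y.
function compoundIndex(docs) {
  return docs
    .map((d) => ({ x: d.x, y: d.y, id: d._id }))
    .sort((p, q) => p.x - q.x || p.y - q.y);
}

// find({ x : value }).sort({ y : 1 }) as one contiguous read: matching
// entries are adjacent and already ordered by y. The linear findIndex
// stands in for a tree descent to the first matching entry.
function findXSortY(index, value) {
  const start = index.findIndex((e) => e.x === value);
  if (start === -1) return [];
  const out = [];
  for (let i = start; i < index.length && index[i].x === value; i++) {
    out.push(index[i].id);
  }
  return out;
}

const rows = [
  { _id: "p", x: 1, y: 30 },
  { _id: "q", x: 2, y: 5 },
  { _id: "r", x: 1, y: 10 },
  { _id: "s", x: 1, y: 20 },
];
console.log(findXSortY(compoundIndex(rows), 1)); // [ 'r', 's', 'p' ]
```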
10. Some array examples
The following queries will use an index on x, and will match
documents whose x attribute is the array [2,10]
db.collection.find({ x : 2 })
db.collection.find({ x : 10 })
db.collection.find({ x : { $gt : 5 } })
db.collection.find({ x : [2,10] })
db.collection.find({ x : { $in : [2,5] }})
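The matching rule behind these examples can be written down as a small predicate: a condition on an array-valued attribute matches if the whole array equals the operand, or if any single element satisfies the condition. The sketch below is plain JavaScript, covers only plain equality, $gt, and $in on numbers, and is not MongoDB's matcher; it checks each query above against the array [2,10].

```javascript
// Apply one condition to a single (non-array) value.
// Supports only equality, $gt and $in in this sketch.
function elementMatches(value, cond) {
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    if ("$gt" in cond) return value > cond.$gt;
    if ("$in" in cond) return cond.$in.includes(value);
    throw new Error("operator not covered by this sketch");
  }
  return value === cond;
}

// Array-valued fields match on whole-array equality (order matters)
// or when any element satisfies the condition.
function fieldMatches(fieldValue, cond) {
  if (Array.isArray(fieldValue)) {
    if (Array.isArray(cond) && JSON.stringify(cond) === JSON.stringify(fieldValue)) {
      return true;
    }
    return fieldValue.some((el) => elementMatches(el, cond));
  }
  return elementMatches(fieldValue, cond);
}

const arrayValue = [2, 10];
const conditions = [2, 10, { $gt: 5 }, [2, 10], { $in: [2, 5] }];
console.log(conditions.map((c) => fieldMatches(arrayValue, c))); // [ true, true, true, true, true ]
```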
11. Geospatial indexes
Geospatial indexes are a special case: the operators that can
take advantage of them can only be used once the relevant index
has been created, e.g., with db.collection.ensureIndex({ a : "2d" }).
Some examples:
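db.collection.find({ a : [50, 50] }) finds a
document with this point for a.
db.collection.find({ a : { $near : [50, 50] } })
sorts results by distance from [50, 50].
db.collection.find({ a : { $within : { $box : [[40, 40], [60, 60]] } } })
finds points inside the box with corners [40, 40] and [60, 60].
db.collection.find({ a : { $within : { $center : [[50, 50], 10] } } })
finds points within a distance of 10 from [50, 50].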
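For intuition about what these operators compute, the sketch below implements the three geometric tests over an in-memory list of points in plain JavaScript, using flat Euclidean distance. It deliberately omits the geospatial index itself, treats box and circle boundaries as inclusive (an assumption), and its function names are hypothetical.

```javascript
// Is point [px, py] inside the box given by lower-left and upper-right corners?
function inBox([px, py], [[x1, y1], [x2, y2]]) {
  return px >= x1 && px <= x2 && py >= y1 && py <= y2;
}

// Is point [px, py] within distance r of the center [cx, cy]?
function inCircle([px, py], [[cx, cy], r]) {
  return Math.hypot(px - cx, py - cy) <= r;
}

// Order points by distance from [cx, cy], nearest first, like $near.
function nearSort(points, [cx, cy]) {
  const dist = (p) => Math.hypot(p[0] - cx, p[1] - cy);
  return [...points].sort((p, q) => dist(p) - dist(q));
}

const pts = [[55, 58], [50, 50], [70, 70], [43, 49]];
console.log(pts.filter((p) => inBox(p, [[40, 40], [60, 60]])));
console.log(pts.filter((p) => inCircle(p, [[50, 50], 10])));
console.log(nearSort(pts, [50, 50])); // [ [ 50, 50 ], [ 43, 49 ], [ 55, 58 ], [ 70, 70 ] ]
```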
12. When indexes cannot be used
Many sorts of negations, e.g., $ne, $not.
Tricky arithmetic, e.g., $mod.
Most regular expressions other than simple prefix matches (e.g., /a/ rather than /^a/).
Expressions in $where clauses don’t take advantage of
indexes.
Of course $where clauses are mostly for complex queries that
often can’t be indexed anyway, e.g., “where a > b”. (If
these cases matter to you, you can precompute the match,
store it as an additional attribute, index that attribute,
and skip the $where clause entirely.)
JavaScript parts of map/reduce can’t take advantage of
indexes (the map function is opaque to the query optimizer).
As a rule, if you can’t imagine how an index might be used, it
probably can’t!
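The precompute-and-index workaround mentioned above can be sketched in plain JavaScript over in-memory documents. The attribute name aGreaterThanB is hypothetical; in a real deployment the application would have to set it on every insert and on every update that changes a or b, and an index on it (e.g., db.collection.ensureIndex({ aGreaterThanB : 1 })) would replace the $where scan with an ordinary equality lookup.

```javascript
// Hypothetical: store the result of "a > b" in its own attribute
// (named aGreaterThanB purely for illustration) whenever a document
// is written, so queries can use a plain equality match instead of $where.
function withDerivedFlag(doc) {
  return { ...doc, aGreaterThanB: doc.a > doc.b };
}

const stored = [
  { _id: 1, a: 5, b: 3 },
  { _id: 2, a: 1, b: 4 },
  { _id: 3, a: 9, b: 9 },
].map(withDerivedFlag);

// In-memory equivalent of find({ aGreaterThanB : true }).
const matches = stored.filter((d) => d.aGreaterThanB === true).map((d) => d._id);
console.log(matches); // [ 1 ]
```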
13. Never forget about compound indexes
Whenever you’re querying on multiple attributes, whether as
part of the selector document or in a sort(), compound
indexes can be used.
14. Schema/index relationships
Sometimes the question isn’t “given the shape of these documents,
how do I index them?” but “how might I shape the data so I can
take advantage of indexing?”
// Consider a schema that uses a list of
// attribute/value pairs:
db.c.insert({ product : "SuperDooHickey",
manufacturer : "Foo Enterprises",
catalog : [ { stock : 50,
modtime: '2010-09-02' },
{ price : 29.95,
modtime : '2010-06-14' } ] });
db.c.ensureIndex({ catalog : 1 });
// All attribute queries can use one index.
db.c.find( { catalog : { $elemMatch : { stock : { $gt : 0 } } } } )
15. Index sizes
Of course, indexes take up space. For many interesting databases,
query performance depends heavily on whether the indexes fit in RAM,
so it’s useful to see these numbers.
db.collection.stats() shows indexSizes, the size of
each index in the collection.
db.collection.totalIndexSize() displays the size of all
indexes in the collection.
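When comparing these numbers across collections it can help to summarize the indexSizes document programmatically. The plain-JavaScript sketch below assumes an object mapping index names to byte counts, as indexSizes does; the sample figures are invented for illustration and are not measurements. The sum it computes should correspond to what totalIndexSize() reports.

```javascript
// Summarize an indexSizes-style object: { indexName: bytes, ... }.
function summarizeIndexSizes(indexSizes) {
  const entries = Object.entries(indexSizes);
  if (entries.length === 0) return { total: 0, largestName: null, largestBytes: 0 };
  const total = entries.reduce((sum, [, bytes]) => sum + bytes, 0);
  const [largestName, largestBytes] = entries.reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  );
  return { total, largestName, largestBytes };
}

// Invented sample values, in bytes.
const exampleIndexSizes = { _id_: 1048576, user_1: 524288, user_1_email_1: 786432 };
console.log(summarizeIndexSizes(exampleIndexSizes));
// { total: 2359296, largestName: '_id_', largestBytes: 1048576 }
```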
16. explain()
It’s useful to be able to ensure that your query is doing what you
want it to do. For this, we have explain(). Query plans that use
an index have cursor type BtreeCursor.
db.collection.find({x:{$gt:5}}).explain()
{
"cursor" : "BtreeCursor x_1",
...
"nscanned" : 12345,
...
"n" : 100,
"millis" : 4,
...
}
17. explain(), continued
If the query plan doesn’t use the index, the cursor type will be
BasicCursor.
db.collection.find({x:{$gt:5}}).explain()
{
"cursor" : "BasicCursor",
...
"nscanned" : 12345,
...
"n" : 42,
"millis" : 4,
...
}
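A quick way to read these two outputs side by side is to check the cursor type and compare nscanned with n: the closer nscanned is to n, the less wasted work. The plain-JavaScript helper below is a hypothetical sketch that assumes the cursor string looks like the ones above ("BtreeCursor" followed by the index name, or "BasicCursor") and takes the slides' numbers as input.

```javascript
// Summarize the explain() fields discussed above.
function summarizeExplain(plan) {
  const [cursorType, indexName = null] = plan.cursor.split(" ");
  return {
    usesIndex: cursorType === "BtreeCursor",
    indexName,
    // Documents or index entries examined per result returned.
    scannedPerResult: plan.n === 0 ? Infinity : plan.nscanned / plan.n,
  };
}

console.log(summarizeExplain({ cursor: "BtreeCursor x_1", nscanned: 12345, n: 100, millis: 4 }));
// { usesIndex: true, indexName: 'x_1', scannedPerResult: 123.45 }
console.log(summarizeExplain({ cursor: "BasicCursor", nscanned: 12345, n: 42, millis: 4 }));
```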
18. Really, compound indexes are important
Try this at home:
1 Create a collection with a few tens of thousands of documents
having two attributes (let’s call them a and b).
2 Create a compound index on {a : 1, b : 1}.
3 Do a db.collection.find({a : constant}).sort({b : 1}).explain().
4 Note the explain result’s millis.
5 Drop the compound index.
6 Create another compound index with the attributes reversed.
(This will be a suboptimal compound index.)
7 Explain the above query again.
8 The suboptimal index should produce a slower explain result (a higher millis, and a higher nscanned).
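The experiment can also be previewed without a server. The plain-JavaScript model below (sorted in-memory arrays standing in for B-trees; helper names and the data-generation formula are invented for the sketch) counts how many index entries each index order forces the query to examine. It says nothing about timings, only about the work each index order implies.

```javascript
// Sort copies of the documents by the given fields, in order.
function sortedIndex(docs, fields) {
  return [...docs].sort((p, q) => {
    for (const f of fields) {
      if (p[f] !== q[f]) return p[f] - q[f];
    }
    return 0;
  });
}

// Index { a : 1, b : 1 }: binary-search to the first entry with a === c,
// then read the contiguous run, which is already ordered by b.
// (Binary-search probes are not counted as scanned entries here.)
function scanAB(index, c) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].a < c) lo = mid + 1;
    else hi = mid;
  }
  const result = [];
  for (let i = lo; i < index.length && index[i].a === c; i++) result.push(index[i]);
  return { scanned: result.length, result };
}

// Index { b : 1, a : 1 }: a is not a prefix of the key, so producing b order
// from this index means visiting every entry and discarding the wrong a values.
function scanBA(index, c) {
  const result = index.filter((e) => e.a === c);
  return { scanned: index.length, result };
}

// 20,000 documents; 100 distinct a values; b values are all distinct.
const experimentDocs = [];
for (let i = 0; i < 20000; i++) experimentDocs.push({ a: i % 100, b: (i * 37) % 20000 });

const ab = scanAB(sortedIndex(experimentDocs, ["a", "b"]), 42);
const ba = scanBA(sortedIndex(experimentDocs, ["b", "a"]), 42);
console.log(ab.scanned, ba.scanned); // 200 20000
```

Both orders return the same documents in the same b order; the difference is that the {b : 1, a : 1} walk examines every entry to find them.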
19. The DB Profiler
MongoDB includes a database profiler that, when enabled, records
timing measurements and result counts in the system.profile
collection of the database.
// Enable the profiler on this database for
// operations slower than 100 ms.
> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.foo.count({a: { $mod : [3, 0] } });
...
// See the profiler info.
> db.system.profile.find()
{ "ts" : "Thu Nov 18 2010 06:46:16 GMT-0500 (EST)",
"info" : "query test.$cmd ntoreturn:1
command: { count: "foo",
query: { a: { $mod: [ 3.0, 0.0 ] } },
fields: {} } reslen:64 406ms",
"millis" : 406 }
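The first argument to setProfilingLevel() selects what gets recorded: 0 turns profiling off, 1 records only operations slower than the slowms threshold (the second argument), and 2 records every operation. That rule is sketched below in plain JavaScript; shouldProfile is a hypothetical helper, not part of MongoDB. In the shell itself, recorded slow operations can be pulled out with an ordinary query such as db.system.profile.find({ millis : { $gt : 100 } }).

```javascript
// Hypothetical helper modelling the profiling levels:
// 0 = off, 1 = only operations slower than slowms, 2 = everything.
function shouldProfile(level, slowms, millis) {
  if (level === 2) return true;
  if (level === 1) return millis > slowms;
  return false;
}

// With setProfilingLevel(1, 100), the 406 ms operation above is recorded,
// while a 12 ms operation would not be.
console.log(shouldProfile(1, 100, 406)); // true
console.log(shouldProfile(1, 100, 12));  // false
console.log(shouldProfile(0, 100, 406)); // false
console.log(shouldProfile(2, 100, 12));  // true
```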
20. Query Optimizer
MongoDB’s query optimizer is empirical, not cost-based.
To test query plans, it tries several in parallel, and records the
plan that finishes fastest.
If a plan’s performance changes over time (e.g., as data
changes), the database will reoptimize (i.e., retry all possible
plans).
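The race between plans can be pictured with a toy scheduler in plain JavaScript: each candidate plan is a generator, the scheduler advances every plan one step in turn, and the first plan to finish wins and is remembered for later queries of the same shape. This only illustrates the empirical idea described above; the plan names, step counts, and the planCache keyed by a query-shape string are assumptions made for the sketch, not MongoDB internals, and re-racing when performance changes is not modelled.

```javascript
// A candidate plan that needs `steps` units of work to finish.
function* planWithSteps(steps) {
  for (let i = 0; i < steps; i++) yield i;
}

// Advance every candidate one step at a time (round-robin) and return the
// name of the first plan to finish; ties go to the earlier-listed plan.
function raceCandidatePlans(plans) {
  const running = Object.entries(plans).map(([name, start]) => ({ name, it: start() }));
  if (running.length === 0) throw new Error("no candidate plans");
  for (;;) {
    for (const { name, it } of running) {
      if (it.next().done) return name;
    }
  }
}

// Remember the winner per query shape so later queries skip the race.
const planCache = new Map();
function choosePlan(queryShape, plans) {
  if (!planCache.has(queryShape)) planCache.set(queryShape, raceCandidatePlans(plans));
  return planCache.get(queryShape);
}

const winner = choosePlan("{ x : { $gt : ? } }", {
  BasicCursor: () => planWithSteps(10000),
  "BtreeCursor x_1": () => planWithSteps(120),
});
console.log(winner); // BtreeCursor x_1
```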
21. Hinting the query plan
Sometimes, you might want to force the query plan. For this, we
have hint().
// Force the use of an index on attribute x:
db.collection.find({x: 1, ...}).hint({x:1})
// Force indexes to be avoided!
db.collection.find({x: 1, ...}).hint({$natural:1})
22. Going forward
www.mongodb.org — downloads, docs, community
mongodb-user@googlegroups.com — mailing list
#mongodb on irc.freenode.net
try.mongodb.org — web-based shell
10gen is hiring. Email jobs@10gen.com.
10gen offers support, training, and advising services for
MongoDB.