MongoDB's performance on aggregation queries

Question

After hearing so many good things about MongoDB's performance, we decided to give MongoDB a try to solve a problem we have. I started by moving all the records we had in several MySQL databases into a single collection in MongoDB. This resulted in a collection with 29 million documents (each of them with at least 20 fields) that takes up around 100 GB of disk space. We decided to put them all in one collection since all the documents have the same structure and we want to query and aggregate results across all of them.

I created some indexes to match my queries; otherwise even a simple count() would take ages. However, queries such as distinct() and group() still take far too long.

Example:

// creation of a compound index    
db.collection.ensureIndex({'metadata.system':1, 'metadata.company':1})

// query to get all the combinations of companies and systems
db.collection.group({key: { 'metadata.system':true, 'metadata.company':true }, reduce: function(obj,prev) {}, initial: {} });

I took a look at the mongod log and it has a lot of lines like these (while executing the query above):

Thu Apr  8 14:40:05 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1048890 nreturned:417 154ms
Thu Apr  8 14:40:08 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1050205 nreturned:414 430ms
Thu Apr  8 14:40:18 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1049748 nreturned:201 130ms
Thu Apr  8 14:40:27 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1051925 nreturned:221 118ms
Thu Apr  8 14:40:30 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1053096 nreturned:250 164ms
...
Thu Apr  8 15:04:18 query database.$cmd ntoreturn:1 command  reslen:4130 1475894ms

This query took 1475894 ms (roughly 25 minutes), which is far longer than I would expect (the result list has around 60 entries). First of all, is this expected given the large number of documents in my collection? Are aggregation queries in general expected to be this slow in MongoDB? Any thoughts on how I can improve the performance?

I am running mongod on a single machine with a dual-core CPU and 10 GB of memory.

Thanks.

Answer

The idea is that you improve the performance of aggregation queries by using MapReduce on a sharded database that is distributed over multiple machines.
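
For illustration, the group() query from the question could be expressed as a map-reduce job roughly like this (a minimal sketch in the mongo shell, not the answerer's actual code; it assumes the same metadata.system/metadata.company fields and a server version that supports inline output):

// map: emit each document's system/company combination as the key
var mapFn = function () {
    emit({ system: this.metadata.system, company: this.metadata.company }, 1);
};

// reduce: count how many documents share each combination
var reduceFn = function (key, values) {
    return Array.sum(values);
};

// run the job; the result set is small (~60 combinations), so return it inline
db.collection.mapReduce(mapFn, reduceFn, { out: { inline: 1 } });

On a single machine this still has to scan the whole collection; the point of the answer is that the map and reduce work can then be spread across the shards.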

I did some comparisons of the performance of Mongo's MapReduce with a group-by select statement in Oracle on the same machine. I found that Mongo was approximately 25 times slower. This means I would have to shard the data over at least 25 machines to get the same performance from Mongo that Oracle delivers on a single machine. I used a collection/table with approximately 14 million documents/rows.

Exporting the data from Mongo via mongoexport.exe, loading the exported file as an external table in Oracle, and doing the group-by in Oracle was much faster than using Mongo's own MapReduce.
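
As a sketch of the export step only (hypothetical database and file names; the legacy --csv flag with an explicit --fields list is assumed, and the Oracle external-table definition is omitted):

mongoexport --db yourDatabase --collection collection --csv --fields metadata.system,metadata.company --out collection_export.csv

The exported CSV can then be referenced by an Oracle external table and grouped with an ordinary GROUP BY.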
