Ridiculously slow mongoDB query on small collection in simple but big database

Question

So I have a super simple database in mongoDB with a few collections:

> show collections
Aggregates <-- count: 92
Users <-- count: 68222
Pages <-- count: 1728288847, about 1.1TB
system.indexes

The Aggregates collection is an aggregate of the Pages collection, and each document looks like this:

> db.Aggregates.findOne()
{
        "_id" : ObjectId("50f237126ba71610eab3aaa5"),
        "daily_total_pages" : 16929799,
        "day" : 21,
        "month" : 9,
        "year" : 2011
}

Very simple. However, let's try and get the total page loads by adding all 92 days' daily page loads together:

>>> def get_total():
...     start = datetime.now()
...     print sum([x['daily_total_pages'] for x in c.Aggregates.find()])
...     end = datetime.now()
...     print (end-start).seconds
...
>>> get_total()
1728288847
43

43 seconds?!?!?!

Those 92 aggregate results are tiny! I might as well just store them in a text file, this is crazy.

Or are they tiny? How big are they on disk according to mongo?

> db.Aggregates.stats()
{
        "ns" : "c.AggregateResults",
        "count" : 92,
        "size" : 460250104,
        "avgObjSize" : 5002718.521739131,
        "storageSize" : 729464832,
        "numExtents" : 7,
        "nindexes" : 2,
        "lastExtentSize" : 355647488,
        "paddingFactor" : 1.0690000000000066,
        "systemFlags" : 1,
        "userFlags" : 0,
        "totalIndexSize" : 16352,
        "indexSizes" : {
                "_id_" : 8176,
                "date_1" : 8176
        },
        "ok" : 1
}

438 megabytes total for those tiny daily numbers? Each one is approximately 280 bytes, so they should be a maximum of 25~30kb total. So the storage is huge and the query is super slow. Is it possible it could be fragmented on disk? I created the aggregates after inserting the documents into the full Pages collection.

Anyone have any insights into this madness? :O

Edit: Solved with a more specific find() query from Jared. The video that Sammaye provided also gives some very interesting storage insights.

Edit 2: It turns out that sys.getsizeof() is a really unreliable way to find out your documents' sizes, since it doesn't recurse down nested structures. So my docs were actually quite large, and the best thing to do was to use find({}, {'daily_page_loads'}) as a more specific query!
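
For a more reliable measurement, a document can be BSON-encoded and the encoded length checked (the mongo shell equivalent is Object.bsonsize()). A minimal sketch, assuming the same connection c as above and an older pymongo whose bundled bson module still exposes BSON.encode (newer releases provide bson.encode() instead):

from bson import BSON  # ships with pymongo

# Encode one full aggregate document to BSON and measure it in bytes.
# Unlike sys.getsizeof() on the parsed dict, this counts every nested field.
doc = c.Aggregates.find_one()
print(len(BSON.encode(doc)))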

Answer

The avgObjSize is out of line with the 280-byte estimate: it says your objects average around 5MB, and storageSize is near 1GB. If you're memory-constrained, running a query that needs to access all 1GB of the file would cause lots of page faults.

Have you tried compacting?

db.runCommand({compact: 'Aggregates'})

Or repairing?

db.repairDatabase()

If that doesn't work, try pulling back just the fields needed for the sum rather than the whole documents. It may be that those documents really are 5MB and the time is being spent pulling data over the wire.

def get_total():
    start = datetime.now()
    print sum([x['daily_total_pages'] for x in c.Aggregates.find({}, {"daily_total_pages": 1})])
    end = datetime.now()
    print (end-start).seconds
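
The fix above sticks to a plain find() with a projection. As an alternative, assuming the server is MongoDB 2.2 or newer, the sum could also be computed server-side with the aggregation framework, so that only a single small result document comes back over the wire; a rough sketch using the same connection c:

# $group with a null _id collapses the collection into one document whose
# "total" field is the sum of daily_total_pages.
pipeline = [{"$group": {"_id": None, "total": {"$sum": "$daily_total_pages"}}}]

# pymongo 2.x returns a dict like {"result": [...], "ok": 1.0};
# newer drivers return a cursor of result documents.
out = c.Aggregates.aggregate(pipeline)
docs = out["result"] if isinstance(out, dict) else list(out)
print(docs[0]["total"])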
