Unreasonably slow MongoDB query, even though the query is simple and aligned to indexes


Problem description

I'm running a MongoDB server (that's literally all it has running). The server has 64gb of RAM and 16 cores, plus 2TB of hard drive space to work with.

Document structure

The database has a collection domains with around 20 million documents. There is a decent amount of data in each document, but for our purposes, the document is structured like so:

{
    _id: "abcxyz.com",
    LastUpdated: <date>,
    ...
}

The _id field is the domain name referenced by the document. There is an ascending index on LastUpdated. LastUpdated is updated on hundreds of thousands of records per day. Basically every time new data becomes available for a document, the document is updated and the LastUpdated field updated to the current date/time.
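
For concreteness, here is a minimal mongo-shell sketch of that setup; the collection and field names follow the question, but the index call and the update are illustrative assumptions, not the asker's actual code:

// Ascending index on LastUpdated, as described above.
db.domains.ensureIndex({ LastUpdated: 1 });

// Whenever new data arrives for a domain, the document is updated and
// LastUpdated is bumped to the current date/time.
db.domains.update(
    { _id: "abcxyz.com" },
    { $set: { LastUpdated: new Date() /*, ...new data... */ } }
);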

The query

I have a mechanism that extracts the data from the database so it can be indexed in a Lucene index. The LastUpdated field is the key driver for flagging changes made to a document. In order to search for documents that have been changed and page through those documents, I do the following:

{
    LastUpdated: { $gte: ISODate(<firstdate>), $lt: ISODate(<lastdate>) },
    _id: { $gt: <last_id_from_previous_page> }
}

sort: { _id: 1 }

When no documents are returned, the start and end dates move forward and the _id "anchor" field is reset. This setup is tolerant to documents from previous pages that have had their LastUpdated value changed, i.e. the paging won't become incorrectly offset by the number of documents in previous pages that are now technically no longer in those pages.
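
In mongo-shell terms, fetching one page then looks roughly like the sketch below; the page size, window bounds, empty-string anchor, and helper name are assumptions based on the description above:

function nextPage(firstDate, lastDate, lastId, pageSize) {
    // Documents changed inside the current date window, after the _id anchor,
    // in ascending _id order.
    return db.domains.find({
        LastUpdated: { $gte: firstDate, $lt: lastDate },
        _id: { $gt: lastId }
    }).sort({ _id: 1 }).limit(pageSize).toArray();
}

var page = nextPage(ISODate("2011-11-22T15:01:54.851Z"),
                    ISODate("2011-11-22T17:39:48.013Z"),
                    "", 25000);

if (page.length > 0) {
    // Feed the page into the Lucene indexer, then anchor on the last _id seen.
    var lastId = page[page.length - 1]._id;
} else {
    // Empty page: move the date window forward and reset the _id anchor.
}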

The problem

I want to ideally select about 25000 documents at a time, but for some reason the query itself (even when only selecting <500 documents) is extremely slow.

The query I'm running is:

db.domains.find({
    "LastUpdated" : {
        "$gte" : ISODate("2011-11-22T15:01:54.851Z"),
        "$lt" : ISODate("2011-11-22T17:39:48.013Z")
    },
    "_id" : { "$gt" : "1300broadband.com" }
}).sort({ _id:1 }).limit(50).explain()

It is so slow in fact that the explain (at the time of writing this) has been running for over 10 minutes and has not yet completed. I will update this question if it ever finishes, but the point of course is that the query is EXTREMELY slow.

What can I do? I don't have the faintest clue what the problem might be with the query.

EDIT The explain finished after 55 minutes. Here it is:

{
    "cursor" : "BtreeCursor Lastupdated_-1__id_1",
    "nscanned" : 13112,
    "nscannedObjects" : 13100,
    "n" : 50,
    "scanAndOrder" : true,
    "millis" : 3347845,
    "nYields" : 5454,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
            "LastUpdated" : [
                    [
                            ISODate("2011-11-22T17:39:48.013Z"),
                            ISODate("2011-11-22T15:01:54.851Z")
                    ]
            ],
            "_id" : [
                    [
                            "1300broadband.com",
                            {

                            }
                    ]
            ]
    }
}

Answer

Bumped into a very similar problem, and the Indexing Advice and FAQ on Mongodb.org says, quote:

The range query must also be the last column in an index

So if you have the keys a, b and c and run db.ensureIndex({a:1, b:1, c:1}), these are the "guidelines" in order to use the index as much as possible:

Good:

  • find(a=1, b>2)
  • find(a>1 and a<10)
  • find(a>1 and a<10).sort(a)

Bad:

  • find(a>1, b=2)

Only use a range query OR sort on one column.

Good:

  • find(a=1, b=2).sort(c)
  • find(a=1, b>2)
  • find(a=1, b>2 and b<4)
  • find(a=1, b>2).sort(b)

Bad:

  • find(a>1, b>2)
  • find(a=1, b>2).sort(c)
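
Spelled out as shell commands, those guidelines might look like the following sketch; sample_coll, the index, and all values are made up for illustration:

// Compound index with the keys a, b, c from the guidelines above.
db.sample_coll.ensureIndex({ a: 1, b: 1, c: 1 });

// Good: equality on the leading column, range on the next column used.
db.sample_coll.find({ a: 1, b: { $gt: 2 } });

// Good: range and sort on the same leading column.
db.sample_coll.find({ a: { $gt: 1, $lt: 10 } }).sort({ a: 1 });

// Bad: the range on a comes before the equality on b, so the index
// cannot narrow the scan to the b=2 entries.
db.sample_coll.find({ a: { $gt: 1 }, b: 2 });

// Bad: once b is a range, the sort on c cannot use the index and
// MongoDB sorts in memory (explain() shows scanAndOrder: true).
db.sample_coll.find({ a: 1, b: { $gt: 2 } }).sort({ c: 1 });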

Hope it helps!

/J
