MongoDB + Python - very slow simple query

Problem description

I have an open source energy monitor (http://openenergymonitor.org) which logs the power usage of my house every five seconds, so I thought this would be a perfect application to play with MongoDB. I have a Flask Python application running in Apache using MongoEngine to interface with MongoDB.

Now I am running all of this on a RaspberryPi, so I'm not expecting incredible performance, but a simple query is taking around 20 seconds, which seems slow even for this limited hardware.

I have the following model:

class Reading(db.Document):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    created_at_year = db.IntField(default=datetime.datetime.now().year, required=True)
    created_at_month = db.IntField(default=datetime.datetime.now().month, required=True)
    created_at_day = db.IntField(default=datetime.datetime.now().day, required=True)
    created_at_hour = db.IntField(default=datetime.datetime.now().hour, required=True)
    battery = db.IntField()
    power = db.IntField()
    meta = {
        'indexes': ['created_at_year', 'created_at_month', 'created_at_day', 'created_at_hour']
    }
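(Incidentally, defaults like default=datetime.datetime.now().year are evaluated once, when the class is defined, not per document; MongoEngine accepts a callable instead. A minimal sketch of the pitfall and the fix, in plain Python and independent of MongoEngine:)

```python
import datetime

# Pitfall: this expression is evaluated ONCE, at definition time, so the
# value is frozen for everything created afterwards.
FROZEN_YEAR = datetime.datetime.now().year

# Fix: pass a callable, which is re-evaluated each time a default is needed.
def current_year():
    return datetime.datetime.now().year

# In the model above the field would then read (same field, callable default):
#     created_at_year = db.IntField(default=lambda: datetime.datetime.now().year, required=True)
```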

I currently have around 36,000 readings stored from the last couple of days. The following code runs super quick:

def get_readings_count():
    count = '<p>Count: %d</p>' % Reading.objects.count()
    return count

def get_last_24_readings_as_json():
    readings = Reading.objects.order_by('-id')[:24]
    result = "["
    for reading in reversed(readings):
        result += str(reading.power) + ","
    result = result[:-1]
    result += "]"
    return result
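(As an aside, building JSON by string concatenation breaks for an empty result set, since result[:-1] would strip the opening bracket. The json module avoids this; a sketch keeping the same reversed ordering:)

```python
import json

def powers_to_json(powers):
    # powers: power readings, newest first (as returned by order_by('-id'))
    return json.dumps(list(reversed(powers)))
```

For example, powers_to_json([3, 2, 1]) returns "[1, 2, 3]" and powers_to_json([]) returns "[]".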

But doing a simple filter:

def get_today_readings_count():
    todaycount = '<p>Today: %d</p>' % Reading.objects(created_at_year=2014, created_at_month=1, created_at_day=28).count()
    return todaycount

Takes around 20 seconds - there are around 11,000 readings for today.

Shall I give up expecting anything more of my Pi, or is there some tuning I can do to get more performance from MongoDB?

Mongo 2.1.1 on Debian Wheezy

Update 29/1/2014:

In response to an answer below, here are the results of getIndexes() and explain():

> db.reading.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "created_at_year" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_year_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_month" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_month_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_day" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_day_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_hour" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_hour_1",
        "background" : false,
        "dropDups" : false
    }
]

> db.reading.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28 }).explain()
{
    "cursor" : "BtreeCursor created_at_day_1",
    "isMultiKey" : false,
    "n" : 15689,
    "nscannedObjects" : 15994,
    "nscanned" : 15994,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 5,
    "nChunkSkips" : 0,
    "millis" : 25511,
    "indexBounds" : {
        "created_at_day" : [
            [
                28,
                28
            ]
        ]
    },
    "server" : "raspberrypi:27017"
}

Update 4/2/2014:

Okay, so I deleted the indexes, set a new one on created_at, deleted all the records and left it a day to collect new data. I've just run a query for today's data and it took longer (48 seconds):

> db.reading.find({'created_at': {'$gte':ISODate("2014-02-04")}}).explain()
{
    "cursor" : "BtreeCursor created_at_1",
    "isMultiKey" : false,
    "n" : 14189,
    "nscannedObjects" : 14189,
    "nscanned" : 14189,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9,
    "nChunkSkips" : 0,
    "millis" : 48653,
    "indexBounds" : {
        "created_at" : [
            [
                ISODate("2014-02-04T00:00:00Z"),
                ISODate("292278995-12-2147483314T07:12:56.808Z")
            ]
        ]
    },
    "server" : "raspberrypi:27017"
}
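Note the open-ended upper bound in indexBounds: a bare $gte scans everything from midnight up to the maximum representable date. Adding an explicit upper bound keeps the index range tight; a sketch of computing both bounds in Python ($gte/$lt are standard MongoDB query operators):

```python
import datetime

def today_bounds(now=None):
    """Return (start, end) datetimes covering the current day, for a query
    like {'created_at': {'$gte': start, '$lt': end}}."""
    now = now or datetime.datetime.now()
    start = datetime.datetime(now.year, now.month, now.day)
    return start, start + datetime.timedelta(days=1)
```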

That's with only 16,177 records in the database and only one index. There's around 111MB of free memory, so there shouldn't be an issue with the index fitting in memory. I guess I'm going to have to write this off as the Pi not being powerful enough for this job.

Answer

Are you sure that your index was created? Could you provide the output of getIndexes() for your collection,

e.g.: db.my_collection.getIndexes()

and the explain() of the query:

db.my_collection.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28 }).explain()

PS: of course, I have to agree with @Aesthete that you are storing much more than you need to...

Update 29/1/2014:

Perfect! As you can see, you have four different indexes, when you could create ONE compound index that includes all of them.

Defining

db.my_collection.ensureIndex({created_at_year: 1, created_at_month: 1, created_at_day: 1, created_at_hour: 1 })
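The same compound index can also be declared on the MongoEngine model rather than in the shell; a sketch, assuming MongoEngine's tuple syntax for compound indexes in meta:

```python
# Key order matters: it must match the prefixes you intend to query on.
COMPOUND_INDEX = ('created_at_year', 'created_at_month',
                  'created_at_day', 'created_at_hour')

# In the Reading model's meta this single entry would replace the four
# separate single-field index entries:
#     meta = {'indexes': [COMPOUND_INDEX]}
```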

will give you a more precise index that enables you to query for:

  • year
  • year and month
  • year and month and day
  • year and month and day and hour
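The prefix rule behind this list can be sketched in a line of Python: a compound index serves queries on each leading prefix of its key order (a hypothetical helper, just to illustrate):

```python
def queryable_prefixes(index_keys):
    # A compound index can serve queries on any leading prefix of its keys.
    return [tuple(index_keys[:i]) for i in range(1, len(index_keys) + 1)]
```

Calling queryable_prefixes(['year', 'month', 'day', 'hour']) yields exactly the four combinations listed above.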

This will make your queries (with the four keys) much faster, because all your criteria are met in the index data!

Please note that the order of keys in ensureIndex() is crucial; that order is exactly what defines the list of supported queries above!

Also note that if these 4 fields are all you need, then specifying a correct projection,
e.g.:
db.my_collection.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28}, { created_at_year: 1, created_at_month: 1, created_at_day: 1 })

then only the index will be used, which gives maximum performance! (For a fully covered query, the projection must also exclude _id with _id: 0, since _id is returned by default and is not part of this index.)
