MongoDB too many records?


Question


I have a PHP app that interacts with MongoDB. Until recently the app was working fine, but a few days back I found that it had started to respond REALLY slowly. One of the collections has shot up to 500K+ records, so the MongoCursor for any query on that collection keeps timing out.

I don't think 500K records is WAY too much. Other pages using mongodb are beginning to slow down as well, but not as much as the one which uses the collection with 500k records. Static pages which don't interact with MongoDB are still fast to respond.

I am not sure what the issue could be here. I have indexed the collections, so that does not seem to be the problem. Another point to note is that the server has 512 MB of RAM, and when PHP runs the Mongo query, the top command shows only about 15000k of memory free.

Any help will be greatly appreciated.

Solution

To summarize followup from the chat room, the issue is actually related to a find() query which is doing a scan of all ~500k documents to find 15:

db.tweet_data.find({ 
    $or: 
    [ 
        { in_reply_to_screen_name: /^kunalnayyar$/i, handle: /^kaleycuoco$/i, id: { $gt: 0 } }, 
        { in_reply_to_screen_name: /^kaleycuoco$/i, handle: /^kunalnayyar$/i, id: { $gt: 0 } } 
    ], 
    in_reply_to_status_id_str: { $ne: null }
} ).explain() 
{ 
    "cursor" : "BtreeCursor id_1", 
    "nscanned" : 523248, 
    "nscannedObjects" : 523248, 
    "n" : 15, 
    "millis" : 23682, 
    "nYields" : 0, 
    "nChunkSkips" : 0, 
    "isMultiKey" : false, 
    "indexOnly" : false, 
    "indexBounds" : { 
        "id" : [ 
            [ 
                0, 
                1.7976931348623157e+308 
            ] 
        ] 
    } 
}

This query uses case-insensitive regular expressions, which cannot make efficient use of an index (though in this case no suitable index was defined anyway).
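A rough sketch of why, in the mongo shell, using a hypothetical single-field index on handle (not part of the original answer; this assumes a running server, so output is not shown):

```javascript
// Hypothetical single-field index, for illustration only
db.tweet_data.ensureIndex({ handle: 1 })

// Anchored, case-sensitive prefix: can be answered with a range scan
// over a contiguous slice of the index, so explain() should report a
// low nscanned
db.tweet_data.find({ handle: /^kaleycuoco/ }).explain()

// Case-insensitive /i: no single contiguous index range covers every
// possible casing, so the whole index (or collection) gets examined
db.tweet_data.find({ handle: /^kaleycuoco$/i }).explain()
```

This is why the answer recommends storing a lowercase copy of the field and querying it with an exact match, which can be satisfied by a single index lookup.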

Suggested approach:

  • create lowercase handle_lc and inreply_lc fields for search purposes

  • add a compound index on those:

    db.tweet_data.ensureIndex({handle_lc: 1, inreply_lc: 1})

  • the field order of the compound index allows efficient lookup of tweets either by handle_lc alone or by the pair (handle_lc, inreply_lc)

  • search by exact match instead of regex:

db.tweet_data.find({ 
    $or: 
    [ 
        { in_reply_to_screen_name: 'kunalnayyar', handle: 'kaleycuoco', id: { $gt: 0 } }, 
        { in_reply_to_screen_name: 'kaleycuoco', handle: 'kunalnayyar', id: { $gt: 0 } } 
    ] 
})
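The lowercase-field approach depends on normalizing documents at write time. A minimal sketch in plain JavaScript (the helper name and field defaults are assumptions, not from the original answer; the real app would do this in PHP before insert):

```javascript
// Hypothetical helper: add lowercase copies of the searched fields so an
// exact-match, index-friendly query can replace the /i regex.
function normalizeTweet(doc) {
  return Object.assign({}, doc, {
    handle_lc: (doc.handle || "").toLowerCase(),
    inreply_lc: (doc.in_reply_to_screen_name || "").toLowerCase()
  });
}

const normalized = normalizeTweet({
  handle: "KaleyCuoco",
  in_reply_to_screen_name: "KunalNayyar",
  id: 42
});
// normalized now carries handle_lc / inreply_lc alongside the originals
```

With documents stored this way, the $or query above can target handle_lc and inreply_lc directly and be served by the compound index.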

