MongoDB case insensitive index "starts with" performance problems

Problem description

After finding out that MongoDB 3.3.11 supports case-insensitive indexes (using collation), I rebuilt my database of 40 million records to try this out. The alternative was to add, for example, lowercase fields dedicated to case-insensitive search and index those.

What I did was ask MongoDB to support collation on my collection at creation time, as suggested here. So I did this to enable case insensitivity for the entire collection:

db.createCollection("users", {collation:{locale:"en",strength:1}})

After loading the collection, I tried direct queries like:

db.users.find({full_name:"john doe"})

...and those return in ~10 ms with 50 results. The match is case insensitive, so all is great. But then I try something like:

db.users.find({full_name:/^john/})

...or...

db.users.find({full_name:/^john/i})

...and this takes more than 5 minutes. I was so disappointed. After running explain(), it turns out that the index was apparently being used, but the query still takes far too long to execute. Can this be attributed to a buggy or incomplete development release, or am I doing something fundamentally wrong?

As I am doing a "starts with" regex search, the query should be lightning fast. Any ideas?
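For comparison, the lowercase-field alternative mentioned at the start of the question could look roughly like the sketch below. It assumes the shadow field is indexed with plain binary comparison (no collation), where a case-sensitive anchored regex can use tight index bounds. The helper names and the `_lower` suffix are hypothetical, not part of the original question.

```javascript
// Hypothetical helper: copy `field` into a lowercase shadow field,
// e.g. full_name -> full_name_lower, to be indexed without collation.
function withLowercaseField(doc, field) {
    return Object.assign({}, doc, {
        [field + "_lower"]: String(doc[field]).toLowerCase()
    });
}

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-sensitive, anchored prefix regex against the shadow field.
function startsWithQuery(field, prefix) {
    return {
        [field + "_lower"]: new RegExp("^" + escapeRegex(prefix.toLowerCase()))
    };
}

const doc = withLowercaseField({ full_name: "John Doe" }, "full_name");
const query = startsWithQuery("full_name", "John");

// In the mongo shell this would be used roughly as (assumed setup):
//   db.users.createIndex({ full_name_lower: 1 })
//   db.users.find(query)
```

The trade-off is extra storage and the need to keep the shadow field in sync on every write, which is what the collation feature was meant to avoid.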

Recommended answer

There is a workable workaround. Basically, if the word you are searching for is "bob", you can search for $gte: "bob" and $lt: "boc" (where you increment the last character by one). This will use the index. You can use the function I wrote below (warning: it's not necessarily bug free, but it pretty much works) like this:

var searchCriteria = {};
addStartsWithQuery(searchCriteria, "firstName", "bo");
People.find(searchCriteria).then(...);

//searchCriteria will be
/*
{
    $and:[
         {firstName:{$gte:"bo"}},
         {firstName:{$lt:"bp"}}
    ]
}
*/


// Library functions that generate the range query and add it to `searchCriteria`. For complicated queries you may have to modify them a bit.

// Returns an exclusive upper bound for strings starting with `str`
// (lowercased), or undefined for an empty string.
function getEndStr(str) {
    var lower = str.toLocaleLowerCase('en-US');
    if (!lower.length)
        return undefined;
    var lastChar = lower.charAt(lower.length - 1);
    // Nothing simple follows "z", so pad with "z" as an approximate bound.
    if (lastChar === "z")
        return lower + "zzzzzzzzzzzz";
    var nextChar = String.fromCharCode(lastChar.charCodeAt(0) + 1);
    // "9" + 1 is ":" in ASCII; use "a" so the bound moves from digits to letters.
    if (nextChar === ":")
        nextChar = "a";
    return lower.slice(0, -1) + nextChar;
}
function addStartsWithQuery(searchCriteria, propertyName, str) {
    if (!(typeof str === 'string') || !str.length)
        return;
    var endStr = getEndStr(str);
    if (endStr) {
        if (!searchCriteria.$and)
            searchCriteria.$and = [];
        searchCriteria.$and.push({
            [propertyName]: {
                $gte: str
            }
        });
        searchCriteria.$and.push({
            [propertyName]: {
                $lt: endStr
            }
        });
    } else {
        searchCriteria[propertyName] = {
            $gte: str
        }
    }
}
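To see why a range like $gte "bo" / $lt "bp" behaves as a case-insensitive "starts with", the comparison can be imitated in plain JavaScript. This is only a sketch: it assumes that `Intl.Collator` with `sensitivity: "base"` orders strings approximately the way MongoDB's `{locale: "en", strength: 1}` collation does, and the helper names `prefixBounds` and `inBounds` are hypothetical. It deliberately handles only prefixes ending in the letters a to y and returns null otherwise.

```javascript
// Hypothetical helper: half-open range [prefix, upper) for a prefix that
// ends in an ASCII letter a-y (after lowercasing); null for anything else.
function prefixBounds(prefix) {
    const lower = prefix.toLowerCase();
    const last = lower.charAt(lower.length - 1);
    if (last < "a" || last > "y") return null;
    const upper = lower.slice(0, -1) +
        String.fromCharCode(last.charCodeAt(0) + 1);
    return { $gte: lower, $lt: upper };
}

// Assumption: base sensitivity ignores case and accents, roughly like
// collation strength 1.
const collator = new Intl.Collator("en", { sensitivity: "base" });

// True when `value` sorts inside the half-open range under the collator.
function inBounds(value, bounds) {
    return collator.compare(value, bounds.$gte) >= 0 &&
           collator.compare(value, bounds.$lt) < 0;
}

const bounds = prefixBounds("Bo");
const sample = ["Bob Smith", "BOB", "bp", "alice"];
const matches = sample.filter(function (name) {
    return inBounds(name, bounds);
});
```

With these sample names, only the two that begin with "bo" in any letter case fall inside the range.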

Well, it turns out MongoDB officially doesn't support it! I've linked to an issue in JIRA where they make this clear. Unfortunately, this makes collations significantly less useful. Let's get on them to fix this soon! Technically speaking, I noticed that even though the query uses the index, the index scan has "[\"\", {})" as one of its index bounds, which always returns all items in the index, so the index scan is useless. The next stage of the query then filters those results as usual.

https://jira.mongodb.org/browse/DOCS-9933
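The index bounds described above can be checked by walking the `winningPlan` that explain() returns. The sketch below uses a hand-written sample plan shaped like that output; the index name and bound strings are illustrative, not captured from a real server, and the helper names are hypothetical.

```javascript
// Hand-written sample shaped like explain().queryPlanner.winningPlan.
// Values are illustrative only.
const samplePlan = {
    stage: "FETCH",
    inputStage: {
        stage: "IXSCAN",
        indexName: "full_name_1",
        indexBounds: { full_name: ['["", {})', "[/^john/, /^john/]"] }
    }
};

// Collect every IXSCAN stage, following inputStage / inputStages.
function findIndexScans(plan, found = []) {
    if (!plan) return found;
    if (plan.stage === "IXSCAN") found.push(plan);
    if (plan.inputStage) findIndexScans(plan.inputStage, found);
    (plan.inputStages || []).forEach(function (stage) {
        findIndexScans(stage, found);
    });
    return found;
}

// True when any field's bounds include the "all strings" range ["", {}).
function scansWholeStringRange(ixscan) {
    return Object.values(ixscan.indexBounds || {}).some(function (ranges) {
        return ranges.includes('["", {})');
    });
}

const wasteful = findIndexScans(samplePlan)
    .filter(scansWholeStringRange)
    .map(function (scan) { return scan.indexName; });
```

An index that shows up in `wasteful` is being scanned over every string key, so the regex filter still has to examine every entry even though the plan reports an IXSCAN.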

Vote for this issue to get them to fix it! https://jira.mongodb.org/browse/SERVER-29865
