MongoDB: How to do a text search and sort by a date

Question

Context: I have a MongoDB collection populated with a large number of emails. I'd like to search for all emails that include a given email address in any of the following fields: To, From, CC and BCC. The results need to be sorted by the Date field. We're currently trying the following query:

db.collection.find({ $text : {$search: "\"email@domain.com\""}}).sort({Date:1})

I've tried creating a compound index that includes the date, but it does not work.

With this index...

db.collection.createIndex({Date: 1, From:"text", To:"text", CC:"text", BCC:"text"})

It gives error 17007, because Date is a prefix key of the text index and so the query must include an equality match on it. That is not an option, as we'd like all emails regardless of the date.

And with this other index...

db.collection.createIndex({From:"text", To:"text", CC:"text", BCC:"text", Date:1})

Then it gives error 17144, because the sort exceeds the internal limit.

We have read the following:

Stack Overflow reference

Stack Overflow reference

MongoDB documentation on compound indexes

From these references and others I'm getting the idea that this is not possible, but I don't think what we're trying to do is atypical or that far out of the ordinary.

Are we doing something wrong? Is there a way to do this query with a compound index or any other MongoDB feature?

Thanks!

Recommended answer

Regardless of the other compound index keys, you need to include the $meta "textScore" in the sort in order to get the correct ordering:

db.collection.find(
    { "$text": { "$search": "\"email@domain.com\""}},
    { "score": { "$meta": "textScore" } }
).sort({
    "score": { "$meta": "textScore" }, "Date": 1
})

So naturally you want that "score" to sort first, and then "Date", so that results are correctly ranked by search relevance.
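To make that ordering concrete, here is a minimal sketch in plain JavaScript (no database involved) that mimics what the sort above does: "textScore" descending first, then "Date" ascending as a tie-breaker. The sample documents, their scores, and the helper name sortByScoreThenDate are invented for illustration only.

```javascript
// Hypothetical documents, as if returned with a projected "score" field.
const results = [
  { _id: 1, score: 1.1, Date: new Date("2015-07-03T00:00:00Z") },
  { _id: 2, score: 2.2, Date: new Date("2015-07-02T00:00:00Z") },
  { _id: 3, score: 1.1, Date: new Date("2015-07-01T00:00:00Z") },
];

// Higher score first; among equal scores, earlier Date first.
function sortByScoreThenDate(docs) {
  return [...docs].sort((a, b) => (b.score - a.score) || (a.Date - b.Date));
}

const ordered = sortByScoreThenDate(results).map((doc) => doc._id);
console.log(ordered); // [ 2, 3, 1 ]
```

Note that "Date" only decides the order among documents with the same score, so the overall result is not in plain date order.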

The order of the index keys does not matter here, but of course a collection can only have "one" text index. So make sure you drop any others before creating it:

db.collection.createIndex({ 
   "From": "text",
   "To": "text",
   "CC":"text", 
   "BCC": "text", 
   "Date":1
})

List the current indexes with:

db.collection.getIndexes()

Or just drop everything and start fresh:

db.collection.dropIndexes()

For the data you appear to be searching on, though, I would have thought a regular compound index on each field would suit you better. Looking up "email" addresses should be an "exact match", and if you expect multiple items in each field then the fields should be arrays of strings, like so:

{
    "TO": ["bill@example.com"],
    "FROM": ["ted@example.com"],
    "CC": ["marty@example.com","sarah@example.com"],
    "BCC": [],
    "Date": ISODate("2015-07-27T13:42:05.535Z")
}

Then you need separate indexes on each field, possibly in compound with "Date", like so:

db.email.createIndex({ "TO": 1, "Date": 1 })
db.email.createIndex({ "FROM": 1, "Date": 1 })
db.email.createIndex({ "CC": 1, "Date": 1 })
db.email.createIndex({ "BCC": 1, "Date": 1 })

And query with an $or condition:

db.email.find({
    "$or": [
        { "TO": "sarah@example.com" },
        { "FROM": "sarah@example.com" },
        { "CC": "sarah@example.com" },
        { "BCC": "sarah@example.com" }
    ],
    "Date": { "$lt": new Date() }
})

If you look at the .explain(true) (verbose) output from that, you should see that the winning plan is an "index intersection" of all the specified indexes, or more precisely an OR of one index scan per $or clause. This works out to be very efficient, as every field (and index selected) has an exact match value, plus a range match on the indexed date.
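As a rough sketch of how that filter could be built for any address, the helper below assembles the same $or document in plain JavaScript. The names ADDRESS_FIELDS and buildAddressFilter are hypothetical, and the field names are assumed to match the example document above.

```javascript
// Field names follow the example document; the helper name is hypothetical.
const ADDRESS_FIELDS = ["TO", "FROM", "CC", "BCC"];

// Build one exact-match clause per address field, plus a range on Date.
function buildAddressFilter(address, before = new Date()) {
  return {
    $or: ADDRESS_FIELDS.map((field) => ({ [field]: address })),
    Date: { $lt: before },
  };
}

const filter = buildAddressFilter("sarah@example.com");
console.log(JSON.stringify(filter, null, 2));

// In the mongo shell the filter could then be used as:
//   db.email.find(filter).sort({ "Date": 1 })
```

The trailing .sort({ "Date": 1 }) in the comment is what returns the matches in date order, as the question asks.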

That is going to be a lot better for you than the "fuzzy matching" of text searches. Even regular expressions should generally work better here (for e-mail addresses), especially if they are "anchored" with ^ to the start of the string.
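For the anchored case, a minimal sketch of building such a prefix filter follows. The helpers escapeRegExp and prefixFilter are hypothetical names, and the escaping step is an addition here so that characters such as "." in an address are matched literally rather than as regex wildcards.

```javascript
// Escape regex metacharacters so the prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter matching values of `field` that start with `prefix`.
function prefixFilter(field, prefix) {
  return { [field]: { $regex: "^" + escapeRegExp(prefix) } };
}

const fromFilter = prefixFilter("FROM", "sarah@example.");
console.log(fromFilter.FROM.$regex); // ^sarah@example\.

// In the mongo shell: db.email.find(fromFilter)
```

A case-sensitive pattern anchored with ^ like this one can generally make use of a regular index on the field, which is why the anchoring matters.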

Text indexes are meant to match "word-like tokens", which is not what your data is. The $or does not look as nice, but it should do a much better job.
