How to query documents in MongoDB (pymongo) where all keywords exist in a field?

Question

I have a list of keywords:

keywords = ['word1', 'word2', 'word3']

For now I query for only 1 keyword like this:

collection.find({'documenttextfield': {'$regex': ' '+keyword+' '}})

I'm in no way a regex guru, so I put spaces on either side of the keyword in the regex to find an exact match.

But what I want now is to take that keywords list, query the documents, and find those whose documenttextfield contains every keyword from the list.

I have some ideas of how to do this, but they are all a bit too complex and I feel I'm missing something...

Recommended answer

Consider using a text index with a $text search. It might be a far better solution than regular expressions. However, $text treats a space-separated search string as a logical OR of its terms and ranks the matches with a relevance score, so you might get some results that don't contain all the keywords you are looking for.
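A minimal pymongo sketch of the text-index approach, assuming the field name from the question and that `collection` is a pymongo `Collection`. The helper names `ensure_text_index`, `text_search_filter` and `text_search` are hypothetical. Because the terms are ORed, the cursor is sorted by `textScore` so the best matches come first; it does not guarantee that every keyword is present.

```python
from typing import Any, Dict, List


def ensure_text_index(collection: Any) -> None:
    # 'text' is the index type string (pymongo also exposes it as pymongo.TEXT).
    # A collection can have at most one text index.
    collection.create_index([('documenttextfield', 'text')])


def text_search_filter(keywords: List[str]) -> Dict[str, Any]:
    # A space-separated $search string is treated as a logical OR of its terms.
    return {'$text': {'$search': ' '.join(keywords)}}


def text_search(collection: Any, keywords: List[str]) -> Any:
    # Project the relevance score and sort by it, best matches first.
    return collection.find(
        text_search_filter(keywords),
        {'score': {'$meta': 'textScore'}},
    ).sort([('score', {'$meta': 'textScore'})])
```

Usage would be `ensure_text_index(collection)` once, then `for doc in text_search(collection, keywords): ...`.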

If you can't or don't want to add a text index to this field, a single regular expression would be quite a pain, because you don't know the order in which the words appear. I don't claim it is impossible to write, but you would end up with a horrible abomination even by regex standards. It is far easier to apply the $regex operator once per keyword and combine the conditions with the $and operator.

Also, using a space as a delimiter fails when the word is at the beginning or end of the string, or is followed by a period or comma. Use the word-boundary token (\b) instead.
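To see the difference locally, the sketch below uses Python's `re` module as a stand-in for MongoDB's regex engine (an assumption: for simple patterns like these, `\b` behaves the same in both). The sample sentence and the helper names are made up for illustration. Note the raw strings: in an ordinary Python string literal, `'\b'` is a backspace character, not a word boundary.

```python
import re

text = 'word1 appears first, then word2, and finally word3'


def space_delimited(keyword):
    # The question's approach: require a space on both sides of the keyword.
    return ' ' + re.escape(keyword) + ' '


def word_bounded(keyword):
    # \b matches between a word and a non-word character, or at the string edges.
    return r'\b' + re.escape(keyword) + r'\b'


for kw in ['word1', 'word2', 'word3', 'word']:
    print(kw, bool(re.search(space_delimited(kw), text)), bool(re.search(word_bounded(kw), text)))
# word1 False True
# word2 False True
# word3 False True
# word False False
```

The last line shows that `\b` still demands a whole word: `word` does not match inside `word1`.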

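import re

# keywords = ['word1', 'word2', 'word3']
# One word-boundary regex per keyword; $and requires all of them to match.
# Raw strings keep \b a regex word boundary (in a plain Python string '\b' is a backspace),
# and re.escape() guards against keywords containing regex metacharacters.
collection.find({
    '$and': [
        {'documenttextfield': {'$regex': r'\b' + re.escape(keyword) + r'\b'}}
        for keyword in keywords
    ]
})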

Keep in mind that this is a really slow query, because it runs these three regular expressions on every single document in the collection. If this is a performance-critical query, seriously consider whether a text index really won't do. Failing that, the last resort is to extract every keyword someone could search for from the documenttextfield field (possibly every unique word in it) into a new array field, documenttextfield_keywords, create a normal index on that field, and search it with the $all operator (no regular expression required in that case).
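A rough pymongo sketch of that last-resort approach. Assumptions: keywords are extracted with a simple `\w+` tokenizer and compared case-sensitively, `collection` is a pymongo `Collection`, and the helper names (`extract_keywords`, `all_keywords_filter`, `backfill_keywords`) are hypothetical. In a real application you would also set `documenttextfield_keywords` whenever a document is written, not only in a one-off backfill.

```python
import re
from typing import Any, Dict, Iterable, List

WORD_RE = re.compile(r'\w+')


def extract_keywords(text: str) -> List[str]:
    # Every unique word in first-seen order; dict.fromkeys drops duplicates.
    return list(dict.fromkeys(WORD_RE.findall(text)))


def all_keywords_filter(keywords: Iterable[str]) -> Dict[str, Any]:
    # $all matches documents whose array field contains every listed value.
    return {'documenttextfield_keywords': {'$all': list(keywords)}}


def backfill_keywords(collection: Any) -> None:
    # One-off migration: fill the array field for existing documents.
    for doc in collection.find({}, {'documenttextfield': 1}):
        words = extract_keywords(doc.get('documenttextfield', ''))
        collection.update_one(
            {'_id': doc['_id']},
            {'$set': {'documenttextfield_keywords': words}},
        )
    # An ordinary ascending index; MongoDB makes it multikey because the field holds arrays.
    collection.create_index('documenttextfield_keywords')
```

The query itself then becomes `collection.find(all_keywords_filter(keywords))`, which can use the index instead of scanning every document with regular expressions.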
