MongoDB text search with language support
I am having problems with language-aware text search in MongoDB. For some records the search works well, and for others it does not work at all.
I have a list of ingredients that I would like to search. The ingredients are stored in several languages, and I would like both singular and plural forms to match.
Here is my example
Schema
{
  translation: [
    {
      language: { type: String, required: true },
      name: { type: String, required: true }
    }
  ],
  calories: { type: Number },
  protein: { type: Number },
  carbohydrate: { type: Number },
  fat: { type: Number }
}
Index
foodSchema.index( { "translation.name": "text" }, { default_language: "german" } )
Read Index from DB
[
  {
    "v" : 2,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_"
  },
  {
    "v" : 2,
    "key" : {
      "_fts" : "text",
      "_ftsx" : 1
    },
    "name" : "translation.name_text",
    "default_language" : "german",
    "background" : true,
    "weights" : {
      "translation.name" : 1
    },
    "language_override" : "language",
    "textIndexVersion" : 3
  }
]
Records
{
  calories: 1,
  protein: 2,
  carbohydrate: 3,
  fat: 4,
  translation: [
    {
      _id: ObjectId('5fba87d13ad6404108191670'),
      language: 'german',
      name: 'gurke'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191671'),
      language: 'english',
      name: 'cucumber'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191672'),
      language: 'spanish',
      name: 'pepino'
    }
  ]
}
// ----
{
  calories: 4,
  protein: 3,
  carbohydrate: 2,
  fat: 1,
  translation: [
    {
      _id: ObjectId('5fba87d13ad6404108191674'),
      language: 'german',
      name: 'huhn'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191675'),
      language: 'english',
      name: 'chicken'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191676'),
      language: 'spanish',
      name: 'pollo'
    }
  ]
}
Searching data
db.getCollection('foods').find({$text: { $search: "gurke" }})     // works
db.getCollection('foods').find({$text: { $search: "gurken" }})    // works
db.getCollection('foods').find({$text: { $search: "cucumber" }})  // works
db.getCollection('foods').find({$text: { $search: "cucumbers" }}) // works
db.getCollection('foods').find({$text: { $search: "huhn" }})      // works
db.getCollection('foods').find({$text: { $search: "hühner" }})    // works
db.getCollection('foods').find({$text: { $search: "chicken" }})   // no result
db.getCollection('foods').find({$text: { $search: "chickens" }})  // no result
db.getCollection('foods').find({$text: { $search: "pepino" }})    // no result
The MongoDB documentation (https://docs.mongodb.com/manual/tutorial/specify-language-for-text-index/) says:
"The default language associated with the indexed data determines the rules to parse word roots (i.e. stemming) and ignore stop words."
- Does this mean that only the default language is supported?
- Why does it work for cucumber but not for chicken?
I also checked the English stop-word list for "chicken": https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/stop_words_english.txt
Thank you for your help!
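The index itself is fine. The embedded documents carry a language field, and because the index has language_override set to "language", each name is stemmed in its own language when it is indexed. The search term, however, is stemmed with the index's default language (german) unless the query specifies $language. That is most likely why cucumber matches and chicken does not: the German and English stemmers appear to reduce "cucumber" to the same root, whereas they treat "chicken" differently.
Try: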
db.collection.find({$text:{$search:"pollo", $language:"spanish"}})
If not specified, the search uses the default language of the index.
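If you do not know in advance which language the user typed, one option is to run one query per language, since a query can contain only one $text expression. The following is only a sketch of that idea: it builds the filter objects without touching a database, and the names LANGUAGES and buildTextFilters are made up for illustration.

```javascript
// Languages present in the sample records (assumption: extend this list as needed).
const LANGUAGES = ["german", "english", "spanish"];

// Hypothetical helper: build one $text filter per language, because a single
// query may contain only one $text expression.
function buildTextFilters(term, languages = LANGUAGES) {
  return languages.map((language) => ({
    $text: { $search: term, $language: language },
  }));
}

const filters = buildTextFilters("pollo");
console.log(JSON.stringify(filters, null, 2));

// In the mongo shell, each filter could then be run on its own, for example:
// filters.forEach(f => printjson(db.getCollection('foods').find(f).toArray()))
```

The results of the individual queries would still need to be merged and de-duplicated by _id on the client side.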
Also, if you run
db.collection.find({$text:{$search:"pollo"}}).explain()
you'll see that the query is using the default language.
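To read that value out of the explain document in a script, something like the sketch below could work. The sample object is an abridged, hand-written approximation of the shape I would expect under queryPlanner.parsedQuery, not captured output, and effectiveLanguage is a hypothetical helper name; check the field path against your own server version.

```javascript
// Abridged, hand-written sample of explain() output (assumed shape, not real output).
const sampleExplain = {
  queryPlanner: {
    parsedQuery: {
      $text: { $search: "pollo", $language: "german" },
    },
  },
};

// Hypothetical helper: return the language resolved for the $text clause,
// or undefined if the explain document has no top-level $text clause.
function effectiveLanguage(explain) {
  const parsed = explain && explain.queryPlanner && explain.queryPlanner.parsedQuery;
  return parsed && parsed.$text ? parsed.$text.$language : undefined;
}

console.log(effectiveLanguage(sampleExplain));

// In the mongo shell the input would come from the real query, for example:
// effectiveLanguage(db.collection.find({$text:{$search:"pollo"}}).explain())
```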