MongoDb text search with language support


Question


I am having trouble with MongoDB's language-aware text search. For some records the search works well, and for others it returns nothing at all.

I have a list of ingredients that I would like to search. The ingredients are stored in several languages, and I would like both singular and plural forms to match.

Here is my example:

Schema

{
  translation: [
    {
      language: {
        type: String,
        required: true
      },
      name: {
        type: String,
        required: true
      }
    }
  ],
  calories: {
    "type": Number
  },
  protein: {
    "type": Number
  },
  carbohydrate: {
    "type": Number
  },
  fat: {
    "type": Number
  }
}

Index

foodSchema.index( { "translation.name": "text" }, { default_language: "german" } )

Read Index from DB

[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_"
    },
    {
        "v" : 2,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "translation.name_text",
        "default_language" : "german",
        "background" : true,
        "weights" : {
            "translation.name" : 1
        },
        "language_override" : "language",
        "textIndexVersion" : 3
    }
]

Records

{
  calories: 1,
  protein: 2,
  carbohydrate: 3,
  fat: 4,
  translation: [
    {
      _id: ObjectId('5fba87d13ad6404108191670'),
      language: 'german',
      name: 'gurke'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191671'),
      language: 'english',
      name: 'cucumber'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191672'),
      language: 'spanish',
      name: 'pepino'
    }
  ]
}

// ----

{    
  calories: 4,
  protein: 3,
  carbohydrate: 2,
  fat: 1,
  translation: [
    {
      _id: ObjectId('5fba87d13ad6404108191674'),
      language: 'german',
      name: 'huhn'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191675'),
      language: 'english',
      name: 'chicken'
    },
    {
      _id: ObjectId('5fba87d13ad6404108191676'),
      language: 'spanish',
      name: 'pollo'
    }
  ]
}

Searching data

db.getCollection('foods').find({$text: { $search: "gurke" }}) // works
db.getCollection('foods').find({$text: { $search: "gurken" }}) // works
db.getCollection('foods').find({$text: { $search: "cucumber" }}) // works
db.getCollection('foods').find({$text: { $search: "cucumbers" }}) // works
db.getCollection('foods').find({$text: { $search: "huhn" }}) // works
db.getCollection('foods').find({$text: { $search: "hühner" }}) // works
db.getCollection('foods').find({$text: { $search: "chicken" }}) // no result
db.getCollection('foods').find({$text: { $search: "chickens" }}) // no result
db.getCollection('foods').find({$text: { $search: "pepino" }}) // no result

The MongoDB documentation (https://docs.mongodb.com/manual/tutorial/specify-language-for-text-index/) says:

The default language associated with the indexed data determines the rules to parse word roots (i.e. stemming) and ignore stop words.

  • Does this mean that only the default language is supported?
  • Why is it working for cucumber but not for chicken?

I also checked whether "chicken" appears in the English stop word list: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/stop_words_english.txt

Thank you for your help!

Solution

The index itself is fine. The problem is the query: you need to add $language to it, otherwise the search uses the index's default language (at least when using $text). Try:

 db.collection.find({$text:{$search:"pollo", $language:"spanish"}})

From the $language docs:

If not specified, the search uses the default language of the index.

Also, if you run

 db.collection.find({$text:{$search:"pollo"}}).explain()

you will find that the query is using the default language.
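
If the language of the search term is not known in advance, one possible approach is to try the same term once per language. The sketch below only builds the filter objects; buildTextFilters is a hypothetical helper name, not part of MongoDB, and the language list is simply the three languages from the question. As far as I know a query accepts only one $text expression, so the filters are meant to be run one at a time rather than combined.

```javascript
// Hypothetical helper (not a MongoDB API): builds one $text filter per language
// so the same search term can be tried with each language's stemming rules.
function buildTextFilters(term, languages) {
  return languages.map((language) => ({
    $text: { $search: term, $language: language },
  }));
}

// Languages taken from the example records in the question.
const filters = buildTextFilters("pollo", ["german", "english", "spanish"]);

console.log(JSON.stringify(filters[2]));
// prints {"$text":{"$search":"pollo","$language":"spanish"}}

// In the mongo shell, each filter could then be run separately, for example:
// filters.forEach((f) => printjson(db.getCollection('foods').find(f).toArray()));
```

Whether running several queries per search is acceptable depends on the collection size and how often the search is called.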
