MongoDB - Full Text Index - Full Text Search - stemming
Problem description
I noticed that if I store the value 'seasons' in a string field of a collection with full text search enabled, MongoDB finds this value when I query for 'season'. But if I store something more complex, such as 'mice' or 'criteria', it does not find these values when I query for 'mouse' or 'criterion' respectively. Is that normal, and are there any clear rules about what MongoDB is able to stem and what it is not?
[test] 2014-03-30 18:25:09.551 >>> db.TestFullText7.find();
{
"_id" : ObjectId("53389720063ab25d2d55c94c"),
"dt" : ISODate("2014-03-30T22:13:52.717Z"),
"title" : "mice",
"txt" : "mice"
}
{
"_id" : ObjectId("5338994c063ab25d2d55c94d"),
"dt" : ISODate("2014-03-30T22:23:08.259Z"),
"title" : "criteria",
"txt" : "criteria"
}
{
"_id" : ObjectId("533899c5063ab25d2d55c94e"),
"dt" : ISODate("2014-03-30T22:25:09.551Z"),
"title" : "seasons",
"txt" : "seasons"
}
[test] 2014-03-30 18:25:13.295 >>> db.runCommand({"text" : "TestFullText7", "search" : "season"});
{
"queryDebugString" : "season||||||",
"language" : "english",
"results" : [
{
"score" : 2,
"obj" : {
"_id" : ObjectId("533899c5063ab25d2d55c94e"),
"dt" : ISODate("2014-03-30T22:25:09.551Z"),
"title" : "seasons",
"txt" : "seasons"
}
}
],
"stats" : {
"nscanned" : 1,
"nscannedObjects" : 0,
"n" : 1,
"nfound" : 1,
"timeMicros" : 148
},
"ok" : 1
}
[test] 2014-03-30 18:25:22.406 >>> db.runCommand({"text" : "TestFullText7", "search" : "mouse"});
{
"queryDebugString" : "mous||||||",
"language" : "english",
"results" : [ ],
"stats" : {
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"nfound" : 0,
"timeMicros" : 110
},
"ok" : 1
}
[test] 2014-03-30 18:25:30.986 >>> db.TestFullText7.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.TestFullText7",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"ns" : "test.TestFullText7",
"name" : "$**_text",
"weights" : {
"$**" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 1
}
]
[test] 2014-03-30 18:25:45.228 >>>
Recommended answer
MongoDB uses the Snowball stemming library. Unfortunately, this appears to be one of the limitations of that library.
You can see the pages for the English stemmer here. Compare the vocabulary page with its stemmed equivalent and you can see "mouse" becoming "mous" while "mice" remains "mice". Because the two stems differ, a search for 'mouse' (stemmed to "mous", as the queryDebugString in the question shows) never matches a document that contains 'mice'.
You can see MongoDB's use of Snowball in their codebase here and here.
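To make the mismatch concrete, here is a minimal sketch in plain JavaScript. It does not call MongoDB or Snowball. The `STEMS` table is a stand-in for the stemmer and holds only the stems visible in the question's output and on the Snowball vocabulary page mentioned above. The helper names `stem` and `matches` are hypothetical, and the "equal stems means a match" rule is a simplification of how a text index compares terms.

```javascript
// Stand-in for the Snowball English stemmer: a lookup table with only the
// stems reported in the question and in the answer above.
const STEMS = new Map([
  ["seasons", "season"],
  ["season", "season"],
  ["mouse", "mous"],
  ["mice", "mice"],
]);

// Hypothetical helper: return the known stem, or the lowercased word itself
// when the table has no entry for it.
function stem(word) {
  const w = word.toLowerCase();
  return STEMS.has(w) ? STEMS.get(w) : w;
}

// Hypothetical helper: treat an indexed word and a query word as a match
// only when both reduce to the same stem.
function matches(indexedWord, queryWord) {
  return stem(indexedWord) === stem(queryWord);
}

console.log(matches("seasons", "season")); // true: both stem to "season"
console.log(matches("mice", "mouse"));     // false: "mice" vs "mous"
```

A purely suffix-based stemmer has no notion that "mice" is the plural of "mouse", so irregular plurals end up under different stems and are not found.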