MongoDB - Full Text Index - Full Text Search - Stemming

Problem description

I noticed that if I enter the value 'seasons' in a string field of a collection that has full text search enabled, MongoDB finds this value when I query for 'season'. But if I enter something more complex, e.g. 'mice' or 'criteria', it does not find these values when I query for 'mouse' or 'criterion' respectively. Is that normal, and are there any clear rules for what MongoDB is able to stem and what it is not?

[test] 2014-03-30 18:25:09.551 >>> db.TestFullText7.find();
{
        "_id" : ObjectId("53389720063ab25d2d55c94c"),
        "dt" : ISODate("2014-03-30T22:13:52.717Z"),
        "title" : "mice",
        "txt" : "mice"
}
{
        "_id" : ObjectId("5338994c063ab25d2d55c94d"),
        "dt" : ISODate("2014-03-30T22:23:08.259Z"),
        "title" : "criteria",
        "txt" : "criteria"
}
{
        "_id" : ObjectId("533899c5063ab25d2d55c94e"),
        "dt" : ISODate("2014-03-30T22:25:09.551Z"),
        "title" : "seasons",
        "txt" : "seasons"
}
[test] 2014-03-30 18:25:13.295 >>> db.runCommand({"text" : "TestFullText7", "search" : "season"});
{
        "queryDebugString" : "season||||||",
        "language" : "english",
        "results" : [
                {
                        "score" : 2,
                        "obj" : {
                                "_id" : ObjectId("533899c5063ab25d2d55c94e"),
                                "dt" : ISODate("2014-03-30T22:25:09.551Z"),
                                "title" : "seasons",
                                "txt" : "seasons"
                        }
                }
        ],
        "stats" : {
                "nscanned" : 1,
                "nscannedObjects" : 0,
                "n" : 1,
                "nfound" : 1,
                "timeMicros" : 148
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:22.406 >>> db.runCommand({"text" : "TestFullText7", "search" : "mouse"});
{
        "queryDebugString" : "mous||||||",
        "language" : "english",
        "results" : [ ],
        "stats" : {
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "nfound" : 0,
                "timeMicros" : 110
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:30.986 >>> db.TestFullText7.getIndexes();
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "$**_text",
                "weights" : {
                        "$**" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 1
        }
]
[test] 2014-03-30 18:25:45.228 >>>


Recommended answer

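MongoDB uses the Snowball stemming library. Unfortunately, this looks to be one of the limitations of this library: its English stemmer is largely rule-based suffix stripping rather than a dictionary lookup, so irregular plurals such as 'mice' and 'criteria' are not reduced to the same stem as 'mouse' and 'criterion'.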

You can see the pages for the English stemmer here. Compare the vocabulary page with its stemmed equivalent and you can see "mouse" becoming "mous" while "mice" remains "mice". That agrees with the "queryDebugString" of "mous||||||" in the output above.
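
To make the mismatch concrete, here is a minimal sketch of a suffix stripper. It is not the Snowball algorithm: `toyStem` and `matches` are hypothetical names, and the two rules are assumptions chosen only to mimic the stems seen in the question. It illustrates that a query hits only when the query stem equals a stored stem.

```javascript
// Toy suffix stripper: NOT Snowball, only two illustrative (assumed) rules.
function toyStem(word) {
  let w = word.toLowerCase();
  // Rule 1 (assumed): drop a plural "s", but leave "ss" and "us" endings alone.
  if (w.length > 3 && w.endsWith("s") && !w.endsWith("ss") && !w.endsWith("us")) {
    w = w.slice(0, -1);
  }
  // Rule 2 (assumed): drop a final "e" from words longer than four letters.
  if (w.length > 4 && w.endsWith("e")) {
    w = w.slice(0, -1);
  }
  return w;
}

// A query term matches an indexed term only if both reduce to the same stem.
function matches(indexedWord, queryWord) {
  return toyStem(indexedWord) === toyStem(queryWord);
}

console.log(toyStem("seasons"), toyStem("season")); // season season
console.log(toyStem("mice"), toyStem("mouse"));     // mice mous
console.log(matches("seasons", "season"));          // true
console.log(matches("mice", "mouse"));              // false
console.log(matches("criteria", "criterion"));      // false
```

No suffix rule of this kind can turn "mice" into "mouse", which is why irregular forms fall through.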

You can see MongoDB's use of Snowball in their codebase here and here.
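
One possible workaround, sketched here as an assumption rather than anything MongoDB provides, is to expand the search string in application code before sending it. The `IRREGULAR_FORMS` table and `expandSearch` function are hypothetical; the sketch relies on the fact that space-separated terms in a text search string are combined with OR, and it does not handle quoted phrases or negated terms.

```javascript
// Hypothetical lookup table of irregular forms; extend it for your own data.
const IRREGULAR_FORMS = {
  mouse: ["mice"],
  mice: ["mouse"],
  criterion: ["criteria"],
  criteria: ["criterion"],
};

// Append known irregular variants to each term of a plain search string.
function expandSearch(search) {
  const terms = search.toLowerCase().split(/\s+/).filter(Boolean);
  const expanded = [];
  const add = (t) => {
    if (!expanded.includes(t)) expanded.push(t);
  };
  for (const term of terms) {
    add(term);
    const known = Object.prototype.hasOwnProperty.call(IRREGULAR_FORMS, term);
    for (const variant of known ? IRREGULAR_FORMS[term] : []) {
      add(variant);
    }
  }
  return expanded.join(" ");
}

console.log(expandSearch("mouse"));            // mouse mice
console.log(expandSearch("season criterion")); // season criterion criteria
```

The expanded string would then be passed as the "search" value of the text command shown in the question, so that a query for "mouse" also matches documents containing "mice".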
