Finding the most commonly used word in a string field throughout a collection

Views: 73 Published: 2020/5/5 15:45:09 string mongodb mapreduce aggregation-framework mongodb-aggregation


Question

Let's say I have a Mongo collection similar to the following:

[
  { "foo": "bar baz boo" },
  { "foo": "bar baz" },
  { "foo": "boo baz" }
]

Is it possible to determine which words appear most often within the foo field (ideally with a count)?

For instance, I'd love a result set of something like:

[
  { "baz" : 3 },
  { "boo" : 2 },
  { "bar" : 2 }
]

Recommended answer

A JIRA issue requesting a $split operator for use in the $project stage of the aggregation framework was recently closed (the operator is available since MongoDB 3.4).
With that operator in place, you could create a pipeline like this:

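db.yourColl.aggregate([
    {
        // Split each foo string on single spaces into an array of words.
        $project: {
            words: { $split: ["$foo", " "] }
        }
    },
    {
        // Emit one document per word.
        $unwind: {
            path: "$words"
        }
    },
    {
        // Count how many times each distinct word occurs.
        $group: {
            _id: "$words",
            count: { $sum: 1 }
        }
    },
    {
        // Put the most frequent words first.
        $sort: { count: -1 }
    }
])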

The result looks like this:

/* 1 */
{
    "_id" : "baz",
    "count" : 3.0
}

/* 2 */
{
    "_id" : "boo",
    "count" : 2.0
}

/* 3 */
{
    "_id" : "bar",
    "count" : 2.0
}
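
To sanity-check the counts without a running MongoDB instance, the same split, unwind, and group steps can be mirrored in plain JavaScript. This is only a rough in-memory sketch: the helper name countWords is hypothetical, the sample documents are copied from the question, and the final descending sort is added here so the most frequent word comes first. The order of words with equal counts is not meaningful.

```javascript
// Sample documents copied from the question.
const docs = [
  { foo: "bar baz boo" },
  { foo: "bar baz" },
  { foo: "boo baz" },
];

// Hypothetical helper (not part of any MongoDB API): counts how often
// each space-separated word occurs in the given string field.
function countWords(documents, field) {
  const counts = new Map();
  for (const doc of documents) {
    // Rough equivalent of $split followed by $unwind: visit each word.
    for (const word of doc[field].split(" ")) {
      // Rough equivalent of $group with { $sum: 1 }.
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  // Shape the entries like the aggregation output and sort by count,
  // highest first.
  return [...counts.entries()]
    .map(([word, count]) => ({ _id: word, count }))
    .sort((a, b) => b.count - a.count);
}

const result = countWords(docs, "foo");
console.log(result);
```

Like the pipeline, this splits on a single space only, so repeated spaces or mixed letter case would produce empty or separate entries and would need extra normalization.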
