How to find similarity in a document field in MongoDB?

Question

Given data that looks like this:

{'_id': 'foobar1',
 'about': 'similarity in comparison',
 'categories': ['one', 'two', 'three']}
{'_id': 'foobar2',
 'about': 'perfect similarity in comparison',
 'categories': ['one']}
{'_id': 'foobar3',
 'about': 'partial similarity',
 'categories': ['one', 'two']}
{'_id': 'foobar4',
 'about': 'none',
 'categories': ['one', 'two']}

I would like to find a way to get the similarity between a single item and all other items in the collection, then return them in order of highest similarity. Similarity is based on the number of words in common; there is already a function int similar(String one, String two).

For example, if I want the similarity list for the about field of foobar1, it would return:

[{'_id': 'foobar2'}, {'_id': 'foobar3'}, {'_id': 'foobar4'}]
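
To make the expected ordering concrete, here is a minimal JavaScript sketch of a word-overlap ranking done on the client. The similar function below is only a hypothetical stand-in for the existing int similar(String one, String two), whose real behavior is not shown in the question, and rankBySimilarity is an illustrative helper, not a MongoDB or Morphia API.

```javascript
// Hypothetical stand-in for the question's similar(): counts distinct
// whitespace-separated words present in both strings (case-insensitive).
function similar(one, two) {
  const words = (s) => new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
  const a = words(one);
  const b = words(two);
  let common = 0;
  for (const w of a) {
    if (b.has(w)) common += 1;
  }
  return common;
}

// Illustrative helper: score every other document against the target's
// `about` field and return their _id values, highest similarity first.
function rankBySimilarity(targetId, docs) {
  const target = docs.find((d) => d._id === targetId);
  if (!target) throw new Error('unknown _id: ' + targetId);
  return docs
    .filter((d) => d._id !== targetId)
    .map((d) => ({ _id: d._id, score: similar(target.about, d.about) }))
    .sort((x, y) => y.score - x.score)
    .map((d) => ({ _id: d._id }));
}

// The sample documents from the question (categories omitted).
const docs = [
  { _id: 'foobar1', about: 'similarity in comparison' },
  { _id: 'foobar2', about: 'perfect similarity in comparison' },
  { _id: 'foobar3', about: 'partial similarity' },
  { _id: 'foobar4', about: 'none' },
];

console.log(rankBySimilarity('foobar1', docs));
```

With the four sample documents, the overlap counts against foobar1 are 3, 1 and 0, so the order is foobar2, foobar3, foobar4. This approach loads every document into the application, so it is only practical for small collections.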

I am doing this with Morphia, but given just the plain MongoDB implementation, I could figure the rest out.

Recommended answer

If you need to compute text similarity on the about field, one way to achieve this is to use a text index.

For example (in the mongo shell), if you create a text index on the about field:

db.collection.createIndex({about: 'text'})

you could execute a query such as (example taken from https://docs.mongodb.com/manual/reference/operator/query/text/#sort-by-text-search-score):

db.collection.find({$text: {$search: 'similarity in comparison'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})
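
The search string in that query is hard-coded. Since the goal is to rank other documents against a stored document's about field, one option is to build the query from that document and exclude it by _id. The helper below, buildSimilarityQuery, is a hypothetical name used for illustration; it only assembles plain objects, which are assumed to be passed to find() and sort() in the same way as the literal query.

```javascript
// Hypothetical helper: builds the $text query pieces from a source document.
// Assumes a text index on `about` already exists.
function buildSimilarityQuery(sourceDoc) {
  return {
    // Match documents sharing terms with the source text, excluding the source itself.
    filter: { $text: { $search: sourceDoc.about }, _id: { $ne: sourceDoc._id } },
    // Expose the relevance score computed by the text search.
    projection: { score: { $meta: 'textScore' } },
    // Highest score first.
    sort: { score: { $meta: 'textScore' } },
  };
}

const q = buildSimilarityQuery({ _id: 'foobar1', about: 'similarity in comparison' });
console.log(JSON.stringify(q, null, 2));

// In the mongo shell this would be used roughly as:
//   var src = db.collection.findOne({_id: 'foobar1'});
//   var q = buildSimilarityQuery(src);
//   db.collection.find(q.filter, q.projection).sort(q.sort);
```

Note that $search treats quoted phrases and words prefixed with a hyphen specially, so free text taken from a document may need sanitizing before it is used this way.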

With your example documents, the text search for 'similarity in comparison' should return something like:

{
  "_id": "foobar1",
  "about": "similarity in comparison",
  "score": 1.5
}
{
  "_id": "foobar2",
  "about": "perfect similarity in comparison",
  "score": 1.3333333333333333
}
{
  "_id": "foobar3",
  "about": "partial similarity",
  "score": 0.75
}

These results are sorted by decreasing similarity score. Note that, unlike your example result, document foobar4 is not returned, because none of the queried words are present in foobar4.
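
If the full list from the question is needed, including documents with no words in common, one possible approach is to append the unmatched documents after the text search results with a score of 0. The sketch below works on plain arrays; appendUnmatched is a hypothetical helper, and its inputs are assumed to come from the text query and from a separate query for all _id values in the collection.

```javascript
// Hypothetical helper: keeps the text search order, drops the source
// document, and appends every document the search did not return.
function appendUnmatched(scored, allIds, excludeId) {
  const seen = new Set(scored.map((d) => d._id));
  const rest = allIds
    .filter((id) => id !== excludeId && !seen.has(id))
    .map((id) => ({ _id: id, score: 0 })); // score 0 is an assumption for "no match"
  return scored.filter((d) => d._id !== excludeId).concat(rest);
}

// Scores copied from the sample output of the text search.
const scored = [
  { _id: 'foobar1', score: 1.5 },
  { _id: 'foobar2', score: 1.3333333333333333 },
  { _id: 'foobar3', score: 0.75 },
];
const allIds = ['foobar1', 'foobar2', 'foobar3', 'foobar4'];

console.log(appendUnmatched(scored, allIds, 'foobar1').map((d) => d._id));
```

For the sample data this gives foobar2, foobar3, foobar4, matching the list requested in the question.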

Text indexes are considered a special type of index in MongoDB, and thus come with some specific rules on their usage. For more details, please see the MongoDB documentation on text indexes.
