Mongoose - Search for text in three fields based on score or weightage


Question

I am using Mongoose on top of MongoDB. This is how my model looks:

var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var BookSchema = new Schema({
  name: String,
  viewCount: { type: Number, default: 0 },
  description: {
    type: String,
    default: 'No description'
  },
  body: {
    type: String,
    default: ''
  }
});

I need to search for some text across the name, description and body fields. So far this is what I am doing, and it works:

// Case-insensitive regex match on any of the three fields.
Book.find().or([
  { 'name': { $regex: term, $options: "i" } },
  { 'description': { $regex: term, $options: "i" } },
  { 'body': { $regex: term, $options: "i" } }
]).exec(
    function (err, books) {
      if (err) {
        return handleError(res, err);
      }
      return res.status(200).json(books);
    });
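Because `term` is passed straight to `$regex`, any regex metacharacters in user input (such as `.` or `*`) are interpreted as pattern syntax. A minimal sketch of one way to guard against that is shown below; `escapeRegex` and `buildRegexConditions` are hypothetical helper names, not part of Mongoose.

```javascript
// Escape characters that have special meaning in a regular expression,
// so that a user-supplied term is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive $regex condition per field,
// suitable for passing to Query#or().
function buildRegexConditions(term, fields) {
  const pattern = escapeRegex(term);
  return fields.map(function (field) {
    const condition = {};
    condition[field] = { $regex: pattern, $options: 'i' };
    return condition;
  });
}

// Usage with the Book model from the question (assumed to be in scope):
// Book.find().or(buildRegexConditions(term, ['name', 'description', 'body'])).exec(callback);
```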

Problem: I need some mechanism that assigns a weight/score to each of the fields (name, description, body), with name having the highest weight, description slightly less than name, and body the least. When the results come back, I want to sort them by that score/weight.

So far I have looked into this link and weights, but I am not sure what the best way to get the desired result is. I would also like to understand: do I need to create the weights every time before I search, or is it a one-time activity? And how do I implement weights with Mongoose?

Recommended answer

A "text index" with a text search is indeed likely the best option here, as long as you are searching for whole words.

Adding a text index to your schema definition is quite simple:

BookSchema.index(
    {
         "name": "text",
         "description": "text",
         "body": "text"
    },
    {
        "weights": {
            "name": 5,
            "description": 2
        }
    }
)

This allows you to perform simple searches with the "set" weighting applied to the fields:

Book.find({ "$text": { "$search": "Holiday School Year" } })
    .select({ "score": { "$meta": "textScore" } })
    .sort({ "score": { "$meta": "textScore" } })
    .exec(function(err,result) {

    }
);

Each matched term is considered against the field it was found in that gives the most weight, together with the number of occurrences.
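Note that a field in a text index with no explicit weight gets the default weight of 1, which is why `body` can be left out of `weights` in the index definition. If you prefer to keep every weight explicit in one place, the following sketch derives both arguments to `index()` from a single map; `buildTextIndexArgs` is a hypothetical helper, not a Mongoose function.

```javascript
// Hypothetical helper: turn a { field: weight } map into the two arguments
// (field spec and options) used when declaring a weighted text index.
function buildTextIndexArgs(weights) {
  const fields = {};
  Object.keys(weights).forEach(function (field) {
    fields[field] = 'text';
  });
  return { fields: fields, options: { weights: weights } };
}

const bookIndex = buildTextIndexArgs({ name: 5, description: 2, body: 1 });

// Usage, assuming BookSchema is the schema from the question:
// BookSchema.index(bookIndex.fields, bookIndex.options);
```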

The weights are attached to the "index", so the definition is done once, when the index is created; they cannot be changed afterwards without dropping and recreating the index. Another limitation is that "text search" does not look at "partial" words. For example, "ci" does not match "City" or "Citizen"; for such a thing you would need a regular expression instead.
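To illustrate the partial-word case, here is a small sketch of a regular expression that matches a fragment at the start of a word. The helper name `partialWordRegex` is an assumption made for this example.

```javascript
// Hypothetical helper: build a case-insensitive regex that matches
// `fragment` at the beginning of any word.
function partialWordRegex(fragment) {
  // Escape regex metacharacters so the fragment is matched literally.
  const escaped = fragment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('\\b' + escaped, 'i');
}

const ci = partialWordRegex('ci');
console.log(ci.test('City'));      // true
console.log(ci.test('Citizen'));   // true
console.log(ci.test('Decision'));  // false: "ci" is not at a word start

// Usage, assuming the Book model from the question:
// Book.find({ name: partialWordRegex('ci') }).exec(callback);
```

A query like this does not produce a text score, so any weighting has to be computed separately.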

If you need more flexibility than that, or generally must be able to change the weighting of results dynamically, then you need something like the aggregation framework or mapReduce.

The aggregation framework, however, cannot perform a "logical" match of a "regular expression" against your terms (it can filter with one through the $match operator, but it cannot evaluate one as a "logical" expression when computing a value; MongoDB 4.2 and later add a $regexMatch expression operator for this). You can still work with single words and "exact" matches, if this suits:

Book.aggregate(
    [
        { "$match": {
            "$or": [
                { "name": /Holiday/ },
                { "description": /Holiday/ },
                { "body": /Holiday/ }
            ]
        }},
        { "$project": {
            "name": 1,
            "description": 1,
            "body": 1,
            "score": {
                "$add": [
                    { "$cond": [{ "$eq": [ "$name", "Holiday" ] },5,0 ] },
                    { "$cond": [{ "$eq": [ "$description", "Holiday" ] },2,0 ] },
                    { "$cond": [{ "$eq": [ "$body", "Holiday" ] },1,0 ] }
                ]
            }
        }},
        { "$sort": { "score": -1 } }
    ],
    function(err,results) {

    }
)

Because an aggregation pipeline is just a data structure describing the query, you can change the weight parameters on each execution to whatever you presently need.
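As a sketch of that idea, the pipeline above can be generated by a function that takes the term and the weights as arguments. `buildWeightedPipeline` is a hypothetical helper written for this example; like the pipeline above, it only scores "exact" field matches.

```javascript
// Hypothetical helper: build the $match / $project / $sort pipeline
// for a given term and a { field: weight } map.
function buildWeightedPipeline(term, weights) {
  const fields = Object.keys(weights);
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // $match: keep documents where any field contains the term.
  const orConditions = fields.map(function (field) {
    const condition = {};
    condition[field] = new RegExp(escaped);
    return condition;
  });

  // $project: keep the fields and add the weight of each field
  // whose value equals the term exactly.
  const projection = {};
  fields.forEach(function (field) {
    projection[field] = 1;
  });
  projection.score = {
    $add: fields.map(function (field) {
      return { $cond: [{ $eq: ['$' + field, term] }, weights[field], 0] };
    })
  };

  return [
    { $match: { $or: orConditions } },
    { $project: projection },
    { $sort: { score: -1 } }
  ];
}

const pipeline = buildWeightedPipeline('Holiday', { name: 5, description: 2, body: 1 });

// Usage, assuming the Book model from the question:
// Book.aggregate(pipeline).exec().then(function (results) { /* ... */ });
```

Calling the function with a different weights map on the next request changes the ranking without touching any index.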

MapReduce shares a similar principle: you can include a calculated "score" as the leading element of the primary key that is emitted. MapReduce naturally sorts all emitted input by this key, as an optimization for feeding it to the reduce function. However, you cannot further sort or "limit" such a result.

Those are generally your options to look at, so you can decide which best suits your case.
