Percentage of OR conditions matched in mongodb


Problem description

I have my data in the following format:

{
  "_id" : ObjectId("534fd4662d22a05415000000"),
  "product_id" : "50862224",
  "ean" : "8808992479390",
  "brand" : "LG",
  "model" : "37LH3000",
  "features" : [
    {
      "key" : "Screen Format",
      "value" : "16:9"
    }, {
      "key" : "DVD Player / Recorder",
      "value" : "No"
    }, {
      "key" : "Weight in kg",
      "value" : "12.6"
    }
    ... so on
  ]
}

I need to compare the features of one product with those of others and divide the results into separate categories (100% match, 50-99% match) based on the percentage of features that match.

My initial thought was to prepare a dynamic query with an OR condition for each feature and do the percentage calculation in PHP, but that means MongoDB will return even those products that have only one matching feature. I think nearly all products in a category have some feature in common, so I fear I would end up processing a lot of products in PHP.

I basically have two questions:

  1. Are there any alternative approaches?
  2. Is the data structure I am using good enough to support the functionality I am looking for, or should I consider changing it?

Solution

Well, your solution really should be MongoDB-specific; otherwise you will end up doing the calculations and possibly the matching on the client side, and that is not going to be good for performance.

So of course what you really want is a way to do that processing on the server side:

db.products.aggregate([

    // Match the documents that meet your conditions
    { "$match": {
        "$or": [
            { 
                "features": { 
                    "$elemMatch": {
                       "key": "Screen Format",
                       "value": "16:9"
                    }
                }
            },
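            { 
                "features": { 
                    "$elemMatch": {
                       "key" : "Weight in kg",
                       // Values are stored as strings in the sample data, so this
                       // range is compared as strings, not as numbers
                       "value" : { "$gt": "5", "$lt": "8" }
                    }
                }
            },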
        ]
    }},

    // Keep the document and a copy of the features array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "product_id": "$product_id",
            "ean": "$ean",
            "brand": "$brand",
            "model": "$model",
            "features": "$features"
        },
        "features": 1
    }},

    // Unwind the array
    { "$unwind": "$features" },

    // Find the actual elements that match the conditions
    { "$match": {
        "$or": [
            { 
               "features.key": "Screen Format",
               "features.value": "16:9"
            },
            { 
               "features.key" : "Weight in kg",
               "features.value" : { "$gt": "5", "$lt": "8" }
            },
        ]
    }},

    // Count those matched elements
    { "$group": {
        "_id": "$_id",
        "count": { "$sum": 1 }
    }},

    // Restore the document and divide the matched elements by the
    // number of elements in the "or" condition
    { "$project": {
        "_id": "$_id._id",
        "product_id": "$_id.product_id",
        "ean": "$_id.ean",
        "brand": "$_id.brand",
        "model": "$_id.model",
        "features": "$_id.features",
        "matched": { "$divide": [ "$count", 2 ] }
    }},

    // Sort by the matched percentage
    { "$sort": { "matched": -1 } }

])

Since you know the "length" of the $or condition being applied, you simply need to find out how many of the elements in the "features" array match those conditions. That is what the second $match in the pipeline is for.
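As a small worked example of that counting step, the sketch below does in plain JavaScript what the $unwind, second $match and $group stages do for a single document. The two equality-only conditions and the function name `countMatchedFeatures` are made up for illustration; the document is the sample from the question, trimmed to the three features shown.

```javascript
// Sample document from the question (only the three features shown there).
const product = {
  product_id: "50862224",
  features: [
    { key: "Screen Format", value: "16:9" },
    { key: "DVD Player / Recorder", value: "No" },
    { key: "Weight in kg", value: "12.6" }
  ]
};

// Hypothetical equality-only conditions, one per entry of the $or.
const conditions = [
  { key: "Screen Format", value: "16:9" },
  { key: "DVD Player / Recorder", value: "Yes" }
];

// Plain-JavaScript equivalent of $unwind + second $match + $group:
// count the array elements that satisfy at least one condition.
function countMatchedFeatures(doc, conds) {
  return doc.features.filter(f =>
    conds.some(c => c.key === f.key && c.value === f.value)
  ).length;
}

const count = countMatchedFeatures(product, conditions);
const matched = count / conditions.length;
console.log(count, matched); // prints: 1 0.5
```

Only the "Screen Format" entry satisfies a condition, so one of two conditions is met and the document scores 0.5.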

Once you have that count, you simply divide it by the number of conditions that were passed in your $or. The beauty here is that you can now do something useful with the result, such as sorting by that relevance and even "paging" the results on the server side.

Of course, if you want some additional "categorization" of this, all you need to do is add another $project stage to the end of the pipeline:
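Because the divisor is just the number of conditions, the whole pipeline can be generated from a list of them, which is what a "dynamic query" per product would need. The following is a minimal sketch under that assumption; `buildMatchPipeline` and the `{ key, value }` condition shape are hypothetical names, not part of MongoDB.

```javascript
// Build the aggregation pipeline from a list of feature conditions.
// Each condition is { key, value }, where value is a literal or a
// query-operator object such as { $gt: "5", $lt: "8" }.
function buildMatchPipeline(conditions) {
  if (conditions.length === 0) {
    throw new Error("at least one condition is required for $or");
  }
  // Document-level filter: keep products matching any condition.
  const docLevel = conditions.map(c => ({
    features: { $elemMatch: { key: c.key, value: c.value } }
  }));
  // Element-level filter: applied after $unwind to each array entry.
  const elementLevel = conditions.map(c => ({
    "features.key": c.key,
    "features.value": c.value
  }));
  return [
    { $match: { $or: docLevel } },
    { $project: {
        _id: {
          _id: "$_id", product_id: "$product_id", ean: "$ean",
          brand: "$brand", model: "$model", features: "$features"
        },
        features: 1
    } },
    { $unwind: "$features" },
    { $match: { $or: elementLevel } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $project: {
        _id: "$_id._id",
        product_id: "$_id.product_id",
        ean: "$_id.ean",
        brand: "$_id.brand",
        model: "$_id.model",
        features: "$_id.features",
        // The divisor follows the number of conditions automatically.
        matched: { $divide: ["$count", conditions.length] }
    } },
    { $sort: { matched: -1 } }
  ];
}

const pipeline = buildMatchPipeline([
  { key: "Screen Format", value: "16:9" },
  { key: "Weight in kg", value: { $gt: "5", $lt: "8" } }
]);
// In the mongo shell: db.products.aggregate(pipeline)
```

For paging, $skip and $limit stages could be appended after the $sort.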

    { "$project": {
        "product_id": 1,
        "ean": 1,
        "brand": 1,
        "model": 1,
        "features": 1,
        "matched": 1,
        // Nested $cond: the first band whose test passes wins
        "category": { "$cond": [
            { "$eq": [ "$matched", 1 ] },
            "100",
            { "$cond": [ 
                { "$gte": [ "$matched", .7 ] },
                "70-99",
                { "$cond": [
                   { "$gte": [ "$matched", .4 ] },
                   "40-69",
                   "under 40"
                ]} 
            ]}
        ]}
    }}

Or something similar; the $cond operator is what helps you here.
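Nested $cond expressions get hard to read as bands are added, so one option is to generate them from an ordered list of thresholds. This is only a sketch: `buildCategoryExpr` and the band objects are hypothetical helpers, and it uses $gte for the top band instead of $eq, which gives the same result as long as "matched" never exceeds 1.

```javascript
// Turn an ordered list of bands (highest threshold first) into nested $cond
// expressions over the "matched" field. `fallback` is the label used when
// no band applies.
function buildCategoryExpr(bands, fallback) {
  // Fold from the lowest band upward so the highest band ends up outermost.
  return bands.reduceRight(
    (elseExpr, band) => ({
      $cond: [{ $gte: ["$matched", band.min] }, band.label, elseExpr]
    }),
    fallback
  );
}

const category = buildCategoryExpr(
  [
    { min: 1, label: "100" },
    { min: 0.7, label: "70-99" },
    { min: 0.4, label: "40-69" }
  ],
  "under 40"
);

// The expression can then be used as the "category" field of the
// final $project stage.
console.log(JSON.stringify(category, null, 2));
```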

The architecture should be fine as it is, since you can have a compound index on the "key" and "value" of the entries in your features array, and this should scale well for queries.

Of course, if you actually need something more than that, such as faceted searching and results, you can look at solutions like Solr or Elasticsearch. But a full implementation of that would be a bit lengthy for here.
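A compound index of that kind might be declared as in the sketch below. The `products` collection name comes from the pipeline above; the variable name `featureIndexSpec` and the ascending key order are assumptions, and `createIndex` is the current shell method (older shells used `ensureIndex`).

```javascript
// Compound multikey index over the key and value of each features entry.
const featureIndexSpec = { "features.key": 1, "features.value": 1 };

// `db` only exists inside the mongo shell; the guard lets this snippet
// load in other JavaScript environments without error.
if (typeof db !== "undefined") {
  db.products.createIndex(featureIndexSpec);
}
```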