Sort on a dynamically added field


Question

I have 20 million documents in my database, each shaped like this:

 {
    "_id": ObjectId("5bb84e931cb3d25a3b21d14e"),
    "merchant": "menswearhouse.com",
    "category": "Fashion > Clothing > Men's Clothing",
    "feature": [
      "-0.899652959529",
      "-0.02401520125567913",
      "0.08394625037908554",
      "0.06319021433591843",
      "-0.015963224694132805"
    ]
  }

I now have the array below, which I need to use to find documents.

const dummy = [
  "-0.899652959529",
  "-0.02401520125567913",
  "0.08394625037908554",
  "0.06319021433591843",
  "-0.015963224694132805"
];

I need to:

  1. Find the difference of all the values, i.e. subtract the first element of my dummy array from the first element of feature, and so on for all 5 values.
  2. Square each difference.
  3. Add up all 5 squared values.
  4. Take the square root of the sum.
  5. Sort the documents by that computed field and return only 5 documents.
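
Steps 1 to 4 amount to the Euclidean distance between the two arrays. As a rough illustration only, here is a plain JavaScript sketch of the five steps run in memory on a handful of documents. The helper names `euclideanDistance` and `closest` are hypothetical, and ascending order (smallest distance first) is an assumption about what "top 5" means here.

```javascript
// Hypothetical helper: steps 1-4 for two equally long arrays of numeric strings.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("arrays must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = Number(a[i]) - Number(b[i]); // step 1: element-wise difference
    sum += diff * diff;                       // steps 2 and 3: square and add
  }
  return Math.sqrt(sum);                      // step 4: square root
}

// Hypothetical helper: step 5, done in memory.
// Assumption: "top" means smallest distance, so the sort is ascending.
function closest(docs, query, k = 5) {
  return docs
    .map((doc) => ({ _id: doc._id, field: euclideanDistance(query, doc.feature) }))
    .sort((x, y) => x.field - y.field)
    .slice(0, k);
}
```

This is only practical for small inputs; for 20 million documents the computation has to happen in the database.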

I am using the query below, which $projects the field and works when I add a $limit. However, I need to $sort by the $projected field and take the top 5 documents. With 20 million documents, that query never returns anything; it just runs forever.

db.collection.aggregate([
  { $project: {
    field: {
      $sqrt: {
        $sum: {
          $map: {
            input: { $range: [0, { $size: '$feature' }] },
            as: "d",
            in: {
              $pow: [
                {
                  $subtract: [
                    { $toDouble: { $arrayElemAt: [dummy, "$$d"] }},
                    { $toDouble: { $arrayElemAt: ["$feature", "$$d"] }}
                  ]
                },
                2
              ]
            }
          }
        }
      }
    }
  }}
])

Can I use an index on a field that is created at runtime?

Thanks!

Recommended answer

The short answer is no: you cannot create an index on a field that is computed at runtime. At the time of writing, MongoDB cannot do what you want directly. You can, however, compute the values in parallel. Assuming your server has adequate resources (CPU and memory), your application can divide the work into tasks and run them in parallel. To keep the math simple, suppose you have 20,000,000 documents and divide them into 20 tasks. Each task processes 1,000,000 documents and returns its top 5 results. The pipeline for the first task would be:

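[
    {
        // Sort by _id so that $skip/$limit select a stable slice of the collection
        '$sort': {
            '_id': 1
        }
    }, {
        '$skip': 0
    }, {
        '$limit': 1000000
    }, {
        '$project': {
            'field': {
                '$sqrt': {
                    <do your thing>
                }
            }
        }
    }, {
        // Order by the computed field so the final $limit keeps this task's top 5
        // (ascending, assuming "top" means smallest distance)
        '$sort': {
            'field': 1
        }
    }, {
        '$limit': 5
    }
]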
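
The other tasks differ only in their $skip value. A minimal sketch of generating all the pipelines, assuming equally sized slices: `buildTaskPipelines` and its parameters are hypothetical names, `distanceExpr` stands for whatever expression computes the field, and the `$sort` on `field` before the final `$limit` is an assumption that "top 5" means the 5 smallest values.

```javascript
// Hypothetical helper: builds one aggregation pipeline per task.
// Each pipeline covers a contiguous slice of the collection ordered by _id.
function buildTaskPipelines(totalDocs, taskCount, distanceExpr, topK = 5) {
  const chunkSize = Math.ceil(totalDocs / taskCount);
  const pipelines = [];
  for (let task = 0; task < taskCount; task++) {
    pipelines.push([
      { $sort: { _id: 1 } },                 // stable order for slicing
      { $skip: task * chunkSize },           // start of this task's slice
      { $limit: chunkSize },                 // size of this task's slice
      { $project: { field: distanceExpr } }, // computed field
      { $sort: { field: 1 } },               // assumption: smallest values first
      { $limit: topK },                      // keep this task's top results
    ]);
  }
  return pipelines;
}
```

With `buildTaskPipelines(20000000, 20, expr)`, the first pipeline skips 0 documents and the last skips 19,000,000, each limited to 1,000,000 documents.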

After all threads (tasks) have returned, merge the results (only 100 documents) in your application, sort them by field, and take the top 5. Note that you need to take your hardware resources into account to find the optimal number of tasks.
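
The merge step can be sketched in plain JavaScript as follows. `mergeTopK` is a hypothetical name; it assumes each task returns an array of documents carrying a numeric `field`, and that smaller values rank higher.

```javascript
// Hypothetical helper: combines the per-task results and keeps the overall top k.
// taskResults is an array of arrays, one inner array per task.
function mergeTopK(taskResults, k = 5) {
  return taskResults
    .flat()                            // 20 tasks x 5 docs = 100 docs in the example
    .sort((a, b) => a.field - b.field) // assumption: ascending, smallest first
    .slice(0, k);
}
```

Because each task already returns its own 5 best documents, the overall 5 best are guaranteed to be among the merged candidates.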
