Mongo Custom Sort Strategy using mongoose

Question

First of all, I'm using MongoDB 2.6 and Mongoose 3.8.8.

I have the following schema:

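var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var LinkSchema = new Schema({

  title: { type: String, trim: true },
  owner: { id: { type: Schema.ObjectId }, name: { type: String } },
  url:   { type: String, default: '', trim: true},
  stars: { users: [ { name: { type: String }, _id: {type: Schema.ObjectId} }] },
  createdAt: { type: Date, default: Date.now }

});

// The queries below run against the compiled model, not the schema itself
var Link = mongoose.model('Link', LinkSchema);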

My collection already has 500k documents.

I need to sort the documents using a custom strategy. My initial solution was to use the aggregation framework.

 var today = new Date();
 //fx = (TodayDay * TodayYear) - ( DocumentCreatedDay * DocumentCreatedYear)
 var relevance = { $subtract: [
    { $multiply: [ { $dayOfYear: today },  { $year: today } ]  },
    { $multiply: [ { $dayOfYear: '$createdAt' }, { $year: '$createdAt' } ]  }
   ]}


 var projection = {
    _id: 1,
    url: 1,
    title: 1,
    createdAt: 1,
    thumbnail: 1,
    stars: { $size: '$stars.users'},
    ranking: { $multiply: [ relevance, { $size: '$stars.users' } ] }
  }

var sort = {
    $sort: { ranking: 1, stars: 1 }
  }

var page = 1;
var limit = { $limit: 40 }
var skip = { $skip: ( 40 * (page - 1) ) }
var project = { $project: projection }

// $skip must come before $limit, otherwise every page after the first is empty
Link.aggregate([project, sort, skip, limit]).exec(resultCallback);

It works nicely up to about 100k documents; after that the query gets slower and slower. How can I accomplish this?
Do I need to redesign?
Am I using projection incorrectly?

Thanks for your time!

Recommended answer

You can do all of this at update time: compute the ranking whenever a document changes, store it, index it, and then use range queries to implement your paging. That is much better than $skip and $limit, which in any form is bad news for large data because the server still has to walk through every skipped document. You should be able to find many sources confirming that skip and limit is a poor practice for paging.
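As a minimal sketch of what range-based paging could look like, the helper below builds the filter for the page that follows the last document already shown. It assumes the ranking is stored in a field named `score`, that results are sorted by `score` descending with `_id` as a tie-breaker, and that a matching compound index exists; the name `nextPageFilter` and the sort direction are illustrative assumptions, not something the answer prescribes.

```javascript
// Build the filter for the page after the last document already shown.
// Assumption: results are sorted by { score: -1, _id: -1 } and an index
// on { score: -1, _id: -1 } exists.
function nextPageFilter(last) {
  if (!last) return {};            // first page: no bound needed
  return {
    $or: [
      { score: { $lt: last.score } },                 // strictly lower score
      { score: last.score, _id: { $lt: last._id } }   // same score, later in the tie-break order
    ]
  };
}

// Hypothetical usage with the Link model from the question:
// Link.find(nextPageFilter(lastSeen))
//     .sort({ score: -1, _id: -1 })
//     .limit(40)
//     .exec(resultCallback);
```

Because the filter starts from a known position in the index, the cost of fetching a page does not grow with the page number the way it does with $skip.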

The only catch is that, in MongoDB 2.6, an .update() type of statement cannot refer to the existing value of another field, so you have to be careful with concurrency issues on updates. This requires "rolling in" some custom lock handling, which you can do with the .findOneAndUpdate() method:

// Note: "locked" and "score" need to be declared on the schema,
// otherwise Mongoose's strict mode will not persist them.
Link.findOneAndUpdate(
    { "_id": docId, "locked": false },
    { "$set": { "locked": true } },
    { "new": true },   // always return the modified document
    function(err,doc) {

        if ( err ) {
            // report the error to the caller
            return;
        }

        // doc is null when another operation already holds the lock
        if ( doc && doc.locked === true ) {
            // then update your document

            // I would just use the epoch date difference per day
            var day = 1000 * 60 * 60 * 24;
            var now = Date.now();
            var created = doc.createdAt.valueOf();
            var relevance = ( now - ( now % day ) ) - ( created - ( created % day ) );

            var update = { "$set": { "locked": false } };

            if ( actionAdd ) {
              update["$push"] = { "stars.users": star };
              update["$set"]["score"] = relevance * ( doc.stars.users.length + 1 );
            } else {
              update["$pull"] = { "stars.users": star };
              update["$set"]["score"] = relevance * ( doc.stars.users.length - 1 );
            }

            // Then update, which also releases the lock
            Link.findOneAndUpdate(
                { "_id": doc._id, "locked": true },
                update,
                { "new": true },
                function(err,newDoc) {

                   // possibly check that new "locked" is false, but really
                   // that should be okay
                }
            );

        } else {
          // some mechanism to retry "n" times at interval
          // or report that you cannot update
        }

    }

)

The idea is that you can only grab a document whose "locked" status is false in order to update it, and the first "update" operation just sets that value to true so that no other operation can update the document until this one completes.

As per the code comments, you probably want to retry a few times rather than fail the update immediately, as there could be another operation adding to or removing from the array at the same moment.

Then, depending on the "mode" of your current update, that is, whether you are adding an item to the array or taking one off, you simply alter the update statement to perform that operation and set the appropriate "score" value in your document.
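The mode-dependent part can be isolated into a pure function, which makes it easier to test without a database. The following is a sketch only: `buildStarUpdate`, `startOfDay`, and the `nowMs` parameter are names introduced here for illustration, and the score formula is the one from the answer's code (day-truncated age in milliseconds multiplied by the resulting star count).

```javascript
var DAY_MS = 1000 * 60 * 60 * 24;

// Truncate an epoch-millisecond timestamp to the start of its UTC day.
function startOfDay(ms) {
  return ms - (ms % DAY_MS);
}

// Build the second update: push or pull the star, store the new score,
// and release the lock in the same write.
function buildStarUpdate(doc, star, actionAdd, nowMs) {
  var relevance = startOfDay(nowMs) - startOfDay(doc.createdAt.valueOf());
  var delta = actionAdd ? 1 : -1;

  var update = { $set: { locked: false } };
  update[actionAdd ? "$push" : "$pull"] = { "stars.users": star };
  update.$set.score = relevance * (doc.stars.users.length + delta);
  return update;
}

// Hypothetical usage inside the first findOneAndUpdate callback:
// var update = buildStarUpdate(doc, star, actionAdd, Date.now());
```

Passing the current time in as an argument, rather than calling Date.now() inside, is a design choice made here so that the function stays deterministic.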

The update will then of course set the "locked" status back to false. It makes sense to check that the returned status is no longer true, though it really should be fine at this point, and the check gives you a place to raise an exception if something went wrong.

That manages the general update situation, but you still have a problem with sorting out your "ranking" order, as skip and limit are still not what you want for performance. That is probably best handled by a periodic update of yet another field which you can use for a definitive "range" query. You probably only need to be concerned with the most "relevant" score range over a set number of pages, rather than updating the whole collection.

The update needs to be periodic because you will have concurrency problems if you try to change the "ranking" order of multiple documents from within individual updates. So you need to make sure this process does not overlap with another such update.
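One possible reading of the periodic step, sketched under stated assumptions: a scheduled job reads the top documents already sorted by score and writes each one's position into a separate field, here called `rank`, which pages can then query by range. The field name `rank` and the helpers `rankUpdates` and `pageFilter` are hypothetical; the answer does not specify how the extra field is laid out.

```javascript
// Turn a list of documents already sorted by score into one update per
// document that stores its 1-based position in a `rank` field.
function rankUpdates(sortedDocs) {
  return sortedDocs.map(function (doc, i) {
    return { filter: { _id: doc._id }, update: { $set: { rank: i + 1 } } };
  });
}

// Range filter selecting the documents that belong to a given page.
function pageFilter(page, size) {
  return { rank: { $gt: (page - 1) * size, $lte: page * size } };
}

// Hypothetical usage:
// Link.find(pageFilter(2, 40)).sort({ rank: 1 }).exec(resultCallback);
```

The job that applies these updates would still need its own guard so that two runs never overlap, as the paragraph above points out.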

As a final note, reconsider your "score" calculation, since what you really want is the newest and "most starred" content at the top. The current calculation has some flaws: for example, it evaluates to zero for any document created on the same day and for any document with 0 "stars". I'll leave that to you to work out.
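The two flaws can be seen by evaluating the formula directly. In the sketch below, `answerScore` mirrors the calculation from the code above, expressed in whole days for readability. `altScore` is purely a hypothetical alternative added for comparison, not the author's recommendation: it adds one to both terms so that neither can zero out the result, and it divides by age so that newer documents rank higher.

```javascript
// The calculation used above: age in days multiplied by star count.
function answerScore(ageDays, stars) {
  return ageDays * stars;
}

// Hypothetical alternative: more stars raise the score, more age lowers it,
// and neither term can collapse the result to zero.
function altScore(ageDays, stars) {
  return (stars + 1) / (ageDays + 1);
}

// A brand-new document with 50 stars and a 10-day-old document with no
// stars both score zero under the original formula, so they cannot be
// told apart.
var sameDay = answerScore(0, 50);
var noStars = answerScore(10, 0);
```

With `altScore`, a higher value means "newer and more starred", so results would be sorted in descending order.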

This is essentially what you need to do for your solution. Trying to compute this dynamically on a large collection with the aggregation framework is not going to give your application favorable performance. So these are a few pointers to things you can do to maintain the order of your results more efficiently.
