Using stored JavaScript functions in the Aggregation pipeline, MapReduce or runCommand

Question

Is there a way to use a user-defined function saved with db.system.js.save(...) in the aggregation pipeline or in mapReduce?

Recommended answer

Any function you save to system.js is available to "JavaScript" processing statements such as the $where operator and mapReduce, and can be referenced by the _id value it was assigned.

db.system.js.save({ 
   "_id": "squareThis", 
   "value": function(a) { return a*a } 
})

And given some data inserted into the "sample" collection:

{ "_id" : ObjectId("55aafd2bacbed38e06f9eccf"), "a" : 1 }
{ "_id" : ObjectId("55aafea6acbed38e06f9ecd0"), "a" : 2 }
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }

Then:

db.sample.mapReduce(
    function() {
       emit(null, squareThis(this.a));
    },
    function(key,values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
 );

Gives:

   "results" : [
            {
                    "_id" : null,
                    "value" : 14
            }
    ],
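
To make the arithmetic behind that result explicit, here is a minimal plain JavaScript sketch that runs outside MongoDB. The names docs, emitted and sum are illustrative assumptions, not MongoDB APIs; sum stands in for the shell helper Array.sum, and squareThis is redefined locally because nothing is loaded from system.js here.

```javascript
// Stand-in for the stored function saved in system.js.
function squareThis(a) { return a * a; }

// Stand-in for the three documents in the "sample" collection.
const docs = [{ a: 1 }, { a: 2 }, { a: 3 }];

// "Map" phase: every document emits its squared value under the key null.
const emitted = docs.map(function (doc) { return squareThis(doc.a); });

// "Reduce" phase: sum all values emitted for the single key.
function sum(values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

const result = { _id: null, value: sum(emitted) };
console.log(result); // { _id: null, value: 14 }
```

The squares 1, 4 and 9 add up to 14, which matches the "value" field returned by the mapReduce call.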

Or with $where:

db.sample.find(function() { return squareThis(this.a) == 9 })
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }

But in neither case can you use globals such as the database db reference or other functions. Both the $where and mapReduce documentation describe the limits of what you can do here. So if you thought you were going to do something like "look up data in another collection", you can forget it, because it is not allowed.

Every MongoDB command action is actually a call to a "runCommand" action under the hood anyway. But unless that command actually invokes the JavaScript processing engine, stored functions are irrelevant to it. Only a few commands do this: mapReduce, group and eval, and of course find operations with $where.

The aggregation framework does not use JavaScript in any way at all. You might be mistaking, as others have done, a statement like the following for JavaScript running in the pipeline, which is not what it does:

db.sample.aggregate([
    { "$match": {
        "a": { "$in": db.sample.distinct("a") }
    }}
])

So that is not running "inside" the aggregation pipeline. Rather, the result of that .distinct() call is evaluated on the client before the pipeline is sent to the server, much as an external variable would be:

var items = [1,2,3];
db.sample.aggregate([
    { "$match": {
        "a": { "$in": items }
    }}
])

Both are essentially sent to the server in the same form:

db.sample.aggregate([
    { "$match": {
        "a": { "$in": [1,2,3] }
    }}
])
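
The point can be checked without a server. The following plain JavaScript sketch builds the three pipelines as ordinary objects and compares their serialized forms. fakeDistinct is a hypothetical stand-in for db.sample.distinct("a"), and JSON.stringify is used here only as an approximation of what the client would put on the wire.

```javascript
// Hypothetical stand-in for db.sample.distinct("a"); it runs on the client.
function fakeDistinct() { return [1, 2, 3]; }

// Pipeline written with a function call inside it.
const withCall = [{ $match: { a: { $in: fakeDistinct() } } }];

// Pipeline written with a plain variable.
const items = [1, 2, 3];
const withVariable = [{ $match: { a: { $in: items } } }];

// Pipeline written with a literal array.
const withLiteral = [{ $match: { a: { $in: [1, 2, 3] } } }];

// Once serialized, nothing of the call or the variable remains.
console.log(JSON.stringify(withCall));
console.log(JSON.stringify(withCall) === JSON.stringify(withVariable));    // true
console.log(JSON.stringify(withVariable) === JSON.stringify(withLiteral)); // true
```

The function call has already been replaced by its return value by the time the pipeline object exists, so the server never sees any code.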

So it is not possible to "call" any JavaScript function in the aggregation pipeline, nor is there really any point in "passing in" results from something saved in system.js. The code would need to be loaded to the client, and only a JavaScript engine can actually do anything with it.

With the aggregation framework, all of the available operators are natively coded functions, as opposed to the free-form JavaScript interpretation provided for mapReduce. So instead of writing JavaScript, you use the operators themselves:

db.sample.aggregate([
    { "$group": {
        "_id": null,
        "squared": { "$sum": {
           "$multiply": [ "$a", "$a" ]
        }}
    }}
])

{ "_id" : null, "squared" : 14 }
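
One way to see why this needs no JavaScript on the server is to treat the operator expression as plain data that some native code interprets. The sketch below is a toy evaluator written in JavaScript purely for illustration. It is an assumption-laden model, not how MongoDB implements its operators, and evaluate, sampleDocs and expression are hypothetical names. It handles only "$field" references and $multiply.

```javascript
// Toy evaluator: resolves "$field" paths and a single $multiply operator.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];            // field reference such as "$a"
  }
  if (expr !== null && typeof expr === "object" && "$multiply" in expr) {
    return expr.$multiply
      .map(function (e) { return evaluate(e, doc); })
      .reduce(function (acc, v) { return acc * v; }, 1);
  }
  return expr;                            // literal value
}

const sampleDocs = [{ a: 1 }, { a: 2 }, { a: 3 }];
const expression = { $multiply: ["$a", "$a"] };

// Equivalent of a $group with _id: null and a $sum accumulator.
const squared = sampleDocs
  .map(function (doc) { return evaluate(expression, doc); })
  .reduce(function (acc, v) { return acc + v; }, 0);

console.log({ _id: null, squared: squared }); // { _id: null, squared: 14 }
```

The expression is just a nested object of operator names and arguments, so there is nowhere in it for a stored function to be called.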

So there are limitations on what you can do with functions saved in system.js, and the chances are that what you want to do is either:

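  • Not allowed, such as accessing data from another collection
  • Not really required, since the logic is generally self-contained anyway
  • Or probably better implemented in client logic or some other form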

Just about the only practical use I can think of is that you have a number of mapReduce operations that cannot be done any other way, and you have various "shared" functions that you would rather store on the server than maintain within every mapReduce function call.

But then again, about 90% of the time the reason for choosing mapReduce over the aggregation framework is that the document structure of the collections has been poorly chosen, so JavaScript is "required" to traverse the documents for search and analysis.

So you can use stored functions within the allowed constraints, but in most cases you probably should not be using them at all. Instead, fix the other issues that led you to believe you needed this feature in the first place.
