Chained map/reduce in CouchDB
Question
In CouchDB, I have a set of items like the following (simplified for example's sake):
{_id: 1, date: "Jul 1", user: "user1"}
{_id: 2, date: "Jul 2", user: "user1"}
{_id: 3, date: "Jul 3", user: "user2"}
...etc...
I'd like to get a list of "most recent activity", sorted by date, with no duplicate user _ids. I can create a view with results like so:
{key: "July 3", _id: 3, user: "user2"}
{key: "July 2", _id: 2, user: "user1"}
{key: "July 1", _id: 1, user: "user1"}
but this contains duplicate entries for the same user. Or I can create a view that maps {key: user, value: date} and reduces to
{key: "user1", mostRecentDate: "July 2"}
{key: "user2", mostRecentDate: "July 3"}
but that is not sorted by most recent date.
I know that the obvious solution, reducing over the results of another view, isn't supported. BigCouch supports chained map/reduce, but appears to be rather out of date and unsupported (last release 2012).
This seems like a rather common problem - what are some existing solutions (beyond "switch databases")?
Recommended answer
Here is a general idea of how you can do chained map/reduce with CouchDB 1.x. What we want is the ability to pass the results of one map/reduce to another.
Subscribe to the _changes feed filtered by the view. This will give you a list of docs that will actually be emitted by the map function.
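As a rough sketch of this step in Python: the database name `activity`, the design document `recent` and the view `by_user` are all hypothetical, and no HTTP request is made. The code only builds the request path for a view-filtered `_changes` feed and pulls the affected users out of a hand-written sample response. Requesting `include_docs=true` is an assumption made here so that the `user` field (the view key) is available.

```python
from urllib.parse import urlencode


def changes_path(db, ddoc, view, since=0):
    """Build the path for a _changes request filtered by a view's map function."""
    query = urlencode({
        "filter": "_view",         # built-in filter: keep docs the map function emits for
        "view": f"{ddoc}/{view}",  # "designdoc/viewname"
        "include_docs": "true",    # assumption: we read doc.user to learn the view key
        "since": since,            # resume after the last sequence already processed
    })
    return f"/{db}/_changes?{query}"


def changed_users(changes):
    """Collect the distinct `user` values of the changed docs, in first-seen order."""
    users = []
    for row in changes["results"]:
        doc = row.get("doc") or {}
        user = doc.get("user")
        if user is not None and user not in users:
            users.append(user)
    return users


# Hand-written sample in the shape of a _changes response (not real output).
sample_changes = {
    "results": [
        {"seq": 1, "id": "1", "doc": {"_id": "1", "date": "Jul 1", "user": "user1"}},
        {"seq": 2, "id": "2", "doc": {"_id": "2", "date": "Jul 2", "user": "user1"}},
        {"seq": 3, "id": "3", "doc": {"_id": "3", "date": "Jul 3", "user": "user2"}},
    ],
    "last_seq": 3,
}
```

Storing `last_seq` and passing it back as `since` on the next poll keeps each run limited to the documents that changed since the previous one.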
Next we need to call the view function for these filtered docs. It's simple since we can pass a list of keys to the view so we simply pass the keys and get the desired result subset of the view.
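A minimal sketch of that multi-key query, again with hypothetical names (`activity`, `recent`, `by_user`) and without performing the HTTP call. It assumes the view has a reduce function, in which case a `keys` query has to be combined with `group=true`.

```python
import json


def view_request(db, ddoc, view, keys):
    """Return (path, body) for a POST that fetches only the given keys from a reduce view."""
    # group=true yields one reduced row per key instead of a single overall value
    path = f"/{db}/_design/{ddoc}/_view/{view}?group=true"
    body = json.dumps({"keys": keys})
    return path, body


def rows_by_key(response):
    """Index the rows of a view response by their key."""
    return {row["key"]: row["value"] for row in response["rows"]}


# Hand-written sample in the shape of a grouped view response (not real output).
sample_view_response = {
    "rows": [
        {"key": "user1", "value": "July 2"},
        {"key": "user2", "value": "July 3"},
    ]
}
```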
Next we push this result either into a separate database or into the same one. We can use bulk inserts to perform the inserts faster. If you use a separate database you can even reuse the _ids from the view results, so the bulk updates become a lot easier.
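One way to shape the bulk insert, sketched in Python. Using the view key (the user) as the `_id` of the summary document is an assumption made for illustration, as are the function name and the `known_revs` lookup; the resulting body would be sent with a POST to the target database's `_bulk_docs` endpoint.

```python
def bulk_payload(rows, known_revs=None):
    """Turn reduced view rows into a _bulk_docs body, one summary doc per user."""
    known_revs = known_revs or {}
    docs = []
    for row in rows:
        # Assumption: the view key (user) doubles as the summary doc's _id.
        doc = {"_id": row["key"], "mostRecentDate": row["value"]}
        rev = known_revs.get(row["key"])
        if rev is not None:
            doc["_rev"] = rev  # needed when the summary doc already exists
        docs.append(doc)
    return {"docs": docs}
```

With one document per user, a later run simply overwrites that user's summary instead of adding a second entry, which is what removes the duplicates.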
Within this database we define another view that sorts our results based on value.
{key: "user1", mostRecentDate: "July 2"}
{key: "user2", mostRecentDate: "July 3"}
Since you have already gotten to this step, all you need to do is create a view on mostRecentDate in the second database, and you will get user activity sorted by date.
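A sketch of what that second view might look like, written as a Python dict holding the design document, plus a small local simulation of the resulting order. The design document name `summary` and view name `by_date` are hypothetical. The simulation assumes dates are stored in a sortable form such as ISO 8601 (`2014-07-02`); strings like "July 2" would not sort chronologically.

```python
# Hypothetical design document for the second database.
by_date_ddoc = {
    "_id": "_design/summary",
    "views": {
        "by_date": {
            # The map function is JavaScript, stored as a string in the design doc.
            "map": "function (doc) { if (doc.mostRecentDate) { emit(doc.mostRecentDate, doc._id); } }"
        }
    },
}


def simulate_by_date(docs, descending=True):
    """Mimic the view locally: emit (mostRecentDate, _id) and sort by key."""
    rows = [(d["mostRecentDate"], d["_id"]) for d in docs if d.get("mostRecentDate")]
    return sorted(rows, reverse=descending)
```

Querying the real view with `descending=true` would return the most recently active users first.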
I hope you are using a dummy reduce, though: one that returns null and is only used for group=true.
Using a list function in step 4 can make your life easier. As bulk updates require the list of docs to be in the form {"docs":[....]}, you can easily get it in one go with a list function.
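A sketch of such a list function, stored as a JavaScript string inside a Python dict that represents part of a design document. The list name `as_bulk` and the field name `mostRecentDate` are assumptions; `start`, `getRow` and `send` are the functions CouchDB provides inside list functions. The helper only builds the request path and does not call the server.

```python
# Hypothetical fragment of the design document that owns the first view.
list_ddoc_fragment = {
    "lists": {
        "as_bulk": (
            "function (head, req) {"
            "  start({headers: {'Content-Type': 'application/json'}});"
            "  var row, docs = [];"
            "  while ((row = getRow())) {"
            "    docs.push({_id: row.key, mostRecentDate: row.value});"
            "  }"
            "  send(JSON.stringify({docs: docs}));"
            "}"
        )
    }
}


def list_path(db, ddoc, list_name, view):
    """Build the path that runs a view's grouped rows through a list function."""
    return f"/{db}/_design/{ddoc}/_list/{list_name}/{view}?group=true"
```

The response body is then already in the shape that `_bulk_docs` expects, so it can be forwarded to the second database without reshaping on the client.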