CouchDB:根据时间戳返回类型的最新文档 [英] CouchDB: Return Newest Documents of Type Based on Timestamp

查看:107
本文介绍了CouchDB:根据时间戳返回类型的最新文档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个系统可以接受来自各种唯一来源的状态更新,并且每个状态更新都会按以下结构创建一个新文档:

I have a system that accepts status updates from a variety of unique sources, and each status update creates a new document in the following structure:

{
 "type": "status_update",
 "source_id": "truck1231",
 "timestamp": 13023123123,
 "location": "Boise, ID"
}

纯粹是数据示例,但可以理解这一点.

Data purely example, but gets the idea across.

现在,这些文档每隔一个小时左右生成一次.一个小时后,我们可能会插入:

Now, these documents are generated at interval, once an hour or so. An hour later, we might the insert:

{
 "type": "status_update",
 "source_id": "truck1231",
 "timestamp": 13023126723,
 "location": "Madison, WI"
}

我唯一想做的就是查看每个唯一来源的最新更新.我目前正在通过拍摄以下地图来做到这一点:

All I'm interested in doing is seeing the latest update from each unique source. I'm currently doing this by taking a map of:

function(doc) {
  if (doc.type == "status_update") {
    emit(doc.source_id, doc);
  }
}

并减少:

function(keys, values, rereduce) {
  var winner = values[0];
  var i = values.length;
  while (i--) {
    var val = values[i];
    if (val.timestamp > winner.timestamp) winner = val;
  }
  return winner;
}

并使用group=true查询数据以归约.这可以按预期工作,并且仅提供最新更新的关键结果.

And querying the data as a reduce with group=true. This works as expected and provides a keyed result of only the latest updates.

问题是它的运行速度非常慢,需要我在CouchDB配置中reduce_limit=false.

The problem is it's terribly slow and requires me to reduce_limit=false in the CouchDB config.

感觉必须有一种更有效的方法来执行此操作.更新同一文档不是一种选择-历史记录很重要,即使在这种情况下我也不需要.处理数据客户端也不是一种选择,因为这是CouchApp,并且系统中的文档数量实际上很大,并且无法通过有线方式发送它们.

It feels like there has to be a more efficient way of doing this. Updating the same document is not an option-- the history is important even though I don't require it in this instance. Processing the data client-side isn't an option either as this is a CouchApp and the amount of documents in the system is actually quite large and not practical to send them all over the wire.

谢谢.

推荐答案

您可以使用 _stats内置的reduce函数,然后执行另一个查询以获取文档.这是视图:

You can get the latest timestamp for every source using the _stats built-in reduce function, then do another query to get the documents. Here's the views:

"views": {
  "latest_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
    "reduce": "_stats"
  },
  "status_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
  }
}

使用group=true首先查询latest_update,然后使用类似(正确的网址编码)的查询status_update:

First query latest_update with group=true, then status_update with something like (properly url-encoded):

keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true

其中TS123和TS234是latest_update返回的max的值.

where TS123, and TS234 are the values of max returned by latest_update.

这篇关于CouchDB:根据时间戳返回类型的最新文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆