CouchDB:根据时间戳返回最新类型的文档 [英] CouchDB: Return Newest Documents of Type Based on Timestamp

查看:32
本文介绍了CouchDB:根据时间戳返回最新类型的文档的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的系统接受来自各种不同来源的状态更新,每个状态更新都会创建一个具有以下结构的新文档:

I have a system that accepts status updates from a variety of unique sources, and each status update creates a new document in the following structure:

{
 "type": "status_update",
 "source_id": "truck1231",
 "timestamp": 13023123123,
 "location": "Boise, ID"
}

数据纯粹是示例,但可以理解.

Data purely example, but gets the idea across.

现在,这些文档每隔一小时左右生成一次.一小时后,我们可能会插入:

Now, these documents are generated at interval, once an hour or so. An hour later, we might the insert:

{
 "type": "status_update",
 "source_id": "truck1231",
 "timestamp": 13023126723,
 "location": "Madison, WI"
}

我感兴趣的只是查看来自每个独特来源的最新更新.我目前正在通过以下地图来做到这一点:

All I'm interested in doing is seeing the latest update from each unique source. I'm currently doing this by taking a map of:

function(doc) {
  if (doc.type == "status_update") {
    emit(doc.source_id, doc);
  }
}

还有一个减少:

function(keys, values, rereduce) {
  var winner = values[0];
  var i = values.length;
  while (i--) {
    var val = values[i];
    if (val.timestamp > winner.timestamp) winner = val;
  }
  return winner;
}

并使用 group=true 将数据作为 reduce 查询.这可以按预期工作,并提供仅包含最新更新的键控结果.

And querying the data as a reduce with group=true. This works as expected and provides a keyed result of only the latest updates.

问题是它非常慢,需要我在 CouchDB 配置中 reduce_limit=false.

The problem is it's terribly slow and requires me to reduce_limit=false in the CouchDB config.

感觉必须有一种更有效的方法来做到这一点.更新同一个文档不是一种选择——即使在这种情况下我不需要它,历史也很重要.在客户端处理数据也不是一种选择,因为这是一个 CouchApp,而且系统中的文档数量实际上非常大,通过网络发送它们并不实际.

It feels like there has to be a more efficient way of doing this. Updating the same document is not an option-- the history is important even though I don't require it in this instance. Processing the data client-side isn't an option either as this is a CouchApp and the amount of documents in the system is actually quite large and not practical to send them all over the wire.

提前致谢.

推荐答案

您可以使用 _stats内置reduce函数,然后再做一次查询来获取文档.以下是观点:

You can get the latest timestamp for every source using the _stats built-in reduce function, then do another query to get the documents. Here's the views:

"views": {
  "latest_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
    "reduce": "_stats"
  },
  "status_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
  }
}

首先使用 group=true 查询 latest_update,然后使用类似(正确的 url 编码)查询 status_update:

First query latest_update with group=true, then status_update with something like (properly url-encoded):

keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true

其中 TS123 和 TS234 是 latest_update 返回的 max 的值.

where TS123, and TS234 are the values of max returned by latest_update.

这篇关于CouchDB:根据时间戳返回最新类型的文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆