CouchDB:根据时间戳返回类型的最新文档 [英] CouchDB: Return Newest Documents of Type Based on Timestamp
问题描述
我有一个系统可以接受来自各种唯一来源的状态更新,并且每个状态更新都会按以下结构创建一个新文档:
I have a system that accepts status updates from a variety of unique sources, and each status update creates a new document in the following structure:
{
"type": "status_update",
"source_id": "truck1231",
"timestamp": 13023123123,
"location": "Boise, ID"
}
纯粹是数据示例,但可以理解这一点.
Data purely example, but gets the idea across.
现在,这些文档每隔一个小时左右生成一次.一个小时后,我们可能会插入:
Now, these documents are generated at interval, once an hour or so. An hour later, we might the insert:
{
"type": "status_update",
"source_id": "truck1231",
"timestamp": 13023126723,
"location": "Madison, WI"
}
我唯一想做的就是查看每个唯一来源的最新更新.我目前正在通过拍摄以下地图来做到这一点:
All I'm interested in doing is seeing the latest update from each unique source. I'm currently doing this by taking a map of:
function(doc) {
if (doc.type == "status_update") {
emit(doc.source_id, doc);
}
}
并减少:
function(keys, values, rereduce) {
var winner = values[0];
var i = values.length;
while (i--) {
var val = values[i];
if (val.timestamp > winner.timestamp) winner = val;
}
return winner;
}
并使用group=true
查询数据以归约.这可以按预期工作,并且仅提供最新更新的关键结果.
And querying the data as a reduce with group=true
. This works as expected and provides a keyed result of only the latest updates.
问题是它的运行速度非常慢,需要我在CouchDB配置中reduce_limit=false
.
The problem is it's terribly slow and requires me to reduce_limit=false
in the CouchDB config.
感觉必须有一种更有效的方法来执行此操作.更新同一文档不是一种选择-历史记录很重要,即使在这种情况下我也不需要.处理数据客户端也不是一种选择,因为这是CouchApp,并且系统中的文档数量实际上很大,并且无法通过有线方式发送它们.
It feels like there has to be a more efficient way of doing this. Updating the same document is not an option-- the history is important even though I don't require it in this instance. Processing the data client-side isn't an option either as this is a CouchApp and the amount of documents in the system is actually quite large and not practical to send them all over the wire.
谢谢.
推荐答案
您可以使用 _stats
内置的reduce函数,然后执行另一个查询以获取文档.这是视图:
You can get the latest timestamp for every source using the _stats
built-in reduce function, then do another query to get the documents. Here's the views:
"views": {
"latest_update": {
"map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
"reduce": "_stats"
},
"status_update": {
"map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
}
}
使用group=true
首先查询latest_update
,然后使用类似(正确的网址编码)的查询status_update
:
First query latest_update
with group=true
, then status_update
with something like (properly url-encoded):
keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true
其中TS123和TS234是latest_update
返回的max
的值.
where TS123, and TS234 are the values of max
returned by latest_update
.
这篇关于CouchDB:根据时间戳返回类型的最新文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!