CouchDB:根据时间戳返回最新类型的文档 [英] CouchDB: Return Newest Documents of Type Based on Timestamp
问题描述
我的系统接受来自各种不同来源的状态更新,每个状态更新都会创建一个具有以下结构的新文档:
I have a system that accepts status updates from a variety of unique sources, and each status update creates a new document in the following structure:
{
"type": "status_update",
"source_id": "truck1231",
"timestamp": 13023123123,
"location": "Boise, ID"
}
数据纯粹是示例,但可以理解.
Data purely example, but gets the idea across.
现在,这些文档每隔一小时左右生成一次.一小时后,我们可能会插入:
Now, these documents are generated at interval, once an hour or so. An hour later, we might the insert:
{
"type": "status_update",
"source_id": "truck1231",
"timestamp": 13023126723,
"location": "Madison, WI"
}
我感兴趣的只是查看来自每个独特来源的最新更新.我目前正在通过以下地图来做到这一点:
All I'm interested in doing is seeing the latest update from each unique source. I'm currently doing this by taking a map of:
function(doc) {
if (doc.type == "status_update") {
emit(doc.source_id, doc);
}
}
还有一个减少:
function(keys, values, rereduce) {
var winner = values[0];
var i = values.length;
while (i--) {
var val = values[i];
if (val.timestamp > winner.timestamp) winner = val;
}
return winner;
}
并使用 group=true
将数据作为 reduce 查询.这可以按预期工作,并提供仅包含最新更新的键控结果.
And querying the data as a reduce with group=true
. This works as expected and provides a keyed result of only the latest updates.
问题是它非常慢,需要我在 CouchDB 配置中 reduce_limit=false
.
The problem is it's terribly slow and requires me to reduce_limit=false
in the CouchDB config.
感觉必须有一种更有效的方法来做到这一点.更新同一个文档不是一种选择——即使在这种情况下我不需要它,历史也很重要.在客户端处理数据也不是一种选择,因为这是一个 CouchApp,而且系统中的文档数量实际上非常大,通过网络发送它们并不实际.
It feels like there has to be a more efficient way of doing this. Updating the same document is not an option-- the history is important even though I don't require it in this instance. Processing the data client-side isn't an option either as this is a CouchApp and the amount of documents in the system is actually quite large and not practical to send them all over the wire.
提前致谢.
推荐答案
您可以使用 _stats
内置reduce函数,然后再做一次查询来获取文档.以下是观点:
You can get the latest timestamp for every source using the _stats
built-in reduce function, then do another query to get the documents. Here's the views:
"views": {
"latest_update": {
"map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
"reduce": "_stats"
},
"status_update": {
"map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
}
}
首先使用 group=true
查询 latest_update
,然后使用类似(正确的 url 编码)查询 status_update
:
First query latest_update
with group=true
, then status_update
with something like (properly url-encoded):
keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true
其中 TS123 和 TS234 是 latest_update
返回的 max
的值.
where TS123, and TS234 are the values of max
returned by latest_update
.
这篇关于CouchDB:根据时间戳返回最新类型的文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!