Using a CouchDB view, can I count groups and filter by key range at the same time?

Question

I'm using CouchDB. I'd like to be able to count occurrences of values of specific fields within a date range that can be specified at query time. I seem to be able to do parts of this, but I'm having trouble understanding the best way to pull it all together.

Assuming documents that have a timestamp field and another field, e.g.:

{ date: '20120101-1853', author: 'bart' }
{ date: '20120102-1850', author: 'homer'}
{ date: '20120103-2359', author: 'homer'}
{ date: '20120104-1200', author: 'lisa'}
{ date: '20120815-1250', author: 'lisa'}

I can easily create a view that filters documents by a flexible date range. This can be done with a view like the one below, called with key range parameters, e.g. _view/all-docs?startkey=20120101-0000&endkey=20120201-0000.

all-docs/map.js:

function(doc) {
    emit(doc.date, doc);
}

With the data above, this would return a CouchDB view containing just the first 4 docs (the only docs in the date range).
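To make the key-range behaviour concrete, here is a minimal plain-JavaScript sketch that imitates the view outside CouchDB. The helpers `buildIndex` and `queryRange` are hypothetical stand-ins for the view index and the `startkey`/`endkey` parameters, and plain string comparison is assumed to match CouchDB's key ordering for these fixed-format date strings.

```javascript
// Sample documents from the question.
const docs = [
  { date: '20120101-1853', author: 'bart' },
  { date: '20120102-1850', author: 'homer' },
  { date: '20120103-2359', author: 'homer' },
  { date: '20120104-1200', author: 'lisa' },
  { date: '20120815-1250', author: 'lisa' }
];

// Hypothetical stand-in for the view index: run the map function
// over every document and keep the emitted rows sorted by key.
function buildIndex(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Hypothetical stand-in for ?startkey=...&endkey=... (both ends inclusive).
function queryRange(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Same map logic as all-docs/map.js, with emit passed in explicitly.
const index = buildIndex(docs, (doc, emit) => emit(doc.date, doc));
const january = queryRange(index, '20120101-0000', '20120201-0000');

console.log(january.map((row) => row.key));
```

Only the four January rows survive the filter; the August document falls outside the range.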

I can also create a query that counts occurrences of a given field, like this, called with grouping, i.e. _view/author-count?group=true:

author-count/map.js:

function(doc) {
  emit(doc.author, 1);
}

author-count/reduce.js:

function(keys, values, rereduce) {
  return sum(values);
}

This yields a result like:

{
    "rows": [
        {"key":"bart","value":1},
        {"key":"homer","value":2},
        {"key":"lisa","value":2}
     ]
}

However, I can't find the best way to both filter by date and count occurrences. For example, with the data above, I'd like to be able to specify range parameters like startkey=20120101-0000&endkey=20120201-0000 and get a result like this, where the last doc is excluded from the count because it is outside the specified date range:

{
    "rows": [
        {"key":"bart","value":1},
        {"key":"homer","value":2},
        {"key":"lisa","value":1}
     ]
}

What's the most elegant way to do this? Is this achievable with a single query? Should I be using another CouchDB construct, or is a view sufficient for this?

Recommended answer

You can get pretty close to the desired result with a list:

{
  _id: "_design/authors",
  views: {
    authors_by_date: {
      map: function(doc) {
        emit(doc.date, doc.author);
      }
    }
  },
  lists: {
    count_occurrences: function(head, req) {
      start({ headers: { "Content-Type": "application/json" }});

      var result = {};
      var row;
      while(row = getRow()) {
        var val = row.value;
        if(result[val]) result[val]++;
        else result[val] = 1;
      }
      // A list function must return a string, so serialize the counts.
      return JSON.stringify(result);
    }
  }
}

This design can be requested as follows (the keys are strings, so they are passed as JSON-quoted values):

http://<couchurl>/<db>/_design/authors/_list/count_occurrences/authors_by_date?startkey="<startDate>"&endkey="<endDate>"

This will be slower than a normal map-reduce, and is a bit of a workaround. Unfortunately, this is the only way to do a multi-dimensional query, "which CouchDB isn’t suited for".

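The result of requesting this design will be something like this (shown here for a key range that covers all five sample documents; for the January range in the question, "lisa" would be 1):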

{
  "bart": 1,
  "homer": 2,
  "lisa": 2
}
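As a rough check of what the list function computes, the sketch below runs the same counting loop in plain JavaScript over rows shaped like those emitted by authors_by_date, restricted to the January range from the question. The function name `countOccurrences` and the in-memory `rows` array are illustrative assumptions; inside CouchDB the rows would come from `getRow()` instead.

```javascript
// Rows as the authors_by_date view would emit them: key = date, value = author.
const rows = [
  { key: '20120101-1853', value: 'bart' },
  { key: '20120102-1850', value: 'homer' },
  { key: '20120103-2359', value: 'homer' },
  { key: '20120104-1200', value: 'lisa' },
  { key: '20120815-1250', value: 'lisa' }
];

// Same grouping logic as the count_occurrences list function,
// applied to an array instead of rows pulled with getRow().
function countOccurrences(viewRows) {
  const result = {};
  for (const row of viewRows) {
    result[row.value] = (result[row.value] || 0) + 1;
  }
  return result;
}

// Imitate ?startkey="20120101-0000"&endkey="20120201-0000" (inclusive).
const inRange = rows.filter(
  (row) => row.key >= '20120101-0000' && row.key <= '20120201-0000'
);

const counts = countOccurrences(inRange);
console.log(JSON.stringify(counts)); // prints {"bart":1,"homer":2,"lisa":1}
```

With the range applied, the August document is never seen by the counting loop, which is why "lisa" drops from 2 to 1.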

What we do is basically emit a lot of rows, then use a list to group them the way we want. A list can present a result in any form you like, but it will often be slower: a normal map-reduce result can be cached and only updated according to the diffs, whereas the list output has to be built anew every time it is requested.

It is pretty much as slow as fetching all the rows emitted by the map (the overhead of orchestrating the data is mostly negligible), which is a lot slower than fetching the result of a reduce.

If you want to use the list for a different view, you can simply exchange it in the URL you request:

http://<couchurl>/<db>/_design/authors/_list/count_occurrences/<view>
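For reference, a small helper can assemble these URLs and take care of quoting the keys. This is only a sketch: `listUrl` is a hypothetical function, and the host `http://localhost:5984` and database name `mydb` are placeholder assumptions. View keys are sent as JSON, so string keys are serialized with `JSON.stringify` and then URL-encoded.

```javascript
// Hypothetical helper: build a _list URL for the "authors" design document,
// optionally with key-range parameters such as startkey and endkey.
function listUrl(couchUrl, db, view, range = {}) {
  const base = `${couchUrl}/${db}/_design/authors/_list/count_occurrences/${view}`;
  const params = Object.entries(range).map(
    // Keys are JSON values, so quote them before URL-encoding.
    ([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`
  );
  return params.length ? `${base}?${params.join('&')}` : base;
}

// Placeholder host and database name, for illustration only.
const url = listUrl('http://localhost:5984', 'mydb', 'authors_by_date', {
  startkey: '20120101-0000',
  endkey: '20120201-0000'
});

console.log(url);
```

Swapping the `view` argument is all that is needed to run the same list over a different view in the same design document.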

Read more about lists on the CouchDB wiki.
