如何在CouchDB中编写SELECT COUNT(DISTINCT字段)查询? [英] How do you write a SELECT COUNT(DISTINCT field) query in CouchDB?

查看:226
本文介绍了如何在CouchDB中编写SELECT COUNT(DISTINCT字段)查询?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在CouchDB中是否有一种很好的方法来模仿SELECT COUNT(DISTINCT字段)的行为?

Is there a good way to mimic the behavior of SELECT COUNT(DISTINCT field) in CouchDB?

想象一下我们有以下文档,它记录了时间用户播放了某首歌曲:

Imagine we have the following document, which records the time at which a user played a certain song:

{
  song_id: "happy birthday",
  user_id: "boris",
  date_played: [2011, 11, 14, 00, 12, 55],
  _id: ...
}

我想知道用户 boris曾经播放过的不同歌曲的数量。如果我们的用户已经听了20次生日快乐,那首歌仍然只占总歌曲数的+1。

I'd like to know the number of distinct songs ever played by our user "boris". If our user has listened to "happy birthday" 20 times, that song should still contribute just +1 to the overall song count.

在MySQL中,我只需执行 SELECT COUNT(DISTINCT song_id)FROM播放WHERE user_id = boris ,但是在CouchDB中编写此代码时我空白。

In MySQL, I'd simply execute SELECT COUNT(DISTINCT song_id) FROM plays WHERE user_id = "boris", but I'm drawing a blank when it comes to writing this in CouchDB.

工作环境1:如果我更改了架构,而是将所有歌曲播放存储在单个用户文档中的 boris,那么我可以写一个映射到仅发出不同的值。但是,如果我想在last.fm的规模上构建某些东西,我担心的是,随着 boris文档大小(播放次数)的持续增长,更新将花费很长时间。 (可能还会有一个最终要达到的最大文档大小。)

Work-Around 1: If I changed my schema and instead stored all the song-plays inside a single user document for "boris" I could then write a map to emit only distinct values. However, if I wanted to build something on the scale of last.fm, my fear is that updates would start taking a very long time as the "boris" document size (number of plays) continued to grow. (There might also be a maximum document size that I would eventually hit).

工作环境2:我也可以编写一个map函数返回 all 的不同记录,我的python脚本可以总结这些记录;但是再加上成千上万首不同的歌曲,这也会变得非常慢。

Work-Around 2: I could also write a map function to return all of the distinct records, which my python script could sum up itself; but again with hundreds of thousands of distinct songs this would become very slow as well.

我还缺少其他选择吗?

推荐答案

此答案由Zachary Zolton在沙发上的邮件列表中提供:

This answer was provided by Zachary Zolton on the couchdb mailing list:

http:// mail-archives。 apache.org/mod_mbox/couchdb-user/201111.mbox/%3CCAGnHtbJ-1-YeLWMLivKzWub98HZY7%2BesnPOHU4pEYgWAsxaszA%40mail.gmail.com%3E

开始您已经有了一个视图,可以为您提供鲍里斯(Boris)的5万首独特的
歌曲,您可以使用_list函数返回行数。

Since you've already got a view that'll give you Boris's 50k unique songs, you could use a _list function to return the number of rows.

像这样应该可以解决问题:

Something like this should do the trick:

function() {
 var count = 0;
 while(getRow()) count++;
 return JSON.stringify({count: count});
}

如果使用相同的视图,键范围和$查询此列表函数b $ b组级别,它将仅使用一些JSON进行响应,例如: { count: 50612}

If you query this list function, with the same view, key range and group level, it'll just respond with a bit of JSON, such as: {"count":"50612"}

您可以在此处阅读更多内容:

You can read up more here:

  • http://guide.couchdb.org/draft/transforming.html
  • http://wiki.apache.org/couchdb/Formatting_with_Show_and_List

这篇关于如何在CouchDB中编写SELECT COUNT(DISTINCT字段)查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆