Creating a pagination index in CouchDB?

Problem description

I'm trying to create a pagination index view in CouchDB that lists the doc._id for every Nth document found.

I wrote the following map function, but the pageIndex variable doesn't reliably start at 1; in fact it seems to change arbitrarily depending on the emitted value or the index length (e.g. 50, 55, 10, 25 all start with a different document, though I seem to get the correct number of documents emitted).

function(doc) {
  if (doc.type == 'log') {
    if (!pageIndex || pageIndex > 50) {
      pageIndex = 1;
      emit(doc.timestamp, null);
    }
    pageIndex++;
  }
}

What am I doing wrong here? How would a CouchDB expert build this view?

Note that I don't want to use the "startkey + count + 1" method that's been mentioned elsewhere: I'd like to be able to jump to a particular page or the last page (user expectations and all), I'd like a friendly "?page=5" URI instead of "?startkey=348ca1829328edefe3c5b38b3a1f36d1e988084b", and I'd rather CouchDB did this work instead of bulking up my application, if I can help it.

Thanks!

Recommended answer

View functions (map and reduce) are purely functional. Side-effects such as setting a global variable are not supported. (When you move your application to BigCouch, how could multiple independent servers with arbitrary subsets of the data know what pageIndex is?)

Therefore the answer will have to involve a traditional map function, perhaps keyed by timestamp.

function(doc) {
  if (doc.type == 'log') {
    emit(doc.timestamp, null);
  }
}

How can you get every 50th document? The simplest way is to add a skip=0, skip=50, or skip=100 parameter to the query. However, that is not ideal (see below).
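Assuming the map view above is saved in a design document (the names `logs` and `by_timestamp` below are illustrative assumptions, not names from the question), a friendly page number can be translated into limit/skip query parameters on the client. A minimal sketch:

```javascript
// Build a CouchDB view query URL for a given 1-based page number.
// The design document and view names ("logs", "by_timestamp") are
// illustrative assumptions, not names from the question.
function pageQueryUrl(baseUrl, page, pageSize) {
  var skip = (page - 1) * pageSize;
  return baseUrl + '/_design/logs/_view/by_timestamp' +
         '?limit=' + pageSize + '&skip=' + skip;
}

// Page 5 with 50 rows per page skips the first 200 rows.
var url = pageQueryUrl('http://localhost:5984/db', 5, 50);
```

This keeps the "?page=5" URI in the application while CouchDB does the row counting, but it inherits skip's scaling problem discussed below.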

A way to pre-fetch the exact IDs of every 50th document is to use a _list function that outputs only every 50th row. (In practice you could use Mustache.JS or another template library to build HTML.)

function() {
  var pageIndex = 0,
      first = true,
      row;

  send("[");
  while (row = getRow()) {
    if (pageIndex % 50 == 0) {
      if (!first) send(",");      // comma-separate entries for valid JSON
      send(JSON.stringify(row));
      first = false;
    }
    pageIndex += 1;
  }
  send("]");
}

This will work for many situations; however, it is not perfect. Here are some considerations to keep in mind; not necessarily showstoppers, but it depends on your specific situation.

There is a reason the pretty URLs are discouraged. What does it mean if I load page 1, then a bunch of documents are inserted within the first 50, and then I click to page 2? If the data is changing a lot, there is no perfect user experience; the user must somehow see that the data is changing.

The skip parameter and the example _list function have the same problem: they do not scale. With skip you are still touching every row in the view, starting from the beginning: finding it in the database file, reading it from disk, and then ignoring it, over and over, row by row, until you hit the skip value. For small values that is quite convenient, but since you are grouping pages into sets of 50, I have to imagine that you will have thousands or more rows. That could make page views slow, as the database spins its wheels most of the time.

The _list example has a similar problem; however, you front-load all the work, running through the entire view from start to finish, and (presumably) sending the relevant document IDs to the client so it can quickly jump around the pages. But with hundreds of thousands of documents (you call them "log" so I assume you will have a ton), that will be an extremely slow query, and it is not cached.
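For contrast, the "startkey + count + 1" approach the question rules out is the one that does scale: fetch pageSize + 1 rows, and use the extra row's key (and doc ID, for tie-breaking) as the startkey of the next page, so CouchDB never has to walk skipped rows. A sketch of the client-side bookkeeping, under those assumptions:

```javascript
// Sketch of "startkey + limit + 1" paging: request one extra row and use
// it to build the next page's query parameters. Row shapes mirror what a
// CouchDB view returns ({ key: ..., id: ... }).
function nextPageParams(rows, pageSize) {
  // rows: the pageSize + 1 rows returned by the previous query
  if (rows.length <= pageSize) return null; // no further page
  var next = rows[pageSize];                // first row of the next page
  return {
    startkey: JSON.stringify(next.key),     // keys are JSON-encoded in the URL
    startkey_docid: next.id,                // disambiguates duplicate keys
    limit: pageSize + 1
  };
}

// Simulated result set: 51 rows with timestamp-like keys.
var rows = [];
for (var i = 0; i < 51; i++) rows.push({ key: 1000 + i, id: 'doc' + i });
var params = nextPageParams(rows, 50);
```

The trade-off is exactly the one the question names: you can only step forward (or backward, with descending=true) from a known key, not jump straight to page N.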

In summary, for small data sets you can get away with the page=1, page=2 form, but you will run into problems as your data set gets big. With the release of BigCouch, CouchDB is even better for log storage and analysis, so (if that is what you are doing) you will definitely want to consider how high you need to scale.
