CouchDB delay building index (CouchDB 1.5.0 on Windows Server 2008 R2)


Problem description

I understand that CouchDB names each view index file after a hash of the design document's source. Whenever I change the source code, the index needs to be rebuilt. CouchDB does this when a view in that design document is requested for the first time.

What I'd expect to happen and want to happen

Each time I change a design doc, the first call to a view will take significantly longer than usual and may time out. The index will continue to build. Once this is completed, the view will only process changes and will be very fast.

What actually happens

  1. When running an amended view for the first time, I see the process in the status window slowly climb toward 100%. This takes about 2 hours. During this time all CPUs are fully utilized.
  2. Once the process reaches 99%, it remains there for about an hour and then disappears. CPU utilization drops to just one CPU.
  3. After the process has disappeared, the data file for the view keeps growing for about half an hour to an hour. CPU utilization is near 0%.
  4. The index file suddenly stops growing.

If I request the view again when I've reached stage 4), the characteristics of 3) start again. I have to repeat this process between 5 and 50 times until I can finally retrieve the view values.

If the view gets requested a second time while still in stage 1 or 2, it will almost certainly run out of memory and I have to restart the CouchDB service. This is despite my DB rarely using more than 2 GByte when running just one job, with more than 4 GByte free in normal operation.

I have tried tweaking configuration settings and adding more memory, but nothing seems to have an impact.

My Question

Do I misunderstand the concept of running views or is something wrong with my setup? If this is expected, is there anything I can tweak to reduce the number of reruns?

Context

My documents are pretty large (1 to 20 MByte). The data they contain is well structured: they are usually web-analytics reports that would be stored as tens of thousands of rows in a relational database.

My map function extracts these rows. It returns the dimensions as a key array. The key array sometimes exceeds 20 columns, but most views have fewer than 10.
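To make the description concrete, here is a minimal sketch of such a map function. The document shape (`rows`, `dimensions`, `metrics`) and the name `mapReport` are assumptions for illustration, not the actual schema. In a real design document, `emit` is a global provided by couchjs; it is passed as a parameter here only so the sketch can run on its own.

```javascript
// Hypothetical document shape (an assumption, not the real schema):
// { rows: [ { dimensions: ["2014-01", "organic"], metrics: { visits: 3 } } ] }
function mapReport(doc, emit) {
  var rows = doc.rows || [];
  for (var i = 0; i < rows.length; i++) {
    // Key: the array of dimension values. Value: the metrics dictionary.
    emit(rows[i].dimensions, rows[i].metrics);
  }
}
```

With documents of 1 to 20 MByte, each call emits tens of thousands of rows, so the cost of every emitted key and value is multiplied accordingly.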

The reduce function will aggregate (sum) all values in rows with identical keys. The metrics are stored in a dictionary and may contain different keys. The reduce function identifies missing keys in one document and adds these to the aggregate as 0.
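A rough sketch of a reduce function along these lines, assuming each value is a flat dictionary of numeric metrics (the name `sumMetrics` is hypothetical). A metric missing from one value simply contributes 0 to the total. Because the output has the same shape as the input, the same logic also covers the rereduce case.

```javascript
// Sketch of a custom reduce: sums metric dictionaries key by key.
// `values` holds emitted metric objects, or partial sums when rereduce is true.
function sumMetrics(keys, values, rereduce) {
  var totals = {};
  for (var i = 0; i < values.length; i++) {
    var metrics = values[i];
    for (var name in metrics) {
      if (Object.prototype.hasOwnProperty.call(metrics, name)) {
        // A metric absent so far starts at 0.
        totals[name] = (totals[name] || 0) + metrics[name];
      }
    }
  }
  return totals;
}
```

A reduce like this runs in the JavaScript query server, so every value has to be serialized to couchjs and back while the index is built.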

I am using CouchDB 1.5.0 on Windows Server 2008 R2 with 2 CPUs and 8 GByte memory.

The views are written in JavaScript using the couchjs query server.

My design documents usually consist of several views, with a '_lib' view that does not emit any data, but contains an exhaustive library of functions accessed by the actual views.

Solution

It is a known issue, but just in case: if you have gigabytes of docs, you can forget about custom JavaScript reduce functions. Only the built-in ones (such as `_sum`, `_count` and `_stats`) will work fast enough.
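One possible way to follow this advice, sketched under assumptions: emit one plain number per metric, with the metric name appended to the key, so that the built-in `_sum` can do the aggregation. The document shape (`rows`, `dimensions`, `metrics`), the design document id and the view name are all hypothetical. `emit` is passed as a parameter only so the sketch can run outside couchjs.

```javascript
// Map function meant for a view whose reduce is the built-in "_sum".
// Emits one numeric value per metric instead of a whole dictionary.
function mapMetricRows(doc, emit) {
  (doc.rows || []).forEach(function (row) {
    Object.keys(row.metrics || {}).forEach(function (name) {
      emit(row.dimensions.concat([name]), row.metrics[name]);
    });
  });
}

// Hypothetical design document; view functions are stored as source strings.
var designDoc = {
  _id: "_design/reports",
  views: {
    metrics: {
      map: "function (doc) { (" + mapMetricRows.toString() + ")(doc, emit); }",
      reduce: "_sum"
    }
  }
};
```

The trade-off is that the view holds one row per metric rather than one per report row, so the index has more entries. Metrics that are missing from a document need no special handling, because they are simply never emitted.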
