Django Haystack/ElasticSearch indexing process aborted

Question


I'm running a setup with Django 1.4, Haystack 2 beta, and ElasticSearch 0.20. My database is PostgreSQL 9.1 and holds several million records. When I try to index all of my data with Haystack/ElasticSearch, the process seems to time out and I get a message that just says "Killed". So far I've noticed the following:

  1. I do get the number of documents to be indexed, so I'm not getting an error like "0 documents to index".
  2. Indexing a small set, for example 1000 records, works just fine.
  3. I've tried hardcoding the timeout in haystack/backends/__init__.py, and that seems to have no effect.
  4. I've tried changing options in elasticsearch.yml, also to no avail.

If hardcoding the timeout doesn't work, then how else can I extend the time for indexing? Is there another way to change this directly in ElasticSearch? Or perhaps some batch processing method?

Thanks in advance!

Solution

This version of Haystack is buggy. The problem is the following line in haystack/management/commands/update_index.py:

pks_seen = set([smart_str(pk) for pk in qs.values_list('pk', flat=True)])

This line builds a set of every primary key at once, which causes the server to run out of memory. However, it does not seem to be needed for indexing, so I just changed it to:

pks_seen = set([])

Now it's running through the batches. Thank you to everyone who answered!
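
For illustration, here is a minimal sketch of the idea behind the fix, in plain Python with no Django or Haystack imports: walk the primary keys lazily in fixed-size batches, and only collect the set of seen keys when something actually needs it. The names iter_batches, index_in_batches, and track_seen are hypothetical and not part of Haystack, and the indexing step is only a placeholder. If your Haystack version uses pks_seen elsewhere (for example, to detect stale documents), emptying it may change that behavior, so check before relying on the change.

```python
from itertools import islice


def iter_batches(pks, batch_size):
    """Yield lists of at most batch_size items from any iterable, lazily."""
    iterator = iter(pks)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def index_in_batches(pks, batch_size, track_seen=False):
    """Hypothetical stand-in for an indexing loop.

    Only one batch is held in memory at a time. The full set of seen
    keys is built only when track_seen is True.
    """
    pks_seen = set()
    indexed = 0
    for batch in iter_batches(pks, batch_size):
        # Placeholder: a real backend update call would go here.
        indexed += len(batch)
        if track_seen:
            pks_seen.update(str(pk) for pk in batch)
    return indexed, pks_seen


if __name__ == "__main__":
    total, seen = index_in_batches(range(10), batch_size=4)
    print(total, len(seen))
```

With track_seen left at its default, memory use stays proportional to the batch size rather than to the number of records, which is the effect the one-line change above has on update_index.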
