Django Haystack/ElasticSearch indexing process aborted
I'm running a setup with django 1.4, Haystack 2 beta, and ElasticSearch .20. My database is postgresql 9.1, which has several million records. When I try to index all of my data with haystack/elasticsearch, the process times out and I get a message that just says "Killed". So far I've noticed the following:
- I do get the number of documents to be indexed, so I'm not getting an error like "0 documents to index".
- Indexing a small set, for example 1000, works just fine.
- I've tried hardcoding the timeout in haystack/backends/__init__.py, and that seems to have no effect.
- I've tried changing options in elasticsearch.yml, also to no avail.
If hardcoding the timeout doesn't work, then how else can I extend the time for indexing? Is there another way to change this directly in ElasticSearch? Or perhaps some batch processing method?
Thanks in advance!
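For what it's worth, Haystack's update_index management command already processes records in batches and exposes a batch-size flag (flag name per the Haystack 2.x docs; check your installed version), which is one way to index millions of rows in smaller chunks:

```shell
# Rebuild the search index in chunks of 1000 records per batch
python manage.py update_index --batch-size=1000
```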
This version of Haystack is buggy. The problem is caused by the following line in haystack/management/commands/update_index.py:
pks_seen = set([smart_str(pk) for pk in qs.values_list('pk', flat=True)])
This line materializes every primary key in memory at once, causing the server to run out of memory. It does not seem to be needed for indexing, however, so I just changed it to:
pks_seen = set([])
Now it's running through the batches. Thank you to everyone who answered!
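If you would rather keep pks_seen populated than empty it out, a gentler fix is to build the set incrementally instead of materializing the whole values_list at once. A minimal sketch of the idea, where fetch_batch and fake_db are hypothetical stand-ins for the real queryset (with an actual Django QuerySet you would iterate over qs.values_list('pk', flat=True).iterator() instead):

```python
def iter_pks(fetch_batch, batch_size=1000):
    """Yield primary keys one batch at a time instead of loading them all.

    fetch_batch(offset, limit) is a hypothetical callable standing in for a
    sliced database query such as qs.values_list('pk', flat=True)[offset:offset + limit].
    """
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            break
        for pk in batch:
            yield pk
        offset += batch_size

# Toy stand-in for the database: 2500 fake primary keys.
fake_db = list(range(2500))

def fetch_batch(offset, limit):
    return fake_db[offset:offset + limit]

# Only one batch of pks is held in memory at any time.
pks_seen = set(str(pk) for pk in iter_pks(fetch_batch))
```

This keeps peak memory proportional to the batch size rather than to the table size, which is the same reason indexing 1000 records worked while millions did not.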