Neo4j: indexing properties that are longer than 32k with node_auto_index / lucene index


Problem description


I am trying to build a little file and email search engine. I'd also like to use more advanced search queries for the full text search. Hence I am looking at lucene indexes. From what I have seen, there are two approaches - node_auto_index and apoc.index.addNode.

Setting the index up works fine, and indexing nodes with small properties works. When trying to index nodes with properties that are larger than 32k, neo4j fails (and gets into an unusable state).

The error message boils down to:

WARNING: Failed to invoke procedure apoc.index.addNode: Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="text_e" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101, 111, 32, 110, 101]...', original message: bytes can be at most 32766 in length; got 40000

I have checked this on 3.1.2 and on 3.1.0 + apoc 3.1.0.3.

A much longer description of the problem can be found at https://baach.de/Members/jhb/neo4j-full-text-indexing.

Is there any way to fix this? E.g. have I done anything wrong, or is there something to configure?

Thx a lot!

Solution

neo4j does not support index values that are longer than ~32k because of an underlying lucene limitation. For some details around that area you can look at: https://github.com/neo4j/neo4j/pull/6213 and https://github.com/neo4j/neo4j/pull/8404. You need to split such longer values into multiple terms.
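
As a rough illustration of the splitting step, here is a minimal Python sketch that cuts a string into pieces whose UTF-8 encoding stays within the 32766-byte limit quoted in the error message. The function name `split_utf8` and the constant `MAX_TERM_BYTES` are made up for this example and are not part of neo4j or apoc. How the pieces are then stored and indexed (for instance as several properties or several nodes) is left open here. The sketch splits at arbitrary character positions, so a word may be cut in two; splitting at whitespace would probably suit full text search better.

```python
from typing import List

# Limit taken from the error message in the question ("max length 32766").
MAX_TERM_BYTES = 32766


def split_utf8(text: str, max_bytes: int = MAX_TERM_BYTES) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each.

    Hypothetical helper for illustration; it never cuts a character in half.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if current_size + size > max_bytes:
            # The next character would exceed the limit: close this piece.
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(ch)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    # The byte prefix in the error message decodes to "neo neo neo ...",
    # and the reported length is 40000 bytes, so this mimics that value.
    value = "neo " * 10000
    pieces = split_utf8(value)
    print(len(pieces))                              # 2
    print([len(p.encode("utf-8")) for p in pieces])  # [32766, 7234]
```

Note that the byte prefix in the error message contains spaces (byte 32), which suggests the whole property value was handed to lucene as one term. The limit applies per term, not per document.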
