Adding a document to the index in SOLR: Document contains at least one immense term


Problem Description

I am adding a document to a SOLR index from a Java program, but an exception is thrown after the add(inputDoc) method. The log in the Solr web interface contains the following:

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="text" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[99, 111, 112, 101, 114, 116, 105, 110, 97, 32, 105, 110, 102, 111, 114, 109, 97, 122, 105, 111, 110, 105, 32, 113, 117, 101, 115, 116, 111, 32]...', original message: bytes can be at most 32766 in length; got 226781
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
    ... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 226781
    at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
    ... 47 more

What should I do to solve this problem?
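
For context, a minimal SolrJ sketch of this kind of add call (the Solr URL, core name, document id, and field value below are placeholders, not taken from the original program):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddDocumentExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name.
        SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/CORE_NAME").build();

        SolrInputDocument inputDoc = new SolrInputDocument();
        inputDoc.addField("id", "doc-1");
        // A very long body of text: if the "text" field uses a non-tokenized
        // strings (solr.StrField) type, the whole value is indexed as a
        // single term and can exceed Lucene's 32766-byte term limit.
        inputDoc.addField("text", veryLongText());

        client.add(inputDoc);   // the call after which the exception appears
        client.commit();
        client.close();
    }

    private static String veryLongText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("copertina informazioni questo ");
        }
        return sb.toString();
    }
}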

Recommended Answer

I had the same problem as yours, and finally solved it. Please check the type of your "text" field; I suspect it is "strings".

You can find it in the managed-schema of the core:

<field name="text" type="strings"/>

Or you can go to the Solr Admin UI at http://localhost:8983/solr/CORE_NAME/schema/fieldtypes?wt=json and search for "text". If the result looks like the following, you know your "text" field is defined with the strings type:

{
  "name":"strings",
  "class":"solr.StrField",
  "multiValued":true,
  "sortMissingLast":true,
  "fields":["text"],
  "dynamicFields":["*_ss"]
}

In that case my solution will work for you: change the type from "strings" to "text_general" in managed-schema. (Make sure the type of "text" in schema.xml is also "text_general".)

   <field name="text" type="text_general">

This will solve your problem. strings is a string field type (solr.StrField) that indexes the whole value as a single un-tokenized term, while text_general is an analyzed text field whose values are split into terms, so no single term exceeds Lucene's 32766-byte limit.
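
If editing managed-schema by hand is inconvenient, the same change can also be made through Solr's Schema API. A minimal SolrJ sketch under the same assumptions (placeholder URL and core name; the managed schema must be mutable):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class ReplaceTextFieldType {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name.
        SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/CORE_NAME").build();

        // Redefine the "text" field with the tokenized text_general type
        // instead of the un-tokenized strings (solr.StrField) type.
        Map<String, Object> fieldAttributes = new LinkedHashMap<>();
        fieldAttributes.put("name", "text");
        fieldAttributes.put("type", "text_general");

        new SchemaRequest.ReplaceField(fieldAttributes).process(client);

        client.close();
    }
}

Note that changing the field type only affects how new documents are analyzed, so documents already in the index need to be reindexed afterwards.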
