Indexing crashes on custom tokenizer


Question

We are building a Solr plug-in that links to our proprietary engine. It is intended to replace the standard tokenizer altogether. (Background: Hybrid search and indexing: words and token metadata in Solr)

When I try to index a test document in Solr Admin:

    id,title
    12345,A test title

I get an exception at the point where, I suppose, my tokenizer kicks in.

The configuration changes (schema.xml) are:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
<!--
     <analyzer type="query">
        <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer> 
     <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
-->
    </fieldType>
    <fieldType name="family_id_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
            <!--
            <filter class="com.linguasys.carabao.FamilyIDFilterFactory" />
            -->
          </analyzer>
    </fieldType>

    <fieldType name="role_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
            <!--
            <filter class="com.linguasys.carabao.RoleFilterFactory" />
            -->
          </analyzer>
    </fieldType>

The web service itself works. (The filters are commented out because they were crashing with some kind of type mismatch error, but that is a problem for later.)

The exception is below. The question is not just "what am I doing wrong?" but also "where do I get more info?"

org.apache.solr.common.SolrException: Exception writing document id 12345 to the index; possible analysis error.
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:870)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1024)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:693)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
  at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
  at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
  at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
  at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:655)
  at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: input AttributeSource must not be null
  at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:94)
  at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:106)
  at org.apache.lucene.analysis.TokenFilter.<init>(TokenFilter.java:33)
  at org.apache.lucene.analysis.util.FilteringTokenFilter.<init>(FilteringTokenFilter.java:70)
  at org.apache.lucene.analysis.core.StopFilter.<init>(StopFilter.java:60)
  at org.apache.lucene.analysis.core.StopFilterFactory.create(StopFilterFactory.java:127)
  at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
  at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:102)
  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:180)
  at org.apache.lucene.document.Field.tokenStream(Field.java:554)
  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:597)
  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
  ... 35 more

Recommended answer

You need to verify what happens when yourTokenizer.create(java.io.Reader reader) is invoked. Judging from the stack trace, this method is returning null, and that null is handed to StopFilterFactory.create, then to the StopFilter constructor, and finally to AttributeSource.<init> (AttributeSource.java:94). A null input is illegal at that point, hence the exception.
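The failure mode can be reproduced without Solr. The sketch below is a plain-Java model, not Lucene code: NullTokenizerDemo, Stream, Filter, buildChain, and buildChainChecked are hypothetical names that only mimic how an analysis chain passes the tokenizer's output to the first filter's constructor. It illustrates why a null coming from the tokenizer factory shows up as an exception inside the filter, and how a guard placed right after the factory call would name the real culprit.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Plain-Java model of an analysis chain; none of these types are Lucene classes.
class NullTokenizerDemo {

    /** Stand-in for a token stream. */
    static class Stream {
        final String name;

        Stream(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a token filter; it rejects a null input with the message seen in the trace. */
    static class Filter extends Stream {
        Filter(Stream input) {
            super("filter");
            if (input == null) {
                throw new IllegalArgumentException("input AttributeSource must not be null");
            }
        }
    }

    /** Builds tokenizer -> filter without checking the tokenizer (mirrors the failure). */
    static Stream buildChain(Supplier<Stream> tokenizerFactory) {
        Stream tokenizer = tokenizerFactory.get();
        return new Filter(tokenizer); // throws here if the factory returned null
    }

    /** Same chain, but fails fast with a message that points at the factory. */
    static Stream buildChainChecked(Supplier<Stream> tokenizerFactory) {
        Stream tokenizer = Objects.requireNonNull(
                tokenizerFactory.get(), "tokenizer factory returned null");
        return new Filter(tokenizer);
    }

    public static void main(String[] args) {
        Supplier<Stream> broken = () -> null; // hypothetical buggy factory

        try {
            buildChain(broken);
        } catch (IllegalArgumentException e) {
            System.out.println("unchecked chain failed in the filter: " + e.getMessage());
        }

        try {
            buildChainChecked(broken);
        } catch (NullPointerException e) {
            System.out.println("checked chain failed at the factory: " + e.getMessage());
        }
    }
}
```

In the real plug-in the equivalent check would be to make sure the factory's create method never returns null, for example by throwing a descriptive exception when the web service call does not yield a tokenizer.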

The best way to find out what is going on is to attach a debugger and set a breakpoint at the line mentioned above.
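When reading a trace like the one in the question, the innermost "Caused by" entry is usually the most informative part, because its top frames show where the failure actually occurred. As a small aid, the sketch below walks the cause chain of an exception to reach that innermost entry. RootCauseDemo and rootCause are hypothetical helper names, and the exceptions in main are constructed by hand purely for illustration; they are not produced by Solr.

```java
// Hypothetical helper for isolating the innermost cause of a wrapped exception.
class RootCauseDemo {

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Hand-built stand-ins for the wrapper and the cause shown in the trace.
        Exception inner = new IllegalArgumentException("input AttributeSource must not be null");
        Exception outer = new RuntimeException(
                "Exception writing document id 12345 to the index", inner);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());

        // The first frame of the root cause is the place to set a breakpoint.
        StackTraceElement[] frames = root.getStackTrace();
        if (frames.length > 0) {
            System.out.println("thrown at: " + frames[0]);
        }
    }
}
```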
