Custom Solr tokenizer is invoked only for the first query

Views: 79 Published: 2020/5/4 7:46:58 Tags: java plugins solr lucene tokenize


Problem description

I created a custom tokenizer, and it seems to work fine when I check it with admin/analysis.jsp and the System.out log. However, when I query a field that uses this custom tokenizer, I see that the tokenizer is invoked only for the first query string (checked via the System.out log). Could you help me by pointing out what I am doing wrong? This is my code:

package com.fosp.searchengine;
import java.io.Reader;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

public class JvnTextProTokenizerFactory extends WhitespaceTokenizerFactory{
    @Override
    public WhitespaceTokenizer create(Reader input) {
        System.out.println("WhitespaceTokenizer create(Reader input)");
        Reader processedStringReader = new ProcessedStringReader(input);
        return new WhitespaceTokenizer(processedStringReader);
    }

}


package com.fosp.searchengine;
import java.io.IOException;
import java.io.Reader;

public class ProcessedStringReader extends java.io.Reader {

    private static final int BUFFER_SIZE = 1024 * 8;
    private static TextProcess m_textProcess = null;
    private char[] m_inputData = null;
    private int m_offset = 0;
    private int m_length = 0;
    public ProcessedStringReader(Reader input){
        char[] arr = new char[BUFFER_SIZE];
        StringBuffer buf = new StringBuffer();
        int numChars;

        try {
            while ((numChars = input.read(arr, 0, arr.length)) > 0) {
                buf.append(arr, 0, numChars);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        if(m_textProcess == null){
            try {
                m_textProcess = new TextProcess();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        m_inputData = m_textProcess.processText(buf.toString()).toCharArray();
        m_offset = 0;
        m_length = m_inputData.length;
    }

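    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // m_offset is the read position in the processed text; off is the
        // write position in the destination buffer. The two must not be mixed.
        if (len == 0) {
            return 0;
        }
        int remaining = m_length - m_offset;
        if (remaining <= 0) {
            return -1; // end of the processed text
        }
        int charNumber = Math.min(len, remaining);
        System.arraycopy(m_inputData, m_offset, cbuf, off, charNumber);
        m_offset += charNumber;
        return charNumber;
    }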

    @Override
    public void close() throws IOException {
        m_inputData = null;
        m_offset = 0;
        m_length = 0;
    }

}
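
A hand-written read() like the one above is easy to get wrong. As a minimal alternative sketch, assuming the whole input fits in memory (which the constructor above already assumes), one could drain the input, run the processing step, and let java.io.StringReader handle the Reader contract. ProcessedReaders and its process parameter are hypothetical names introduced here; process stands in for TextProcess.processText, and UnaryOperator assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of the original code.
class ProcessedReaders {

    // Reads all of input, applies the processing step once, and returns a
    // Reader over the processed text. StringReader implements
    // read(char[], int, int) correctly, so no custom read() is needed.
    static Reader wrap(Reader input, UnaryOperator<String> process) throws IOException {
        StringBuilder buf = new StringBuilder();
        char[] chunk = new char[8 * 1024];
        int n;
        while ((n = input.read(chunk, 0, chunk.length)) != -1) {
            buf.append(chunk, 0, n);
        }
        return new StringReader(process.apply(buf.toString()));
    }
}
```

A caller would pass the original Reader and the processing function, for example ProcessedReaders.wrap(input, text -> text.trim()), and hand the result to the tokenizer.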

Schema.xml

<fieldType name="text_jvnTextPro" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.fosp.searchengine.JvnTextProTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="com.fosp.searchengine.JvnTextProTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
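
One possible explanation, offered as an assumption rather than a confirmed diagnosis: Lucene analyzers of this era can keep one tokenizer per thread and hand it new input through Tokenizer.reset(Reader) instead of asking the factory for a new tokenizer, so work done only inside create() would run once. The sketch below models that pattern with hypothetical stand-in classes (MiniTokenizer, MiniFactory, CreateOnlyFactory, ResetAwareFactory, ReusingAnalyzer, Preprocess). None of them are Lucene or Solr APIs, and upper-casing stands in for TextProcess.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

// Hypothetical stand-ins that model the reuse pattern only; NOT Lucene or Solr classes.
class MiniTokenizer {
    protected Reader input;

    MiniTokenizer(Reader input) { this.input = input; }

    // Called when a cached tokenizer is handed fresh input.
    void reset(Reader newInput) throws IOException { this.input = newInput; }

    // Drains the current input; stands in for real token production.
    String readAll() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }
}

// Stand-in for the custom preprocessing (TextProcess in the question).
final class Preprocess {
    static Reader wrap(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        return new StringReader(sb.toString().toUpperCase(Locale.ROOT));
    }
}

abstract class MiniFactory {
    int createCalls = 0;
    abstract MiniTokenizer create(Reader input) throws IOException;
}

// Mirrors the question: preprocessing happens only inside create().
class CreateOnlyFactory extends MiniFactory {
    @Override
    MiniTokenizer create(Reader input) throws IOException {
        createCalls++;
        return new MiniTokenizer(Preprocess.wrap(input));
    }
}

// Possible fix: the tokenizer preprocesses at construction and on every reset().
class PreprocessingTokenizer extends MiniTokenizer {
    PreprocessingTokenizer(Reader input) throws IOException { super(Preprocess.wrap(input)); }

    @Override
    void reset(Reader newInput) throws IOException { super.reset(Preprocess.wrap(newInput)); }
}

class ResetAwareFactory extends MiniFactory {
    @Override
    MiniTokenizer create(Reader input) throws IOException {
        createCalls++;
        return new PreprocessingTokenizer(input);
    }
}

// Models an analyzer that builds one tokenizer per thread and reuses it afterwards.
class ReusingAnalyzer {
    private final MiniFactory factory;
    private final ThreadLocal<MiniTokenizer> cached = new ThreadLocal<>();

    ReusingAnalyzer(MiniFactory factory) { this.factory = factory; }

    String analyze(String text) throws IOException {
        MiniTokenizer t = cached.get();
        if (t == null) {
            t = factory.create(new StringReader(text)); // first call on this thread
            cached.set(t);
        } else {
            t.reset(new StringReader(text)); // later calls bypass create()
        }
        return t.readAll();
    }
}
```

In this model, a ReusingAnalyzer built on CreateOnlyFactory processes the first input but returns the second one untouched, while one built on ResetAwareFactory processes both, and create() runs once in either case. In a real plugin the corresponding step would be to subclass the tokenizer and override reset(Reader), provided the Lucene version in use offers that method.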
