仅在第一个调用自定义令牌生成器solr [英] Custom tokenizer solr only is invoked at the first

查看:79
本文介绍了仅在第一个调用自定义令牌生成器solr的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我创建了一个自定义令牌生成器,通过与admin/analysis.jsp和system.out日志进行检查,看来工作正常.但是,当我在使用此自定义标记生成器的字段中执行查询时,我看到仅对第一个查询字符串调用了自定义标记生成器solr(由system.out日志检查). 您能通过指出我的错来帮助我吗? 这些是我的代码:

I created a custom tokenizer, it seem work fine by checking with admin/analysis.jsp and with system.out log. However when I perform querying in the field which use this custom tokenizer, I saw that custom tokenizer solr only is invoked for the first query string (check by system.out log). Could you help me by point out what I am wrong ?. These are my code:

package com.fosp.searchengine;
import java.io.Reader;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

public class JvnTextProTokenizerFactory extends WhitespaceTokenizerFactory{
    @Override
    public WhitespaceTokenizer create(Reader input) {
        System.out.println("WhitespaceTokenizer create(Reader input)");
        Reader processedStringReader = new ProcessedStringReader(input);
        return new WhitespaceTokenizer(processedStringReader);
    }

}


package com.fosp.searchengine;
import java.io.IOException;
import java.io.Reader;

public class ProcessedStringReader extends java.io.Reader {

    private static final int BUFFER_SIZE = 1024 * 8;
    private static TextProcess m_textProcess = null;
    private char[] m_inputData = null;
    private int m_offset = 0;
    private int m_length = 0;
    public ProcessedStringReader(Reader input){
        char[] arr = new char[BUFFER_SIZE];
        StringBuffer buf = new StringBuffer();
        int numChars;

        try {
            while ((numChars = input.read(arr, 0, arr.length)) > 0) {
                buf.append(arr, 0, numChars);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        if(m_textProcess == null){
            try {
                m_textProcess = new TextProcess();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        m_inputData = m_textProcess.processText(buf.toString()).toCharArray();
        m_offset = 0;
        m_length = m_inputData.length;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int charNumber = 0;
        for(int i = m_offset + off;i<m_length && charNumber< len; i++){
            cbuf[charNumber] = m_inputData[i];
            m_offset ++;
            charNumber++;
        }
        if(charNumber == 0){
            return -1;
        }
        return charNumber;
    }

    @Override
    public void close() throws IOException {
        m_inputData = null;
        m_offset = 0;
        m_length = 0;
    }

}

Schema.xml

Schema.xml

<fieldType name="text_jvnTextPro" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
        <tokenizer class="com.fosp.searchengine.JvnTextProTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
        <tokenizer class="com.fosp.searchengine.JvnTextProTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>       
  </analyzer>
</fieldType>

推荐答案

这里没有错.工厂实例化的类被重用.在分析/管理页面中这是不同的.区别在于.

There nothing wrong here. Factory instantiated class is re-used. This is different in analysis/admin page. The difference is that.

这篇关于仅在第一个调用自定义令牌生成器solr的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆