Solr Custom Filter for concatenating tokens

Published: 2020/5/4 7:31:41, tags: solr, lucene

Question

I need to write a custom filter for the Solr analyzer phase. The idea is to first tokenize the input business name on whitespace, then apply a set of filters for lower-casing, pattern replacement and stop-word removal. After these filters, I want to merge (concatenate) all the tokens into one token and then apply NGramFilterFactory to generate n-grams from that token.
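The chain described above can be sketched on plain strings to show what the single merged token should look like before the n-gram step. This is a minimal illustration, not Solr code: the class name `ConcatChainSketch`, the stop-word list and the `[^a-z0-9]` replacement pattern are all hypothetical stand-ins for whatever the real schema configures.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ConcatChainSketch {

    // Hypothetical stop words; in Solr these would come from the stop filter's words file.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("inc", "ltd", "the"));

    // Mirrors the described chain on plain strings:
    // whitespace tokenizer -> lower case -> pattern replace -> stop words -> concatenate.
    static String analyze(String businessName) {
        StringBuilder merged = new StringBuilder();
        for (String token : businessName.trim().split("\\s+")) {
            String t = token.toLowerCase(Locale.ROOT);
            // Hypothetical pattern: keep only ASCII letters and digits.
            t = t.replaceAll("[^a-z0-9]", "");
            if (t.isEmpty() || STOP_WORDS.contains(t)) {
                continue; // dropped tokens contribute nothing to the merged term
            }
            merged.append(t);
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        System.out.println(analyze("Apple Computers, Inc.")); // applecomputers
        System.out.println(analyze("AppleComputers"));        // applecomputers
    }
}
```

Both spellings collapse to the same term, which is what lets the later n-gram step match names typed with or without spaces.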

The reason I want to combine all the tokens (generated initially from the business name) is that I don't want to lose tokens shorter than N (the NGramFilter gram size) from the Solr index, and users might not insert the proper spaces when entering the business name. Please let me know if you need further clarification.
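The short-token argument can be checked with a small n-gram helper. This sketch loosely models NGramFilterFactory's `minGramSize`/`maxGramSize` range; the emission order of Lucene's real filter may differ, and the class name `ShortTokenSketch` and the example name "hp computers" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ShortTokenSketch {

    // Emits every substring whose length lies between minGram and maxGram.
    // Only the set of grams matters here, not their order.
    static List<String> nGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= term.length(); i++) {
                grams.add(term.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // "hp" alone is shorter than minGram = 3, so it yields no grams at all.
        System.out.println(nGrams("hp", 3, 3));                          // []
        // Merged with its neighbour, its letters survive in grams such as "hpc".
        System.out.println(nGrams("hpcomputers", 3, 3).contains("hpc")); // true
    }
}
```

Since "hp computers" and "hpcomputers" both concatenate to "hpcomputers", they also produce identical grams, which covers the missing-space case.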

I attempted to write a custom filter for this, but it is not working properly and I am not able to understand its behavior.

When I query the name "apple", it returns n1 results.

When I query the name "computers", it returns n2 results.

When I query the name "apple computers", it returns n3 results.

When I query the name "computers apple", it returns n4 results.

Here n3 < n1, n3 < n2, and n3 != n4.
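Part of the n3 != n4 difference would be expected even from a filter that concatenates correctly, because concatenation is order-sensitive: "applecomputers" and "computersapple" share most n-grams but not those that straddle the join point. The sketch below assumes a fixed gram size of 3 (the question does not give N); how much this affects the result counts also depends on the query parser settings, which are not shown. `OrderSketch` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

public class OrderSketch {

    // Distinct fixed-size n-grams of a term, kept sorted for readable output.
    static Set<String> grams(String term, int n) {
        Set<String> out = new TreeSet<>();
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> onlyInFirst = grams("applecomputers", 3);
        onlyInFirst.removeAll(grams("computersapple", 3));
        // Only the grams spanning "apple|computers" are missing from the reversed order.
        System.out.println(onlyInFirst); // [eco, lec]
    }
}
```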

Here is the code. I am using Solr 4.10.2 and compiling against the solr-core jars of the same version.

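import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConcatFilter extends TokenFilter {

    private CharTermAttribute charTermAtt;
    // Accumulates the text of every token seen so far in the current stream.
    private StringBuilder builder = new StringBuilder();

    public ConcatFilter(TokenStream input) {
        super(input);
        charTermAtt = addAttribute(CharTermAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            // Append the current token to the running concatenation...
            int len = charTermAtt.length();
            char buffer[] = charTermAtt.buffer();
            builder.append(buffer, 0, len);
            // ...and overwrite the current term with everything accumulated so far.
            // This returns true once per input token, so "apple computers"
            // produces two tokens, "apple" and "applecomputers".
            char[] newBuffer = builder.toString().toCharArray();
            int newLength = builder.length();
            charTermAtt.setEmpty();
            charTermAtt.copyBuffer(newBuffer, 0, newLength);
            charTermAtt.setLength(newLength);
            return true;
        } else {
            // End of stream: clear the accumulator for the next field value.
            builder.delete(0, builder.length());
            return false;
        }
    }
}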
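The filter above returns `true` once per input token, so downstream filters see the running prefixes ("apple", then "applecomputers") instead of one merged token. To emit a single token, `incrementToken()` would instead need to drain the input in a `while (input.incrementToken())` loop, append each term to the builder, then call `clearAttributes()`, copy the builder into `charTermAtt`, and return `true` exactly once, using a flag so the next call returns `false`; `reset()` would be overridden to call `super.reset()` and clear both the builder and the flag. Offsets and position increments are not handled in that outline. In Solr 4.x the filter is then registered in schema.xml through a `TokenFilterFactory` subclass. The runnable sketch below models only the pull contract with a hypothetical `SimpleTokenStream` interface (not Lucene's API) to compare the two behaviors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConcatFilterSketch {

    // Hypothetical stand-in for a pull-based token stream:
    // next() returns a token, or null at end of stream.
    interface SimpleTokenStream {
        String next();
    }

    static SimpleTokenStream of(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Same logic as the question's filter: one output per input token,
    // each holding the concatenation of everything seen so far.
    static SimpleTokenStream buggyConcat(SimpleTokenStream input) {
        StringBuilder builder = new StringBuilder();
        return () -> {
            String token = input.next();
            if (token == null) {
                builder.setLength(0);
                return null;
            }
            builder.append(token);
            return builder.toString();
        };
    }

    // Fixed version: drain the input on the first call, emit one merged token,
    // then report end of stream.
    static SimpleTokenStream fixedConcat(SimpleTokenStream input) {
        return new SimpleTokenStream() {
            private boolean done = false;

            @Override
            public String next() {
                if (done) {
                    return null;
                }
                done = true;
                StringBuilder builder = new StringBuilder();
                for (String t = input.next(); t != null; t = input.next()) {
                    builder.append(t);
                }
                return builder.length() == 0 ? null : builder.toString();
            }
        };
    }

    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("apple", "computers");
        System.out.println(drain(buggyConcat(of(tokens)))); // [apple, applecomputers]
        System.out.println(drain(fixedConcat(of(tokens)))); // [applecomputers]
    }
}
```

With the buggy behavior, both "apple" and "applecomputers" are n-grammed, which plausibly contributes to the result counts described in the question.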
