Solr Custom Filter for concatenating tokens

Published: 2021/12/30 8:48:03. Tags: solr, lucene

Problem description

I need to write a custom filter for the Solr analyzer phase. The idea is to first tokenize the input business name on whitespace, then apply a set of filters for lowercasing, pattern replacement and stop-word removal. After these filters, I want to merge (concatenate) all the tokens into one token and then apply NGramFilterFactory to generate n-grams from that token.
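A plain-Java simulation of the chain described above may help pin down the intended output before touching Lucene. It is a sketch, not Solr's analyzer: the pattern replacement (strip anything that is not a letter or digit) and the stop-word set (inc, the, and) are hypothetical choices, since the question does not name them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation of the analysis chain (no Lucene needed).
// ASSUMPTIONS (hypothetical, not from the question): the pattern replacement strips
// every character that is not a letter or digit, and the stop words are {"inc", "the", "and"}.
public class BusinessNamePipeline {

    static final Set<String> STOP_WORDS = Set.of("inc", "the", "and");

    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.trim().split("\\s+")) {        // whitespace tokenizer
            String t = raw.toLowerCase(Locale.ROOT);            // lower-case filter
            t = t.replaceAll("[^\\p{L}\\p{N}]", "");             // assumed pattern replacement
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {       // stop-word filter
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Concatenation step: all surviving tokens become a single token.
    static String concatenate(List<String> tokens) {
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = analyze("The Apple Computers, Inc.");
        System.out.println(tokens);               // [apple, computers]
        System.out.println(concatenate(tokens));  // applecomputers
    }
}
```

The single token produced at the end is what the n-gram filter would then receive.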

The reason I want to combine all the tokens (generated initially from the business name) is that I don't want to lose tokens whose length is less than N (the NGramFilter minimum) from the Solr index, and users might not type the proper spaces when entering the business name. Please let me know if more clarification is needed.
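The short-token concern can be seen with a small n-gram generator. This is a hypothetical helper, not Lucene's NGramTokenFilter, although for a plain ASCII term with the same minGram/maxGram it should yield the same set of grams; "hp computers" is an invented business name.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal n-gram generator used only for illustration (not Lucene's NGramTokenFilter).
// Emits every substring whose length lies in [minGram, maxGram], by start position.
public class NGramDemo {

    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Token indexed on its own: "hp" is shorter than minGram = 3, so nothing is produced.
        System.out.println(ngrams("hp", 3, 3));
        // Concatenated token: "hp" survives inside cross-boundary grams such as "hpc".
        System.out.println(ngrams("hpcomputers", 3, 3));
    }
}
```

With minGram = 3, indexing "hp" alone produces no grams at all, while the concatenated "hpcomputers" produces grams such as "hpc" and "pco", which also match a user who types the name without a space.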

I attempted to write a custom filter for this, but it is not working properly and I am unable to understand its behavior.

When I query the name "apple", it returns n1 results.

When I query the name "computers", it returns n2 results.

When I query the name "apple computers", it returns n3 results.

When I query the name "computers apple", it returns n4 results.

Here n3 < n1, n3 < n2, and n3 != n4.
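One way to read these numbers is to replay what the question's ConcatFilter (the code that follows) does with the tokens of each query. The sketch below re-enacts only its incrementToken() logic in plain Java and assumes the query reaches the filter as the tokens apple and computers; it is an illustration, not Solr's actual query path.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only re-enactment of the question's ConcatFilter.incrementToken() logic:
// every upstream token is appended to one shared StringBuilder, and the WHOLE
// builder content is emitted as the current output token.
public class RunningConcatDemo {

    static List<String> emittedTerms(List<String> upstreamTokens) {
        StringBuilder builder = new StringBuilder();
        List<String> emitted = new ArrayList<>();
        for (String token : upstreamTokens) {
            builder.append(token);            // like builder.append(buffer, 0, len)
            emitted.add(builder.toString());  // like copying the builder into charTermAtt
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(emittedTerms(List.of("apple", "computers")));  // [apple, applecomputers]
        System.out.println(emittedTerms(List.of("computers", "apple")));  // [computers, computersapple]
    }
}
```

Because the filter emits one token per input token, each holding everything seen so far, the query "apple computers" contains the term applecomputers while "computers apple" contains computersapple. Their n-grams differ, so n3 != n4 is what this filter would be expected to produce; how n3 compares with n1 and n2 also depends on the query parser and default operator, which the question does not show.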

Here is the code. I am using Solr 4.10.2 and compiled it against the solr-core jars of the same version.

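import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConcatFilter extends TokenFilter {

    private CharTermAttribute charTermAtt;
    // Shared across calls: holds every token seen so far in the current stream.
    private StringBuilder builder = new StringBuilder();

    public ConcatFilter(TokenStream input)
    {
        super(input);
        charTermAtt = addAttribute(CharTermAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {

        if (input.incrementToken()) {
            // Append the current token to the running buffer ...
            int len = charTermAtt.length();
            char[] buffer = charTermAtt.buffer();
            builder.append(buffer, 0, len);
            // ... and emit the whole buffer: one output token per input token,
            // each one the concatenation of all tokens seen so far.
            char[] newBuffer = builder.toString().toCharArray();
            int newLength = builder.length();
            charTermAtt.setEmpty();
            charTermAtt.copyBuffer(newBuffer, 0, newLength);
            charTermAtt.setLength(newLength);
            return true;
        } else {
            // The buffer is cleared only once the upstream stream is exhausted.
            builder.delete(0, builder.length());
            return false;
        }
    }
}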
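A filter that really merges the stream has to drain all upstream tokens before emitting anything, emit exactly one token, and clear its state in reset(). The sketch below shows that control flow in stdlib-only Java; SimpleTokenStream is a hypothetical stand-in for Lucene's TokenStream so the example runs without the Lucene jars. In a Lucene 4.x TokenFilter, the same idea would loop over input.incrementToken() inside incrementToken(), call clearAttributes() before writing the combined term into CharTermAttribute, and override reset() (calling super.reset()) to clear the flag and the StringBuilder; offsets and position increments would also need attention.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ConcatAllDemo {

    // Hypothetical stand-in for Lucene's TokenStream (NOT the Lucene API):
    // next() returns the next term, or null once the stream is exhausted.
    interface SimpleTokenStream {
        String next();
        void reset();
    }

    // Upstream stream backed by a fixed list of already-analyzed tokens.
    static final class ListTokenStream implements SimpleTokenStream {
        private final List<String> tokens;
        private Iterator<String> it;

        ListTokenStream(List<String> tokens) {
            this.tokens = tokens;
            this.it = tokens.iterator();
        }

        public String next() { return it.hasNext() ? it.next() : null; }

        public void reset() { it = tokens.iterator(); }
    }

    // Drains the whole upstream stream on the first call and emits a single token.
    static final class ConcatAllFilter implements SimpleTokenStream {
        private final SimpleTokenStream input;
        private boolean done = false;

        ConcatAllFilter(SimpleTokenStream input) { this.input = input; }

        public String next() {
            if (done) return null;               // the single token was already emitted
            done = true;
            StringBuilder builder = new StringBuilder();
            for (String t = input.next(); t != null; t = input.next()) {
                builder.append(t);
            }
            return builder.length() == 0 ? null : builder.toString();  // no token for empty input
        }

        public void reset() {
            input.reset();
            done = false;                        // clear per-document state before reuse
        }
    }

    // Collects everything a stream emits until it reports the end.
    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream filter =
            new ConcatAllFilter(new ListTokenStream(List.of("apple", "computers")));
        System.out.println(drain(filter));   // [applecomputers]
        filter.reset();
        System.out.println(drain(filter));   // [applecomputers]
    }
}
```

Token order still matters with this approach: applecomputers and computersapple remain different terms, although many of their n-grams overlap.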