Challenge with hyphens/dashes in Solr Lucene
Problem description
I'm trying to get Solr to extract only the second, 7-digit portion of a ticket number formatted like n-nnnnnnn.
Originally I hoped to keep the full ticket together. According to the documentation, digits joined by hyphens should be kept together, but after hammering away at this problem for some time and looking at the code, I don't think that's the case: Solr always generates two terms. So rather than getting large numbers of matches for the first digit of n-, I'm thinking I can get better query results from just the second portion. Substituting an A for the dash:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\b\d[A](\d\d\d\d\d\d\d)\b" replacement="$1" replace="all"
maxBlockChars="20000"/>
will parse 1A1234567 fine, but

<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\b\d[-](\d\d\d\d\d\d\d)\b" replacement="$1" replace="all"
maxBlockChars="20000"/>

will not parse 1-1234567.
So it looks like the problem is just the hyphen. I've tried - (escaped) and [-] and \u002D and \x{45} and \x045 without success.
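For what it's worth, both patterns behave as expected outside Solr. A quick sanity check with Java's regex engine (the same syntax family PatternReplaceCharFilterFactory uses) shows the regex itself handles the literal hyphen fine, which suggests the failure is somewhere in Solr's analysis chain rather than in the pattern. The class name below is just illustrative:

```java
import java.util.regex.Pattern;

// Sanity check: the char-filter patterns, tested directly against
// Java's regex engine. This exercises only the regex, not Solr.
public class TicketPatternCheck {
    public static void main(String[] args) {
        Pattern withA    = Pattern.compile("\\b\\d[A](\\d{7})\\b");
        Pattern withDash = Pattern.compile("\\b\\d[-](\\d{7})\\b");

        // Both extract the 7-digit portion in plain Java.
        System.out.println(withA.matcher("1A1234567").replaceAll("$1"));    // 1234567
        System.out.println(withDash.matcher("1-1234567").replaceAll("$1")); // 1234567
    }
}
```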
I've tried putting char filters around it:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\b\d[-](\d\d\d\d\d\d\d)\b" replacement="$1" replace="all" maxBlockChars="20000"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping2.txt"/>
with mapping.txt containing:

"-" => "z"

and mapping2.txt containing:

"z" => "-"
It looks like the hyphen is eaten up in the Flex tokenization and isn't even available to the char filter.
Has anyone had more success with hyphens/dashes in Solr/Lucene? Thanks.
Recommended answer
If your Solr is using a recent Lucene (3.x+, I think), you will want to use ClassicAnalyzer rather than StandardAnalyzer, as StandardAnalyzer now always treats hyphens as delimiters.
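In a Solr schema that would look something like the sketch below, using solr.ClassicTokenizerFactory (the classic tokenizer keeps hyphen-joined tokens together when the parts contain digits, e.g. 1-1234567). The field type name and the extra lowercase filter here are illustrative, not required:

```xml
<!-- Sketch only: fieldType name and filter chain are illustrative. -->
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- ClassicTokenizer (pre-3.1 StandardTokenizer behavior) keeps
         digit-containing hyphenated tokens like 1-1234567 whole. -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```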