SOLR Dropping Emoji Miscellaneous characters

Views: 156 | Published: 2020/5/4 7:58:21 | Tags: unicode, solr, lucene, emoji


Problem description

It looks like SOLR is treating what should be valid Unicode characters as invalid and dropping them.

I "proved" this by turning on query debugging to see what the parser was doing with my query. Here's an example:

Query = 'ァ☀' (\u30a1\u2600)
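For anyone who wants to reproduce this, here is a minimal sketch of building such a debug request in Python 3. The host, port, and core name (`mycore`) are placeholders that do not come from the question. `debugQuery=true` is the standard Solr parameter that adds the debug section to the response. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical host, port and core name; adjust these to your own deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_query_url(query: str) -> str:
    """Return a select URL that asks Solr to include parser debug output."""
    params = {
        "q": query,
        "debugQuery": "true",  # adds the 'debug' section to the response
        "wt": "json",
    }
    # urlencode percent-encodes non-ASCII text as UTF-8 by default.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_debug_query_url("\u30a1\u2600")
print(url)
```

Printing the URL makes it easy to check that both characters leave the client as well-formed UTF-8 (`%E3%82%A1` and `%E2%98%80`), which helps separate a client-side encoding problem from something happening inside the query parser.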

Here's what SOLR did:

'debug':{
  'rawquerystring':u'\u30a1\u2600',
  'querystring':u'\u30a1\u2600',
  'parsedquery':u'(+DisjunctionMaxQuery((text:\u30a1)))/no_coord',
  'parsedquery_toString':u'+(text:\u30a1)',

As you can see, it was fine with 'ァ', but it ate the "Black Sun" character ('☀').
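The same comparison can be made mechanically. The following is a rough sketch, not part of the original post: `dropped_characters` is a hypothetical helper, and the `debug` dictionary simply copies two fields from the output above. It reports every non-whitespace character of the query string that no longer appears anywhere in the parsed query.

```python
from typing import List


def dropped_characters(query: str, parsed_query: str) -> List[str]:
    """Return the characters of `query` that do not appear in `parsed_query`."""
    return [ch for ch in query if not ch.isspace() and ch not in parsed_query]


# Two fields copied from the debug output shown in the question.
debug = {
    "querystring": "\u30a1\u2600",
    "parsedquery_toString": "+(text:\u30a1)",
}

missing = dropped_characters(debug["querystring"], debug["parsedquery_toString"])
for ch in missing:
    print(f"dropped: U+{ord(ch):04X}")
```

This is a crude substring test: a query character that also happens to occur in a field name or in the parser's own punctuation would be counted as present, so treat the result as a hint rather than proof.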

I haven't tried the whole block, but I've confirmed it also doesn't like ⛿ (\u26ff) and ♖ (\u2656).
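One way to look for a pattern among these characters is to print their Unicode general categories with Python's standard `unicodedata` module. This is only a diagnostic sketch. The character that survived is a letter (category `Lo`), while the three that were dropped are all "Symbol, other" (`So`). Whether the analyzer configured for the `text` field actually keys on that distinction is an assumption that the question does not confirm.

```python
import unicodedata

# The one character that was kept, followed by the three that were dropped.
chars = ["\u30a1", "\u2600", "\u26ff", "\u2656"]

# Map each character to its Unicode general category (e.g. 'Lo', 'So').
categories = {ch: unicodedata.category(ch) for ch in chars}

for ch, cat in categories.items():
    # Fall back to a placeholder if this Python's Unicode database lacks a name.
    name = unicodedata.name(ch, "<no name available>")
    print(f"U+{ord(ch):04X} {cat} {name}")
```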

I'm using SOLR with Jetty, so the various Tomcat character-encoding issues shouldn't apply.
