How to filter out (broken) HTML tags in ElasticSearch's highlights?
Problem description
I'm having trouble with the ElasticSearch Grails plugin, namely the highlighting feature.
It is returning text with HTML tags, which would not be a big problem, but it is returning broken, cut-off HTML tags as well.
For example: href="google.de">Link</a
Using RegEx.
The solution to this seems to be a custom analyzer like this:
'{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "test_1" : {
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "standard"
        },
        "test_2" : {
          "filter" : [
            "standard",
            "lowercase",
            "stop",
            "asciifolding"
          ],
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "standard"
        }
      }
    }
  }
}'
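For reference, here is a minimal sketch in Python of a create-index request body that defines the test_1 analyzer from above and assigns it to a field. The field name content and the typeless mapping layout (Elasticsearch 7+) are assumptions, not part of the original question. Note that, as far as I understand, html_strip only changes the tokens that get indexed; the highlighter still works from the original field text in _source, so this analyzer on its own may not remove tags from highlight fragments.

```python
import json


def build_index_body(field: str = "content") -> dict:
    """Create-index body: an analyzer that strips HTML before tokenizing,
    mapped onto a (hypothetical) text field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Same as "test_1" in the question: strip HTML, then tokenize.
                    "test_1": {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                    }
                }
            }
        },
        "mappings": {
            # Typeless mapping layout; assumes Elasticsearch 7 or later.
            "properties": {
                field: {"type": "text", "analyzer": "test_1"}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body you would PUT to /<index-name>.
    print(json.dumps(build_index_body(), indent=2))
```

How the Grails plugin passes these settings through is not shown here; the body above is what a plain HTTP create-index request would carry.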
The question is: how do I get the above into the Grails ElasticSearch plugin? (Or any other solution, for that matter.)
Recommended answer
Try using: "number_of_fragments": 0
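It will return the whole field content (highlighted) instead of fragments, so HTML tags are no longer cut off at fragment boundaries.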
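A minimal sketch of a search request body using this setting, built in Python. The field name content and the query text google are hypothetical. With number_of_fragments set to 0 for a field, the highlighter returns the entire field value with the matches wrapped in highlight tags, and fragment_size is ignored.

```python
import json


def build_search_body(field: str, text: str) -> dict:
    """Search body that highlights the whole field instead of fragments."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                # 0 = no fragmenting: the entire field content is returned,
                # highlighted, so tags are never split at a fragment edge.
                field: {"number_of_fragments": 0}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body you would send to /<index-name>/_search.
    print(json.dumps(build_search_body("content", "google"), indent=2))
```

The trade-off is size: for long fields the whole value comes back in the highlight, not just a short snippet around each match.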