HTML Strip in Elastic Search
Problem description
I have a document with a property that contains HTML tags. I want to remove the HTML before indexing. I found this htmlstrip-charfilter, but I can't find an example of how to use it. I'm new to Elasticsearch and the analyzer concept.
Thanks
Solution
Please check the link below:
# Analyze text: "the <b>quick</b> bröwn <img src="fox"/> &quot;jumped&quot;"

curl -XPUT 'http://127.0.0.1:9200/foo/' -d '
{
   "index" : {
      "analysis" : {
         "analyzer" : {
            "test_1" : {
               "char_filter" : [
                  "html_strip"
               ],
               "tokenizer" : "standard"
            },
            "test_2" : {
               "filter" : [
                  "standard",
                  "lowercase",
                  "stop",
                  "asciifolding"
               ],
               "char_filter" : [
                  "html_strip"
               ],
               "tokenizer" : "standard"
            }
         }
      }
   }
}
'

curl -XGET 'http://127.0.0.1:9200/foo/_analyze?format=text&text=the+%3Cb%3Equick%3C%2Fb%3E+br%C3%B6wn+%3Cimg+src%3D%22fox%22%2F%3E+%26quot%3Bjumped%26quot%3B&analyzer=standard'
#    "tokens" : "[b:5->6:<ALPHANUM>]
#
# 3:
# [quick:7->12:<ALPHANUM>]
#
# 4:
# [b:14->15:<ALPHANUM>]
#
# 5:
# [bröwn:17->22:<ALPHANUM>]
#
# 6:
# [img:24->27:<ALPHANUM>]
#
# 7:
# [src:28->31:<ALPHANUM>]
#
# 8:
# [fox:33->36:<ALPHANUM>]
#
# 9:
# [quot:41->45:<ALPHANUM>]
#
# 10:
# [jumped&quot:46->57:<COMPANY>]
# "
# }

curl -XGET 'http://127.0.0.1:9200/foo/_analyze?format=text&text=the+%3Cb%3Equick%3C%2Fb%3E+br%C3%B6wn+%3Cimg+src%3D%22fox%22%2F%3E+%26quot%3Bjumped%26quot%3B&analyzer=test_1'
# {
#    "tokens" : "[the:0->3:<ALPHANUM>]
#
# 2:
# [quick:7->12:<ALPHANUM>]
#
# 3:
# [bröwn:17->22:<ALPHANUM>]
#
# 4:
# [jumped:46->52:<ALPHANUM>]
# "
# }

curl -XGET 'http://127.0.0.1:9200/foo/_analyze?format=text&text=the+%3Cb%3Equick%3C%2Fb%3E+br%C3%B6wn+%3Cimg+src%3D%22fox%22%2F%3E+%26quot%3Bjumped%26quot%3B&analyzer=test_2'
# {
#    "tokens" : "[quick:7->12:<ALPHANUM>]
#
# 3:
# [brown:17->22:<ALPHANUM>]
#
# 4:
# [jumped:46->52:<ALPHANUM>]
# "
# }
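The curl commands above use an old request style. As a rough sketch only, here is how the same analyzers might be expressed for a more recent Elasticsearch release, assuming a version where settings sit under a "settings" key, the "standard" token filter no longer exists, and _analyze takes a JSON body. The field name "body" and the helper names are hypothetical, chosen for illustration; the mapping is an addition showing one way to apply test_1 to a property at index time. Nothing is sent over the network here; the code only builds the requests.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # same host and port as the curl commands
INDEX = "foo"
FIELD = "body"  # hypothetical field name, not taken from the original answer


def index_body():
    """Build index settings for test_1 and test_2, plus a mapping using test_1."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "test_1": {
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                    },
                    "test_2": {
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        # The old "standard" token filter is left out on the
                        # assumption that the target version has removed it.
                        "filter": ["lowercase", "stop", "asciifolding"],
                    },
                }
            }
        },
        "mappings": {
            "properties": {FIELD: {"type": "text", "analyzer": "test_1"}}
        },
    }


def analyze_body(analyzer, text):
    """Build the JSON body for an _analyze request."""
    return {"analyzer": analyzer, "text": text}


def build_request(method, path, body):
    """Wrap a JSON body in an HTTP request object without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


create_index = build_request("PUT", "/" + INDEX, index_body())
analyze = build_request(
    "POST",
    "/" + INDEX + "/_analyze",
    analyze_body("test_1", 'the <b>quick</b> bröwn <img src="fox"/> &quot;jumped&quot;'),
)
# To send either request against a running node: urllib.request.urlopen(create_index)
```

Note that a char filter only changes the tokens that get indexed; the stored _source of the document still contains the original HTML.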
https://gist.github.com/clintongormley/780895
Thanks to clintongormley
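To get a feel for why tokens such as b, img, src, fox and quot disappear once html_strip is applied, here is a small local approximation in Python. It is not Elasticsearch's implementation, only a sketch built on the standard library's html.parser that drops tags and decodes entities; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only text content, discarding tags and decoding entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; entities are already decoded.
        self.parts.append(data)


def strip_html(text):
    """Return text with HTML tags removed and character references decoded."""
    parser = TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


SAMPLE = 'the <b>quick</b> bröwn <img src="fox"/> &quot;jumped&quot;'
print(strip_html(SAMPLE).split())
```

Splitting the stripped text on whitespace leaves the words the, quick, bröwn and "jumped" (with its decoded quotes), which lines up with the four tokens test_1 reports above once the tokenizer drops the punctuation.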