Solr: strip HTML when highlighting with stored HTML fields

Question

I am using Solr with Sunspot in Rails.

I am searching on an HTML field using a field type like this:

<fieldType name="text_html" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I then run a search and use a stored field so that I can return highlighted text in the results. The problem is that the stored value still contains the original HTML. For example, a search for 'news' returns:

"community connection to @@@hl@@@news@@@endhl@@@, sports, local deals and all the latest conversations.</div>\n</div>\n</div>"

I then want to replace the @@@hl@@@ and @@@endhl@@@ markers with HTML tags.

Do I need to strip out the original HTML tags (divs, etc.) myself, or is there a way to get the stored value with the HTML tags already stripped?

I know how to do this manually; I just want to make sure I'm not missing something in schema.xml or solrconfig.xml.

Thanks

Recommended answer

You will need to strip that markup out yourself, either before inserting the data into Solr or after retrieving it from the index. The analyzers, tokenizers, char filters, and token filters in Solr only affect the tokens/terms that are written to the index for that document (and the terms produced from the query at search time). The stored value of a field, which is what query results and highlighting return, is always kept in the original form that was passed in.
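
As a rough illustration of the "after retrieving" option, here is a minimal Ruby sketch that cleans up one highlighted fragment. The helper name `format_highlight` and the choice of `<em>` as the wrapper tag are assumptions made for this example, not part of Sunspot or Solr. The regex-based tag stripping is deliberately naive; in a Rails app, a real HTML sanitizer such as the `strip_tags` view helper is probably a better fit.

```ruby
require "cgi"

# Markers taken from the question's sample output.
HL_OPEN  = "@@@hl@@@"
HL_CLOSE = "@@@endhl@@@"

# Hypothetical helper: turn a raw highlighted fragment into safe HTML.
def format_highlight(fragment, open_tag: "<em>", close_tag: "</em>")
  # Naive tag removal; comments, scripts, or tags cut in half by the
  # highlighter's fragmenting are not handled here.
  text = fragment.gsub(/<[^>]*>/, " ")
  # Decode entities and collapse the whitespace left behind by the tags.
  text = CGI.unescapeHTML(text).gsub(/\s+/, " ").strip
  # Escape the plain text first, then swap the markers for real tags,
  # so only the wrapper tags end up as live markup.
  CGI.escapeHTML(text).gsub(HL_OPEN, open_tag).gsub(HL_CLOSE, close_tag)
end

raw = "community connection to @@@hl@@@news@@@endhl@@@, sports, " \
      "local deals and all the latest conversations.</div>\n</div>\n</div>"
puts format_highlight(raw)
```

Escaping before inserting the wrapper tags means any leftover `<` or `>` from the stored value is rendered as text rather than interpreted as markup.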

If you happen to be using the DataImportHandler to load your data into Solr, it provides an HTMLStripTransformer and a RegexTransformer that you could use to remove the HTML tags at import time.
