Custom Solr analyzers not being used during indexing


Question

I have a set of PDF files on my machine that I want to index in Solr. For this purpose, I created a schema file with custom field types and user-defined fields.

Here are the fields and copyFields in my schema.xml:

<field name="id" type="custom01" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="custom02" indexed="true" stored="true" multiValued="true"/>
<field name="fileEx" type="custom03" indexed="false" stored="true" multiValued="false"/>

<copyField source="id" dest="fileEx"/>

The id field will contain the actual path of the indexed file. I plan to copy this value into fileEx and keep only the file's extension in that field, using the custom analyzer given in the field definition.

These are my custom fieldType definitions:

<fieldType name="custom01" class="solr.TextField"> <!-- Dummy fieldType -->
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^$"/>
  </analyzer>
</fieldType>

<fieldType name="custom02" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\.([^.]*$)" group="0"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement=""/>
  </analyzer>
</fieldType>

When I indexed the files with this schema, the contents of the id field were copied into fileEx without any analysis: id and fileEx had the same value. I used the Analysis tab in the Solr admin UI to check whether my fieldTypes actually work, and they behave as expected there.

But for some reason, the analyzers don't seem to run properly when indexing actual documents.

So at this point I am stuck and frustrated. Any help with this would be much appreciated. TIA.

Answer

Do I understand correctly that you're asking why the text returned from a hit hasn't changed? The text returned is the stored value as it was before processing, not the tokenized contents of the field. Analyzers only affect the indexed tokens, so changing the analyzer will not change the value returned. This is required to make things like highlighting work properly.

If you want to change the text before it arrives in a field, use an update processor.
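
As a rough illustration of the update-processor approach, the solrconfig.xml fragment below sketches one way to fill fileEx with only the extension. The chain name file-extension is made up for this example, the regex assumes the id is a path whose extension follows the last dot, and it assumes a Solr version in which RegexReplaceProcessorFactory supports the literalReplacement option. Because copyField directives are applied after the update chain has run, the sketch also assumes the copyField from id to fileEx is removed from schema.xml; otherwise fileEx would receive a second, unmodified value.

```xml
<!-- solrconfig.xml: a sketch only; the chain name "file-extension" is hypothetical -->
<updateRequestProcessorChain name="file-extension">
  <!-- Copy the incoming id value into fileEx (takes the place of the schema copyField) -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">id</str>
    <str name="dest">fileEx</str>
  </processor>
  <!-- Keep only the text after the last dot; values without a dot are left unchanged -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">fileEx</str>
    <str name="pattern">^.*\.([^.]*)$</str>
    <str name="replacement">$1</str>
    <!-- Treat $1 as a capture group reference rather than literal text -->
    <bool name="literalReplacement">false</bool>
  </processor>
  <!-- Standard tail of a chain: log the update, then execute it -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain could then be selected per request with the update.chain=file-extension parameter, or set as a default on the request handler used for indexing. Unlike an analyzer, this changes the stored value itself, so the shortened text is what a query returns.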
