Solr 8.6.3 could not index HTML file


Question

solr/
├── bin/
├── CHANGES.TXT
├── contrib/
├── dist/
├── docs/
├── example/
├── licenses
............
├── server/
└── tempfolder/
    └── index.html

I have the folder structure shown above, and my Solr version is 8.6.3. When I run this command:

bin/post -c solrhelp -filetypes html tempfolder/

I get the following error:

Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solrhelp/update/extract?resource.name=/home/user/solr-8.6.3/example/my-examples/index.html&literal.id=/home/user/solr-8.6.3/example/my-examples/index.html

In Solr 8.3.1 this command works fine. Does Solr 8.6.3 support indexing HTML files? If so, how do I do it?

Recommended answer

You have to enable the ExtractingRequestHandler and configure it so that the /update/extract endpoint is available; the 404 means no handler is registered at that path. This was probably already done in your old installation.

If you are not working with an example configset, the jars required to use Solr Cell will not be loaded automatically. You will need to configure your solrconfig.xml to find the ExtractingRequestHandler and its dependencies:

<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
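Before editing solrconfig.xml, it can help to confirm that the jars those two <lib> directives point at are actually present. The following is a minimal sketch, not part of the original answer: check_cell_jars is a hypothetical helper name, and the install directory you pass to it is an assumption about where Solr is unpacked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the Solr Cell jars referenced by the
# <lib> directives exist under a given Solr install directory.
# Usage: check_cell_jars /path/to/solr-8.6.3   (path is an assumption)
check_cell_jars() {
  local install_dir="$1"

  # Tika and its dependencies are expected under contrib/extraction/lib
  if ! ls "$install_dir"/contrib/extraction/lib/*.jar >/dev/null 2>&1; then
    echo "no jars found in $install_dir/contrib/extraction/lib" >&2
    return 1
  fi

  # The handler itself is expected as dist/solr-cell-<version>.jar
  if ! ls "$install_dir"/dist/solr-cell-*.jar >/dev/null 2>&1; then
    echo "no solr-cell jar found in $install_dir/dist" >&2
    return 1
  fi

  echo "Solr Cell jars found"
}
```

If the check fails, the <lib> paths will not resolve either, and the handler class cannot be loaded regardless of how the request handler is configured.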

You can then configure the ExtractingRequestHandler in solrconfig.xml. The following is the default configuration found in Solr’s _default configset, which you can modify as needed:

<requestHandler name="/update/extract"
            startup="lazy"
            class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
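Once both pieces are in solrconfig.xml, the core has to pick up the new configuration before /update/extract responds. The sketch below shows one way to reload and then send a single HTML file with curl. It assumes a standalone (non-SolrCloud) core named solrhelp on localhost:8983, as in the question; the function names are hypothetical, the form field name myfile is arbitrary, and the document id is assumed to be URL-safe.

```shell
#!/usr/bin/env bash
# Assumptions taken from the question: host, port and core name.
SOLR_URL="http://localhost:8983/solr"
CORE="solrhelp"

# Build the extract-handler URL for a given document id.
# The id is used as-is, so it is assumed to need no URL encoding.
extract_url() {
  printf '%s/%s/update/extract?literal.id=%s&commit=true' "$SOLR_URL" "$CORE" "$1"
}

# Reload the core through the CoreAdmin API so the edited
# solrconfig.xml takes effect (standalone mode assumed).
reload_core() {
  curl -fsS "$SOLR_URL/admin/cores?action=RELOAD&core=$CORE"
}

# Send one file to the extract handler as a multipart upload.
# $1 = document id, $2 = path to the HTML file
post_html() {
  curl -fsS "$(extract_url "$1")" -F "myfile=@$2"
}
```

A possible invocation, after sourcing the script, is reload_core && post_html index.html tempfolder/index.html. If that succeeds, the original bin/post command should stop returning 404 as well, since it targets the same endpoint.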
