Solr 8.6.3 could not index HTML file
Question
solr/
├── bin/
├── CHANGES.TXT
├── contrib/
├── dist/
├── docs/
├── example/
├── licenses
............
├── server/
└── tempfolder/
    └── index.html
I have the folder structure shown above, and my Solr version is 8.6.3. When I enter the command:
bin/post -c solrhelp -filetypes html tempfolder/
I get the following error:
Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solrhelp/update/extract?resource.name=/home/user/solr-8.6.3/example/my-examples/index.html&literal.id=/home/user/solr-8.6.3/example/my-examples/index.html
But in Solr 8.3.1 this command works fine. Does Solr 8.6.3 support HTML file indexing? If so, how do I do it?
Recommended answer
You have to enable the ExtractingRequestHandler and configure it so that the
/update/extract endpoint is available. This was probably already done in your old installation.
If you are not working with an example configset, the jars required to use Solr Cell will not be loaded automatically. You will need to configure your solrconfig.xml to find the ExtractingRequestHandler and its dependencies:
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
You can then configure the ExtractingRequestHandler in solrconfig.xml. The following is the default configuration found in Solr’s _default configset, which you can modify as needed:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
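Before re-running bin/post, it may help to confirm that the core's solrconfig.xml actually registers the handler. The sketch below is only a rough text check, not a real XML parse: it would also match a commented-out handler, and it misses one written with different quoting. The function names has_extract_handler and check_core_config are hypothetical helpers, and the default path server/solr/solrhelp/conf/solrconfig.xml is an assumption about where a standalone install keeps the config for the solrhelp core; adjust it to your layout.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given file contains a handler
# registered under the name /update/extract (plain text match only).
has_extract_handler() {
    grep -q 'name="/update/extract"' "$1"
}

# Hypothetical helper: prints "missing", "configured" or "not configured"
# for the solrconfig.xml path passed as the first argument.
check_core_config() {
    conf="$1"
    if [ ! -f "$conf" ]; then
        echo "missing"
    elif has_extract_handler "$conf"; then
        echo "configured"
    else
        echo "not configured"
    fi
}

# Assumed config location for the solrhelp core; override with the first argument.
check_core_config "${1:-server/solr/solrhelp/conf/solrconfig.xml}"
```

If this prints "not configured", add the lib directives and the request handler to that file. Changes to solrconfig.xml only take effect after the core is reloaded or Solr is restarted; after that, the original bin/post -c solrhelp -filetypes html tempfolder/ command can be retried.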