Why does the Tika facade choose EmptyParser?
Question
I'm using the Tika facade, following the example of the elasticsearch-mapper-attachments plugin. Here's my test code:
    Tika tika = new Tika();
    Metadata md = new Metadata();
    try {
        String content = tika.parseToString(src, md, 100000);
        System.out.println("Content length: " + content.length());
        for (String s : md.names()) {
            System.out.println(s + ": " + md.get(s));
        }
    } catch (TikaException e) {
        System.out.println(e);
    }
Here is the output:
Content length: 0
X-Parsed-By: org.apache.tika.parser.EmptyParser
Content-Type: text/html
So the question is: if Tika correctly identifies the input as text/html, why does it use the EmptyParser? If I'm supposed to pass a parser, which one should I pass for best results, assuming that autodetection succeeds as above?
Thanks.
Recommended answer
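Make sure that tika-parsers is on your classpath! The Tika facade lives in tika-core, which can detect the content type on its own but ships no format-specific parsers, so when none are found it falls back to EmptyParser. If you are using Gradle, adding this dependency should do the trick:
    compile 'org.apache.tika:tika-parsers:1.7'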
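To confirm whether the parsers are actually visible at runtime, a quick classpath check can help. The following is a minimal sketch using only the JDK, not part of the original answer. The class name TikaClasspathCheck and its helper methods are hypothetical. It assumes Tika 1.x, where tika-parsers provides org.apache.tika.parser.html.HtmlParser and registers its parsers in a META-INF/services/org.apache.tika.parser.Parser descriptor.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; uses only JDK classes, so it runs with or without Tika.
final class TikaClasspathCheck {

    // A parser class that ships in the tika-parsers artifact (assumes Tika 1.x).
    static final String HTML_PARSER = "org.apache.tika.parser.html.HtmlParser";

    // Service descriptor used to register Parser implementations (assumption: Tika 1.x layout).
    static final String PARSER_SERVICES = "META-INF/services/org.apache.tika.parser.Parser";

    // Returns true if the named class can be loaded, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Lists every copy of the given resource that the class loader can see.
    static List<URL> findResources(String path) throws IOException {
        return Collections.list(TikaClasspathCheck.class.getClassLoader().getResources(path));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("HtmlParser on classpath: " + isOnClasspath(HTML_PARSER));
        List<URL> descriptors = findResources(PARSER_SERVICES);
        System.out.println("Parser service descriptors found: " + descriptors.size());
        for (URL url : descriptors) {
            System.out.println("  " + url);
        }
    }
}
```

If the first line prints false and no descriptors are listed, the parsers are probably missing from the runtime classpath, which would be consistent with the X-Parsed-By: org.apache.tika.parser.EmptyParser output shown in the question.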