Why does the Tika facade choose EmptyParser?


Question


I'm using the Tika facade, following the example of the elasticsearch-mapper-attachments plugin. Here's my test code:

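import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

public class TikaTest {
    // src: an InputStream over the document to be parsed
    static void dump(InputStream src) throws IOException {
        Tika tika = new Tika();
        Metadata md = new Metadata();

        try {
            // Auto-detect the type, parse, and keep at most 100000 characters
            String content = tika.parseToString(src, md, 100000);
            System.out.println("Content length: " + content.length());

            // Print every metadata entry collected during parsing
            for (String s : md.names()) {
                System.out.println(s + ": " + md.get(s));
            }
        } catch (TikaException e) {
            System.out.println(e);
        }
    }
}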

Here's the output:

Content length: 0
X-Parsed-By: org.apache.tika.parser.EmptyParser
Content-Type: text/html


So the question is: if Tika correctly identifies the input as text/html, why does it use the EmptyParser? And if I'm supposed to pass a parser explicitly, which one should I pass for best results, assuming autodetection succeeds as it did above?

Thanks.

Answer


Make sure that tika-parsers is on your classpath! If you are using Gradle, adding

compile 'org.apache.tika:tika-parsers:1.7'

to your dependencies should fix the problem.
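Some background on why this happens, as I understand Tika 1.x: tika-core contains the Tika facade, type detection, and AutoDetectParser, but none of the format-specific parsers. Those (including the HTML parser) ship in tika-parsers and are discovered at runtime through java.util.ServiceLoader. If no parser is registered for the detected type, AutoDetectParser falls back to EmptyParser, which extracts no text. That matches the output above: the type is detected as text/html, yet X-Parsed-By reports EmptyParser. With tika-parsers present you normally don't need to pass a parser yourself; the facade dispatches to the HTML parser automatically.

The sketch below checks the classpath using only the JDK. It assumes the class names org.apache.tika.Tika (tika-core) and org.apache.tika.parser.html.HtmlParser (tika-parsers). The class name TikaClasspathCheck and its helper methods are hypothetical, written for illustration only.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: reports whether the Tika jars are visible on the classpath.
public class TikaClasspathCheck {

    // True if the named class can be loaded; the class is not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Every classpath entry that registers Tika parsers via ServiceLoader
    // (a META-INF/services file named after the Parser interface).
    static List<URL> parserServiceFiles() throws IOException {
        ClassLoader loader = TikaClasspathCheck.class.getClassLoader();
        return Collections.list(
                loader.getResources("META-INF/services/org.apache.tika.parser.Parser"));
    }

    public static void main(String[] args) throws IOException {
        // The facade and auto-detection live in tika-core
        System.out.println("tika-core (Tika facade): "
                + isOnClasspath("org.apache.tika.Tika"));
        // The HTML parser ships in tika-parsers, not tika-core
        System.out.println("tika-parsers (HtmlParser): "
                + isOnClasspath("org.apache.tika.parser.html.HtmlParser"));

        List<URL> services = parserServiceFiles();
        System.out.println("Parser service files found: " + services.size());
        for (URL url : services) {
            System.out.println("  " + url);
        }
    }
}
```

If the HtmlParser line prints false, or the tika-parsers jar does not appear among the listed service files, AutoDetectParser has nothing to dispatch text/html to, and EmptyParser is the expected result.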
