Why does the Tika facade choose EmptyParser?

Published: 2021/4/8 20:33:21 · Tags: java, apache-tika

Question

I'm using the Tika facade, following the example of the elasticsearch-mapper-attachment plugin. Here's my test code:

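import java.io.IOException;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// src is the input being parsed; this three-argument overload takes an InputStream
Tika tika = new Tika();
Metadata md = new Metadata();

try {
    // Extract at most 100000 characters of text; metadata is collected in md
    String content = tika.parseToString(src, md, 100000);

    System.out.println("Content length: " + content.length());

    // Print every metadata field that was recorded during parsing
    for (String s : md.names()) {
        System.out.println(s + ": " + md.get(s));
    }
}
catch (TikaException | IOException e) {
    // parseToString declares IOException as well as TikaException
    System.out.println(e);
}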

Here's the output:

Content length: 0
X-Parsed-By: org.apache.tika.parser.EmptyParser
Content-Type: text/html

So the question is: if Tika correctly identifies the input as text/html, why does it use the EmptyParser? And if I'm supposed to pass a parser myself, which one should I pass for best results, assuming autodetection succeeds as it does above?

Thanks.
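
One possible reading of the output in the question is that type detection and parser lookup are separate steps, so a media type can be detected even when no parser is registered for it. The sketch below models that idea in plain Java. `ParserLookupSketch`, `chooseParser`, and the registry map are hypothetical names used for illustration only and are not Tika's API. Whether this explains the behaviour here depends on which parser implementations are on the classpath, which the question does not show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not Tika's source code.
class ParserLookupSketch {

    // Name reported when no parser is registered for the detected type.
    static final String FALLBACK = "EmptyParser";

    // Looks up a parser by media type and falls back when nothing is registered.
    static String chooseParser(Map<String, String> registry, String mediaType) {
        String parser = registry.get(mediaType);
        return parser != null ? parser : FALLBACK;
    }

    public static void main(String[] args) {
        // Detection is modelled as a separate step, so the type is known either way.
        String detected = "text/html";

        // Empty registry: assumed to model a classpath with no parser implementations.
        Map<String, String> none = new HashMap<>();
        System.out.println(detected + " -> " + chooseParser(none, detected));

        // Registry with an HTML entry: assumed to model a classpath that has one.
        Map<String, String> withHtml = new HashMap<>();
        withHtml.put("text/html", "HtmlParser");
        System.out.println(detected + " -> " + chooseParser(withHtml, detected));
    }
}
```

In this model the content type is reported correctly in both cases, and only the lookup result changes with what is registered.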
