Why does the Tika facade choose EmptyParser?

Views: 27 Published: 2021/11/14 23:47:43 java apache-tika

Problem description

I'm using the Tika facade, following the example of the elasticsearch-mapper-attachments plugin. Here's my test code:

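import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// src is the InputStream of the document to parse (defined elsewhere)
Tika tika = new Tika();
Metadata md = new Metadata();

try {
    // Extract at most 100000 characters of text; md is filled in during parsing
    String content = tika.parseToString(src, md, 100000);

    System.out.println("Content length: " + content.length());

    // Print every metadata field that Tika populated
    for (String s : md.names()) {
        System.out.println(s + ": " + md.get(s));
    }
}
catch (TikaException e) {
    System.out.println(e);
}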

The output is:

Content length: 0
X-Parsed-By: org.apache.tika.parser.EmptyParser
Content-Type: text/html

So the question is: if Tika correctly identifies the input as text/html, why does it use the EmptyParser? If I'm supposed to pass a parser, which parser should I pass for best results, assuming that autodetection succeeds as it does above?

Thanks.
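
One possibility worth ruling out (an assumption, not something the post confirms): MIME type detection lives in tika-core, while concrete format parsers such as org.apache.tika.parser.html.HtmlParser ship in the separate parsers artifact. Detection can therefore succeed even when no real parser is available on the classpath. Below is a minimal sketch that checks which of those classes are loadable. ParserClasspathCheck and isOnClasspath are hypothetical names introduced only for this illustration.

```java
public class ParserClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The facade class (tika-core) and the HTML parser (parsers artifact).
        String[] candidates = {
            "org.apache.tika.Tika",
            "org.apache.tika.parser.html.HtmlParser"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If the facade class is found but the HTML parser is reported as missing, adding the parsers dependency to the build would be the first thing to try before passing a parser explicitly.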
