How to get the text content of files with Tika 1.6?

Views: 15 | Published: 2021/11/14 23:48:04 | Tags: jakarta-ee, apache-tika


Problem description

Hi, I am trying to get the text content of files of any of these types: PDF, TXT, DOC, DOCX and ODT. My Tika-based implementation used to work fine, but now it is broken. The code is:

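```
import java.io.ByteArrayInputStream;
import org.apache.tika.Tika;
// Upload types, assumed to be RichFaces 4 (getUploadedFile() / getData())
import org.richfaces.event.FileUploadEvent;
import org.richfaces.model.UploadedFile;

public void uploadFile(FileUploadEvent event) throws Exception {
    UploadedFile file = event.getUploadedFile();
    byte[] data = file.getData();
    Tika tika = new Tika();
    // Detects the file type and extracts plain text; Tika closes the stream
    String text = tika.parseToString(new ByteArrayInputStream(data));
    // ...
}
```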

Any ideas? Is my implementation wrong?

Recommended answer

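You need to add the `tika-parsers` dependency. `tika-core` only provides the `Tika` facade and the file-type detection; the actual format parsers (PDF, Microsoft Office, OpenDocument, plain text and so on) live in `tika-parsers`. Without it on the classpath, `parseToString` typically returns an empty string instead of throwing an error.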

For example, with Maven, add this dependency to your pom.xml (keep its version in step with the `tika-core` version you use):

```
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.7</version>
</dependency>
```
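One way to confirm that the dependency actually reaches the runtime classpath (for example, inside the deployed WAR) is to try loading one class from each artifact. The sketch below is a hypothetical helper, not part of Tika. It assumes the Tika 1.x layout, where `org.apache.tika.Tika` ships in `tika-core` and `org.apache.tika.parser.pdf.PDFParser` ships in `tika-parsers`.

```java
// Hypothetical diagnostic helper (not a Tika API): reports whether a class can be loaded.
public class TikaClasspathCheck {

    // Returns true if the named class is visible to this class loader.
    // initialize=false avoids running static initializers of the probed class.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Facade class shipped in tika-core
        System.out.println("tika-core:    " + isOnClasspath("org.apache.tika.Tika"));
        // PDF parser shipped in tika-parsers (assumed Tika 1.x package layout)
        System.out.println("tika-parsers: " + isOnClasspath("org.apache.tika.parser.pdf.PDFParser"));
    }
}
```

If the first line prints `true` and the second prints `false`, the facade is present but has no format parsers to delegate to, which matches the symptom in the question.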

You can also use the auto-detecting parser, `AutoDetectParser`, directly:

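```
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

// `is` is an InputStream over the file, e.g. new ByteArrayInputStream(data).
// The enclosing method must also declare IOException and SAXException.
String text = null;
// Default write limit is 100,000 characters; new BodyContentHandler(-1) removes it.
BodyContentHandler handler = new BodyContentHandler();
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
try {
    parser.parse(is, handler, metadata);
    text = handler.toString();
} catch (TikaException te) {
    System.out.println(te.toString());
} finally {
    is.close();
}
```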
