Tika 1.13 RuntimeException

Problem description

I recently updated my existing Tika project to use Tika 1.13 instead of 1.10. The only change I made was bumping the dependency version from 1.10 to 1.13. The project builds successfully, yet whenever I try to run the application I get this exception:

java.lang.RuntimeException: Unable to parse the default media type registry
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:580)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
    at com.app.tikamanager.MetaParser.<init>(MetaParser.java:54)
    at com.app.services.MyService.HandleItemInThread(IntelligentDocumentsService.java:260)
    at com.app.intelligentservicebase.ItemHandlerThread.run(ItemHandlerThread.java:41)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:126)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:170)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    ... 10 more
Caused by: org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
    at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.setFeatures(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
    ... 14 more

The exception is thrown from the constructor of my MetaParser class. The only thing that constructor does is initialize the AutoDetectParser:

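import org.apache.tika.parser.AutoDetectParser;

public class MetaParser {
    private final AutoDetectParser _tikaExtractor;

    public MetaParser() {
        // Fails here: building the default TikaConfig parses the MIME type registry XML
        _tikaExtractor = new AutoDetectParser();
    }
}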

I am running the application on Ubuntu 14.04 with Oracle JDK 1.8.0_91-b14.

I searched online and found this exception mentioned a couple of times. One suggested fix was to install OpenJDK, but that was for an old version of Tika, and since the old version worked fine with this same JDK, I don't think that is the problem.

Is there something I need to do or initialize before calling the AutoDetectParser constructor?

Recommended answer

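Promoting comments to an answer: you have a very old version of Xerces on your classpath. Your JVM is picking that as the default XML parser, so when Tika says "Hi JVM, can I have a safe XML parser?" it fails with the SAXNotRecognizedException for the secure-processing feature shown at the bottom of your stack trace.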

(Tika made improvements in the 1.10 to 1.13 period to how XML Parsing is done, including setting safer defaults, which is why this has started happening)
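To confirm that this is what is happening, a small diagnostic along the following lines may help. It is only a sketch: the class name XmlParserCheck and its helper methods are made up for illustration, and it assumes Tika obtains its parser through the standard SAXParserFactory.newInstance() lookup, as the stack trace suggests. Run it with the same classpath as the failing application.

```java
import java.security.CodeSource;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, not part of Tika.
public class XmlParserCheck {

    /** Returns where a class was loaded from, or a marker when no code source is reported. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(JDK built-in)"
                : src.getLocation().toString();
    }

    /** Returns true if the factory accepts the JAXP secure-processing feature. */
    static boolean supportsSecureProcessing(SAXParserFactory factory) {
        try {
            // The same feature URI that appears in the SAXNotRecognizedException message.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println("Factory class: " + factory.getClass().getName());
        System.out.println("Loaded from:   " + locationOf(factory.getClass()));
        System.out.println("Secure processing supported: " + supportsSecureProcessing(factory));
    }
}
```

If the factory class is org.apache.xerces.jaxp.SAXParserFactoryImpl and the location points at a jar in your application, that jar is the one the JVM is picking up instead of its own parser.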

You either need to remove your old Xerces jars, so that the JVM-supplied XML parser starts being used, or replace them with a more recent Xerces version.

You may also find some of the advice in Error unmarshalling XML in Java 8 "secure-processing org.xml.sax.SAXNotRecognizedException" helpful, especially if you're struggling to locate the pesky old Xerces jar in your build!
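If the old jar is hard to spot, one possible approach is to ask the class loader where the Xerces factory class lives. The sketch below is illustrative only: FindXercesJars and findOnClasspath are hypothetical names, and it assumes the offending jar is visible to the thread's context class loader (or the system class loader as a fallback).

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of Tika or Xerces.
public class FindXercesJars {

    /** Lists every classpath location that provides the given resource path. */
    static List<String> findOnClasspath(String resourcePath) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<String> hits = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(resourcePath))) {
            hits.add(url.toString());
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // The factory class named in the stack trace.
        for (String hit : findOnClasspath("org/apache/xerces/jaxp/SAXParserFactoryImpl.class")) {
            System.out.println("Xerces factory found in: " + hit);
        }
        // The service file that registers a jar's factory as the JAXP default.
        for (String hit : findOnClasspath("META-INF/services/javax.xml.parsers.SAXParserFactory")) {
            System.out.println("SAXParserFactory service entry in: " + hit);
        }
    }
}
```

Each printed URL contains the path of the jar that holds the resource, which should point at the dependency to remove or upgrade. If nothing is printed, the jar may be loaded by a different class loader than the one checked here.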