Tika 1.13 RuntimeException
Question
I recently updated my existing Tika project to use Tika 1.13 instead of 1.10. The only change I made was bumping the dependency version from 1.10 to 1.13. The project builds successfully, yet whenever I run the application I get this exception:
java.lang.RuntimeException: Unable to parse the default media type registry
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:580)
at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
at com.app.tikamanager.MetaParser.<init>(MetaParser.java:54)
at com.app.services.MyService.HandleItemInThread(IntelligentDocumentsService.java:260)
at com.app.intelligentservicebase.ItemHandlerThread.run(ItemHandlerThread.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:126)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:170)
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
... 10 more
Caused by: org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.setFeatures(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
... 14 more
The exception is thrown from the constructor of my MetaParser class, which does nothing except initialize the AutoDetectParser:
private final AutoDetectParser _tikaExtractor;

public MetaParser()
{
    _tikaExtractor = new AutoDetectParser();
}
I am running the application on Ubuntu 14.04 with Oracle JDK 1.8.0_91-b14.
I looked online and found this exception mentioned a couple of times. One suggested fix was to install OpenJDK, but that was for an old version of Tika, and since Tika 1.10 worked fine with this same JDK, I don't think that is the problem.
Is there something I need to do or initialize before calling the AutoDetectParser constructor?
Recommended answer
Promoting comments to an answer: you have a very old version of Xerces on your classpath. Your JVM is picking that as the default XML parser, so when Tika says "Hi JVM, can I have a safe XML parser?" the request fails.
(Between 1.10 and 1.13, Tika improved how it does XML parsing, including setting safer defaults, which is why this has only started happening after the upgrade.)
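To check this outside of Tika, the following minimal sketch makes roughly the same request that appears at the bottom of the stack trace: it asks the default SAXParserFactory to enable the secure-processing feature. The class and method names are hypothetical and are not part of Tika. Run it with the same classpath as the failing application.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper for illustration; not part of Tika.
public class SecureProcessingCheck {

    /** Returns true if the default SAX parser factory accepts the secure-processing feature. */
    static boolean supportsSecureProcessing() {
        // Picks whichever SAX implementation the JVM finds first, as Tika would.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Same feature URI as in the SAXNotRecognizedException message.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (ParserConfigurationException
                | SAXNotRecognizedException
                | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("secure-processing supported: " + supportsSecureProcessing());
    }
}
```

If this reports that the feature is not supported, the parser being picked up is most likely the outdated one described above.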
You need to either remove the old Xerces jars, so that the JVM-supplied XML parser is used instead, or replace them with a more recent Xerces version.
You may also find some of the advice in the question Error unmarshalling XML in Java 8 "secure-processing org.xml.sax.SAXNotRecognizedException" helpful, especially if you're struggling to locate the pesky old Xerces jar in your build!
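One way to track the jar down is to ask the JVM where the parser class it selected was loaded from. This is a small sketch with hypothetical class and method names; it assumes it runs with the same classpath as the application.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper for illustration; not part of Tika.
public class XmlParserLocator {

    /** Describes the SAX parser factory implementation in use and where it came from. */
    static String describeFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        Class<?> impl = factory.getClass();
        // The code source may be null for classes loaded by the JVM's own loaders.
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "no code source (typically the JDK's built-in parser)"
                : source.getLocation().toString();
        return impl.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
    }
}
```

If the printed class is in the org.apache.xerces package, as in the stack trace above, the location should point at the jar that needs to be removed or upgraded.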