Alfresco Community 4.0 doesn't recognize the DITA file mimetype

Question

I've installed Community 4.0.a and extended the mimetype list using mimetype-map.xml, as I did before in 3.4:

<alfresco-config area="mimetype-map">
  <config evaluator="string-compare" condition="Mimetype Map">
    <mimetypes>
      <mimetype mimetype="application/dita+xml" text="true" display="DITA">
        <extension default="true" display="DITA Topic">dita</extension>
        <extension default="true" display="DITA Map">ditamap</extension>
        <extension default="true" display="DITA Conditional Processing Profile">ditaval</extension>
      </mimetype>

etc...

But each time I import a DITA file, it is recognised either as an XML file or as plain text. I've dug into it, and it looks like the cause is Apache Tika, which analyzes the beginning of the file to determine its mimetype.

How do I bypass Tika with my custom mimetype-map? (From the code, it looks like Tika is triggered first, and if it finds something, it's game over.)

Do I have to extend Tika by writing my own parser?

Recommended answer

The mimetype matching logic in 4.0 has changed slightly, now that the content is available for detection rather than just the filename. As part of this, if Tika is very sure about what a file is, its answer is preferred.

In most cases, this means that for common but incorrectly named files, Tika can help correct mistakes. For non-standard files, Tika will decline to offer a strong suggestion, and the Alfresco name-based matching will be used as before. (Where Tika and Alfresco differ on the canonical form of a mimetype, the Alfresco version is preferred.)

In a small number of cases, the file type is actually a specialisation of a common type, and Tika knows about the parent type but not the specific one. In that case, Tika strongly suggests the parent type, and Alfresco has no way to realise that the new type added to its own list is based on that parent. (Tika has a hierarchy of mimetypes, while Alfresco just has a flat list.) For these few cases, Tika needs guiding too.

The usual fix is to report a Tika bug and have the file type added upstream. (For very custom types, you also need to add a Tika custom-mimetypes.xml, which defines the hierarchy and the globs.)
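
To illustrate the custom-mimetypes.xml route, here is a minimal sketch of what such a file might look like for DITA. It rests on several assumptions: that the Tika version in use loads org/apache/tika/mime/custom-mimetypes.xml from the classpath, that application/xml is the appropriate parent type, and that the globs should mirror the extensions from the question. It is not the actual TIKA-784 fix, which may define the type differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical custom-mimetypes.xml, assumed to sit on the classpath at
     org/apache/tika/mime/custom-mimetypes.xml. -->
<mime-info>
  <mime-type type="application/dita+xml">
    <!-- Hierarchy: declare DITA as a specialisation of generic XML
         (assumed parent type). -->
    <sub-class-of type="application/xml"/>
    <_comment>DITA</_comment>
    <!-- Globs: filename patterns taken from the extensions in the question. -->
    <glob pattern="*.dita"/>
    <glob pattern="*.ditamap"/>
    <glob pattern="*.ditaval"/>
  </mime-type>
</mime-info>
```

With a definition along these lines, Tika would know that application/dita+xml exists and is a child of XML, so a file matching one of the globs could be reported as the specific type rather than the parent.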

For DITA specifically, I've opened TIKA-784 and added a provisional fix, which has now gone into Alfresco as well.
