How to add a new MIME type to Apache Tika

Problem description

This is my class for detecting MIME types. I am trying to add a new MIME type (for .properties files) and detect it.

Here is my class:

package check_mime;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeTypes;


public class TikaFileTypeDetector {

    private final Tika tika = new Tika();

    public TikaFileTypeDetector() {
        super();
    }

    public String probeContentType(Path path) throws IOException {

        // Check contents first
        String fileContentDetect = tika.detect(path.toFile());
        if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileContentDetect;
        }

        // Try file name only if content search was not successful
        String fileNameDetect = tika.detect(path.toString());
        if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileNameDetect;
        }

        return null;
    }

    public static void main(String[] args) throws IOException {

        if (args.length != 1) {
            printUsage();
            return;
        }
        Path path = Paths.get(args[0]);

        TikaFileTypeDetector detector = new TikaFileTypeDetector();

        String contentType = detector.probeContentType(path);

        System.out.println("File is of type - " + contentType);
    }

    public static void printUsage() {
        System.out.println("Usage: java -classpath ... "
                + TikaFileTypeDetector.class.getName()
                + " <file>");
    }
}

Following the docs, I have created a custom XML file:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <glob pattern="*.properties"/>
  </mime-type>
</mime-info>

Now, how do I add this to my program so that the new type is detected? Do I have to create a parser? I'm stuck here.

Recommended answer

This is covered in the Apache Tika 5 minute parser instructions. To add support for Java .properties files, you should first create a file called custom-mimetypes.xml and populate it with something like:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <_comment>Java Properties</_comment>
    <glob pattern="*.properties"/>
    <sub-class-of type="text/plain"/>
  </mime-type>
</mime-info>
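
Before putting the file on the classpath, it can be worth confirming that it is well-formed XML with the expected mime-info / mime-type / glob shape; for example, any whitespace before the <?xml ...?> declaration makes the file ill-formed. The sketch below uses only the JDK's DOM parser; the class and method names are hypothetical and not part of Tika, and it does not validate against Tika's own schema or prove that Tika loaded the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: sanity-checks a custom-mimetypes.xml file with the JDK DOM parser.
public class MimeInfoCheck {

    // Same content as the custom-mimetypes.xml above; used when no file is given.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<mime-info>\n"
            + "  <mime-type type=\"text/properties\">\n"
            + "    <_comment>Java Properties</_comment>\n"
            + "    <glob pattern=\"*.properties\"/>\n"
            + "    <sub-class-of type=\"text/plain\"/>\n"
            + "  </mime-type>\n"
            + "</mime-info>\n";

    /** Parses the XML (throwing if it is not well-formed) and returns "type -> glob" pairs. */
    static List<String> listGlobs(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        if (!"mime-info".equals(doc.getDocumentElement().getTagName())) {
            throw new IllegalArgumentException("root element must be <mime-info>");
        }
        List<String> pairs = new ArrayList<>();
        NodeList types = doc.getElementsByTagName("mime-type");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            NodeList globs = type.getElementsByTagName("glob");
            for (int j = 0; j < globs.getLength(); j++) {
                Element glob = (Element) globs.item(j);
                pairs.add(type.getAttribute("type") + " -> " + glob.getAttribute("pattern"));
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of your own custom-mimetypes.xml, or no argument to check SAMPLE.
        byte[] xml = args.length == 1
                ? Files.readAllBytes(Paths.get(args[0]))
                : SAMPLE.getBytes(StandardCharsets.UTF_8);
        for (String pair : listGlobs(xml)) {
            System.out.println(pair);
        }
    }
}
```

Run with no argument, it prints text/properties -> *.properties. Prepending even a single space before <?xml makes parse throw a SAXParseException, which is a quick way to catch that kind of copy-and-paste damage.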

Next, you need to put that file somewhere Tika can find it, under the right name. It must be stored as org/apache/tika/mime/custom-mimetypes.xml on your classpath. The easiest approach is to create that directory structure, move the new file into it, and then add the root directory to your classpath. For deployment, you should package it into a jar and put that jar on the classpath.
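
The directory layout described above can be reproduced and checked from Java. The sketch below (hypothetical class CustomMimeLayout, JDK only, it does not load Tika itself) writes the file under <root>/org/apache/tika/mime/, then builds a throwaway URLClassLoader whose only classpath entry is <root> and looks the file up by resource name. If that lookup succeeds, the same root directory added to -classpath should expose the file to anything on that classpath.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: installs custom-mimetypes.xml under a classpath root and checks visibility.
public class CustomMimeLayout {

    // Resource name from the answer; classpath lookups always use '/' separators.
    static final String RESOURCE = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Writes xml to root/org/apache/tika/mime/custom-mimetypes.xml, creating the directories. */
    static Path install(Path root, String xml) throws Exception {
        Path target = root.resolve(RESOURCE);
        Files.createDirectories(target.getParent());
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    /** True if a class loader whose only classpath entry is root can find RESOURCE. */
    static boolean visibleOnClasspath(Path root) throws Exception {
        URL[] classpath = { root.toUri().toURL() };
        // null parent: the application classpath is not searched, only root (and JDK classes).
        try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
            return loader.getResource(RESOURCE) != null;
        }
    }

    public static void main(String[] args) throws Exception {
        // A temporary root keeps the demo self-contained; use your project's root in practice.
        Path root = Files.createTempDirectory("tika-custom-mime");
        Path file = install(root, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<mime-info/>\n");
        System.out.println("Wrote " + file);
        System.out.println("Visible with -classpath " + root + ": " + visibleOnClasspath(root));
    }
}
```

A common mistake is adding the mime directory itself (rather than the root above org/) to the classpath; in that case the resource name no longer matches and the lookup returns null.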

If you're careful, you can use the Tika App to check that your MIME type file was loaded. With your code packaged as a jar, run something like the following (on Windows, use ; instead of : to separate classpath entries):

java -classpath tika-app-1.10-SNAPSHOT.jar:my-custom-mimetypes.jar org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties
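
For the jar route, what matters is that the entry path inside the jar is org/apache/tika/mime/custom-mimetypes.xml. From the directory layout described above, the JDK jar tool does this with jar cf my-custom-mimetypes.jar org/apache/tika/mime/custom-mimetypes.xml. The JDK-only sketch below (hypothetical class name, no manifest written) does the same programmatically and then confirms the entry is present.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper: packages custom-mimetypes.xml into a jar at the path Tika expects.
public class PackageCustomMimeTypes {

    // Entry name inside the jar; must match the classpath location from the answer.
    static final String RESOURCE = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Copies xmlFile into a new jar (no manifest) under the RESOURCE entry name. */
    static void writeJar(Path xmlFile, Path jar) throws Exception {
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(RESOURCE));
            Files.copy(xmlFile, out); // writes the file bytes; does not close the stream
            out.closeEntry();
        }
    }

    /** Returns true if the jar contains the RESOURCE entry. */
    static boolean containsResource(Path jar) throws Exception {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getJarEntry(RESOURCE) != null;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java PackageCustomMimeTypes <custom-mimetypes.xml> <output.jar>");
            return;
        }
        Path jar = Paths.get(args[1]);
        writeJar(Paths.get(args[0]), jar);
        System.out.println(RESOURCE + " present: " + containsResource(jar));
    }
}
```

The resulting jar can then take the place of my-custom-mimetypes.jar in the TikaCLI command above.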

Alternatively, if you have the file in a local directory, try something like:

ls -l org/apache/tika/mime/custom-mimetypes.xml
# Check a file was found, with some content in it
java -classpath tika-app-1.10-SNAPSHOT.jar:. org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

If that doesn't show your MIME type, then you didn't get the path or filename right; double-check them.

(Alternatively, upgrade to a newer version of Apache Tika: since r1686315, Tika has a Java Properties MIME type built in!)

That concludes this article on how to add a new MIME type to Apache Tika. We hope the recommended answer helps, and we hope you will continue to support IT屋!
