How to reliably detect file types?


Question

Objective: given a file, determine whether it is of a given type (XML, JSON, Properties, etc.).

Consider the case of XML. Until we ran into this issue, the following approach worked fine:

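    // saxReader is presumably a dom4j SAXReader (an assumption; the question
    // does not name the library). f is the file under test.
    try {
        saxReader.read(f);
    } catch (DocumentException e) {
        // Any parse failure lands here, including XML that is merely malformed.
        logger.warn("  - File is not XML: " + e.getMessage());
        return false;
    }
    return true;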

As expected, when the XML is well formed, the test passes and the method returns true. If the file cannot be parsed, the method returns false.

This breaks, however, when we deal with a malformed XML file: the file is still XML, but the parser rejects it, so the method reports that it is not XML.

I'd rather not rely on the .xml extension (that fails all the time), on looking for the <?xml version="1.0" encoding="UTF-8"?> string inside the file, or on similar tricks.

Is there another way this can be handled?

What would you have to see inside the file to "suspect it may be XML even though a DocumentException was caught"? This is needed for parsing purposes.

Recommended answer

File type detection tools:

  • Mime Type Detection Utility
  • DROID (Digital Record Object Identification)
  • ftc - File Type Classifier
  • JHOVE, JHOVE2
  • NLNZ Metadata Extraction Tool
  • Apache Tika
  • TrID, TrIDNet
  • Oracle Outside In (commercial)
  • Forensic Innovations File Investigator TOOLS (commercial)
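Tools like these generally inspect a file's content rather than its name. As a rough illustration of that idea only, here is a minimal sketch that uses nothing but the JDK (Java 11+). The class `ContentSniffer`, the `Guess` enum, and the `sniff` methods are hypothetical names invented for this example and are not part of any of the tools listed. The sketch assumes an ASCII-compatible encoding such as UTF-8 and only answers "what does this file look like", not "is it valid".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentSniffer {

    /** Coarse guesses only; UNKNOWN means no cheap hint was found. */
    public enum Guess { XML, JSON, UNKNOWN }

    /** Guess from the first bytes of a file (hypothetical helper). */
    public static Guess sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(512)); // readNBytes(int) needs Java 11+
        }
    }

    /** Guess from a byte prefix, assuming an ASCII-compatible encoding. */
    public static Guess sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding never fails.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        // Skip a UTF-8 byte order mark (EF BB BF) if present.
        if (text.startsWith("\u00EF\u00BB\u00BF")) {
            text = text.substring(3);
        }
        text = text.stripLeading();
        if (text.isEmpty()) {
            return Guess.UNKNOWN;
        }
        char first = text.charAt(0);
        if (first == '<') {
            return Guess.XML;   // markup-like; HTML would also land here
        }
        if (first == '{' || first == '[') {
            return Guess.JSON;  // object or array opener
        }
        return Guess.UNKNOWN;   // e.g. a properties file, or binary data
    }
}
```

With a check like this, a file that starts with `<` can still be treated as "probably XML" after the parser has thrown a DocumentException, which is the case the question asks about. For anything beyond a rough guess, one of the dedicated tools above is the safer choice.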
