How to reliably detect file types?
Question
Objective: given a file, determine whether it is of a given type (XML, JSON, Properties, etc.).
Consider the case of XML. Until we ran into this issue, the following approach worked fine:
try {
    saxReader.read(f);
} catch (DocumentException e) {
    logger.warn(" - File is not XML: " + e.getMessage());
    return false;
}
return true;
As expected, when the XML is well formed, the test passes and the method returns true. If something goes wrong and the file can't be parsed, it returns false.
This breaks, however, when we deal with a malformed file that is still meant to be XML: the parser throws, so the method reports "not XML".
I'd rather not rely on the .xml extension (fails all the time), on looking for the <?xml version="1.0" encoding="UTF-8"?> string inside the file, etc.
Is there another way this can be handled?
What would you have to see inside the file to "suspect it may be XML even though a DocumentException was caught"? This is needed for parsing purposes.
Recommended answer
File type detection tools:
- Mime Type Detection Utility
- DROID (Digital Record Object Identification)
- ftc - File Type Classifier
- JHOVE, JHOVE2
- NLNZ Metadata Extraction Tool
- Apache Tika
- TrID, TrIDNet
- Oracle Outside In (commercial)
- Forensic Innovations File Investigator TOOLS (commercial)
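Tools like these generally identify a file from its leading bytes rather than by parsing it completely. As a rough illustration of that idea for the XML case in the question, here is a minimal sketch using only the JDK (Java 11+). The class name XmlSniffer, the method looksLikeXml, and the 512-byte window are hypothetical choices made for this example, not part of dom4j or of any tool listed above. It assumes UTF-8 (or ASCII-compatible) content and is only a heuristic: it says "this might be XML", not "this is valid XML".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class XmlSniffer {

    /**
     * Heuristic only: returns true if the given leading bytes look like the
     * start of an XML document. Assumes UTF-8 (or ASCII-compatible) content.
     */
    public static boolean looksLikeXml(byte[] head) {
        String s = new String(head, StandardCharsets.UTF_8);
        int i = 0;
        // Skip a UTF-8 byte order mark, which Java decodes as U+FEFF.
        if (!s.isEmpty() && s.charAt(0) == '\uFEFF') {
            i++;
        }
        // Skip leading whitespace before the first markup character.
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        // Need at least "<" plus one more character to decide.
        if (i + 1 >= s.length() || s.charAt(i) != '<') {
            return false;
        }
        char next = s.charAt(i + 1);
        // Accept "<?xml ...", "<!-- ..." or "<!DOCTYPE ...", or an element name.
        return next == '?' || next == '!'
                || Character.isLetter(next) || next == '_' || next == ':';
    }

    /** Reads at most the first 512 bytes of the file and applies the heuristic. */
    public static boolean looksLikeXml(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeXml(in.readNBytes(512));
        }
    }
}
```

One way to use it: when saxReader.read(f) throws a DocumentException, call XmlSniffer.looksLikeXml(f.toPath()) and, if it returns true, log the file as "probably malformed XML" instead of "not XML". Note that HTML and other angle-bracket formats will also pass this check, so a dedicated detector such as one of the tools above is the more robust option.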