Identifying the type of a file without extension from binary data


Question



I have some files without extensions, and I would like to associate extensions with them. For that I have written a Python program to read the data in each file. My question is: how can I identify a file's type without its extension and without using third-party tools?

I only need to identify PDF, DOC, and text files; no other file types will occur.

My server runs CentOS.

Solution

You could read the first few bytes of the file and look for a "magic number". The Wikipedia page on magic numbers states that PDF files begin with the ASCII characters %PDF and legacy binary DOC files begin with the hex bytes D0 CF 11 E0.
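A minimal sketch of that magic-number check, using only the Python standard library. The names sniff_type and sniff_file and the 8-byte read size are illustrative choices, not part of the original answer. Note that D0 CF 11 E0 is the start of the OLE2 Compound File header, which legacy Excel and PowerPoint files share as well, so a match really means "OLE2 container"; given the asker's guarantee that only PDF, DOC, and text files occur, treating it as DOC is reasonable.

```python
# Magic numbers as given above: PDF starts with ASCII "%PDF",
# legacy DOC (an OLE2 Compound File) starts with hex D0 CF 11 E0.
PDF_MAGIC = b"%PDF"
DOC_MAGIC = b"\xD0\xCF\x11\xE0"


def sniff_type(header: bytes):
    """Return 'pdf' or 'doc' if the leading bytes match a known magic number, else None."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(DOC_MAGIC):
        return "doc"
    return None


def sniff_file(path: str):
    """Read the first 8 bytes of the file at path and classify them."""
    # Open in binary mode ("rb") so raw bytes are compared, not decoded text.
    with open(path, "rb") as f:
        return sniff_type(f.read(8))
```

For example, sniff_type(b"%PDF-1.7\n") returns "pdf", while a plain ASCII header such as b"hello" returns None.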

Identifying text files is going to be pretty tough in the general case, because a lot of standard magic numbers are actually ASCII text at the beginning of a binary file (%PDF itself is one). For your case, if you can guarantee that you won't be getting anything but PDF, DOC, or TXT, you could probably get away with checking for the PDF and DOC magic numbers and assuming the file is text if it matches neither.
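A sketch of that fallback logic, combined with renaming the file to carry the detected extension. Two assumptions go beyond the original answer: a sample containing a NUL byte is treated as binary (returning None so the caller can flag the file instead of mislabeling it), which would misclassify UTF-16 text; and the rename refuses to overwrite an existing file. The names classify, add_extension and SAMPLE_SIZE are hypothetical.

```python
import os

SAMPLE_SIZE = 1024  # assumption: number of leading bytes to inspect


def classify(header: bytes):
    """Return 'pdf', 'doc', 'txt', or None if the sample looks binary."""
    if header.startswith(b"%PDF"):
        return "pdf"
    if header.startswith(b"\xD0\xCF\x11\xE0"):
        return "doc"
    # Heuristic (assumption, not from the answer): text files rarely contain NUL bytes.
    if b"\x00" in header:
        return None
    # Otherwise fall back to "text", as the answer suggests.
    return "txt"


def add_extension(path: str):
    """Rename path to path.<ext> based on its content; return the new path, or None."""
    with open(path, "rb") as f:
        ext = classify(f.read(SAMPLE_SIZE))
    if ext is None:
        return None
    new_path = f"{path}.{ext}"
    # os.rename silently replaces an existing target on POSIX, so check first.
    if os.path.exists(new_path):
        raise FileExistsError(new_path)
    os.rename(path, new_path)
    return new_path
```

On the CentOS server this could be called in a loop over os.listdir() for the directory holding the extensionless files, logging any path for which it returns None.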

