How can I detect if a file is binary (non-text) in Python?

Question

How can I tell whether a file is binary (non-text) in Python? I am searching through a large set of files in Python and keep getting matches in binary files, which makes the output look incredibly messy.

I know I could use grep -I, but I am doing more with the data than grep allows for.

In the past I would have just searched for characters greater than 0x7F, but UTF-8 and similar encodings make that impossible on modern systems. Ideally the solution would be fast, but any solution will do.
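To illustrate why the old check no longer works, here is a minimal sketch. The helper name has_high_bytes and the sample strings are made up for this example. It shows that ordinary UTF-8 text contains bytes above 0x7F, so the check cannot separate text from binary data.

```python
def has_high_bytes(data: bytes) -> bool:
    """Old-style check: True if any byte is greater than 0x7F."""
    return any(b > 0x7F for b in data)


# Hypothetical sample inputs, both of which are plain text.
ascii_text = "plain ascii".encode("utf-8")
utf8_text = "naïve café".encode("utf-8")

print(has_high_bytes(ascii_text))  # False
print(has_high_bytes(utf8_text))   # True, although this is ordinary text
```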

Recommended answer

You can also use the mimetypes module:

import mimetypes
...
# guess_type() looks only at the file name and returns a (type, encoding) tuple
mime, encoding = mimetypes.guess_type(file)

It's fairly easy to compile a list of binary MIME types. For example, Apache ships with a mime.types file that you could parse into two lists, binary and text, and then check whether the guessed MIME type is in your text list or your binary list.
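The idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name looks_like_text and the EXTRA_TEXT_TYPES set are hypothetical, and instead of parsing Apache's mime.types file it uses a simpler rule (anything under "text/" plus a small hand-picked set). Because mimetypes.guess_type() works from the file name alone, the result is only as reliable as the file's extension.

```python
import mimetypes

# Hypothetical, hand-picked MIME types outside "text/*" that we treat as text.
EXTRA_TEXT_TYPES = {
    "application/json",
    "application/xml",
    "application/javascript",
}


def looks_like_text(filename):
    """Guess from the file name whether a file is text.

    Returns True or False when the MIME type can be guessed,
    and None when the extension is unknown.
    """
    mime, encoding = mimetypes.guess_type(filename)
    if encoding is not None:
        # A content encoding such as gzip means the bytes on disk are compressed.
        return False
    if mime is None:
        return None
    return mime.startswith("text/") or mime in EXTRA_TEXT_TYPES


for name in ("notes.txt", "photo.png", "notes.txt.gz", "Makefile"):
    print(name, looks_like_text(name))
```

Files for which the function returns None still need another check, since the extension gives no information about them.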
