How can I detect if a file is binary (non-text) in Python?

Question

How can I tell if a file is binary (non-text) in Python?

I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.

I know I could use grep -I, but I am doing more with the data than what grep allows for.

In the past, I would have just searched for characters greater than 0x7f, but UTF-8 and similar encodings make that impossible on modern systems. Ideally, the solution would be fast.
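
To illustrate why the old "any byte above 0x7f" test breaks down, here is a minimal sketch of that heuristic (the helper name `has_high_bytes` is hypothetical). Any non-ASCII character encoded as UTF-8 produces bytes above 0x7f, so perfectly valid text gets flagged as binary.

```python
def has_high_bytes(data: bytes) -> bool:
    """Old-style heuristic: flag data containing any byte above 0x7f."""
    return any(b > 0x7F for b in data)


ascii_text = b"plain ASCII"
utf8_text = "café".encode("utf-8")  # 'é' is encoded as the two bytes 0xC3 0xA9

print(has_high_bytes(ascii_text))  # False
print(has_high_bytes(utf8_text))   # True, even though this is valid text
```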

Recommended answer

You can also use the mimetypes module:

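import mimetypes
...
# guess_type() returns a (type, encoding) tuple derived from the file
# name's extension only; the file contents are never read.
mime_type, encoding = mimetypes.guess_type(file)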
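
A small sketch of turning that guess into a yes/no answer. The helper `is_probably_text` is hypothetical, and the rule "only `text/*` counts as text" is an assumption: types such as `application/json` will be reported as non-text. Files whose extension `mimetypes` doesn't know (including names with no extension) yield `None`, and a compression suffix such as `.gz` is reported through the `encoding` element.

```python
import mimetypes


def is_probably_text(path):
    """Guess from the file name alone whether `path` is a text file.

    Returns True/False, or None when mimetypes has no mapping for the
    extension. Assumption: only text/* MIME types count as text.
    """
    mime_type, encoding = mimetypes.guess_type(path)
    if mime_type is None:
        return None  # unknown or missing extension
    if encoding is not None:
        return False  # compressed (e.g. .gz): the bytes on disk are not plain text
    return mime_type.startswith("text/")


print(is_probably_text("notes.txt"))    # True
print(is_probably_text("photo.png"))    # False
print(is_probably_text("logs.txt.gz"))  # False
print(is_probably_text("Makefile"))     # None
```

Because only the name is inspected, this is fast, but a mislabeled file will fool it.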

It's fairly easy to compile a list of binary MIME types. For example, Apache ships with a mime.types file that you could parse into two sets, binary and text, and then check whether the guessed MIME type is in your text set or your binary set.
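
One way that parsing step could look, as a sketch under stated assumptions. Apache's mime.types lists `type/subtype` followed by extensions, with `#` comments; it does not itself mark types as text or binary, so the split below is an assumption: everything under `text/` plus a hypothetical hand-picked `EXTRA_TEXT_TYPES` set counts as text, and every other listed type counts as binary. The function names are illustrative.

```python
# Hypothetical allow-list of non-text/* types that are still plain text.
EXTRA_TEXT_TYPES = {"application/json", "application/xml", "application/javascript"}


def parse_mime_types(lines):
    """Return (text_types, binary_types) sets from mime.types-format lines.

    Each non-comment line looks like: 'type/subtype  ext1 ext2 ...'.
    """
    text_types, binary_types = set(), set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        mime = line.split()[0].lower()
        if mime.startswith("text/") or mime in EXTRA_TEXT_TYPES:
            text_types.add(mime)
        else:
            binary_types.add(mime)
    return text_types, binary_types


def classify(mime, text_types, binary_types):
    """Return 'text', 'binary', or 'unknown' for a guessed MIME type."""
    if mime in text_types:
        return "text"
    if mime in binary_types:
        return "binary"
    return "unknown"  # includes None, i.e. guess_type() found no mapping


sample = [
    "# MIME type\tExtensions",
    "text/plain\ttxt text conf",
    "image/png\tpng",
    "application/json\tjson",
]
text_types, binary_types = parse_mime_types(sample)
print(classify("image/png", text_types, binary_types))  # binary
```

In practice you would pass the open file object for your local copy of Apache's mime.types (its location varies by installation) instead of `sample`, and feed `classify` the first element of `mimetypes.guess_type(file)`.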
