How to identify binary and text files using Python?
Problem description
I need to identify which files in a directory are binary and which are text.
I tried using mimetypes, but it isn't a good fit in my case because it can't identify the MIME type of every file, and I have some strange ones here. I just need to know: binary or text. Simple? But I couldn't find a solution.
Thanks.
Recommended answer
Thanks everybody, I found a solution that suits my problem. I found this code at http://code.activestate.com/recipes/173220/ and changed just a small piece of it to suit my needs.
It works fine.
def istext(filename):
    # Read the first 512 bytes in binary mode so no decoding is attempted
    with open(filename, "rb") as f:
        s = f.read(512)
    # Printable ASCII plus newline, carriage return, tab and backspace
    text_characters = bytes(range(32, 127)) + b"\n\r\t\b"
    if not s:
        # Empty files are considered text
        return True
    if b"\0" in s:
        # Files with null bytes are likely binary
        return False
    # Delete the text characters from the sample; whatever
    # remains are the non-text bytes.
    t = s.translate(None, text_characters)
    # If more than 30% non-text characters, then
    # this is considered a binary file
    if len(t) / len(s) > 0.30:
        return False
    return True
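To apply this check to every file in a directory, as the question asks, something along these lines could work. This is only a minimal sketch and not part of the original recipe: the names looks_like_text and classify_directory are hypothetical, and the 512-byte sample and 30% threshold are simply carried over from the code above.

```python
import os

# Printable ASCII plus newline, carriage return, tab and backspace
TEXT_BYTES = bytes(range(32, 127)) + b"\n\r\t\b"


def looks_like_text(sample, threshold=0.30):
    """Heuristic check on a bytes sample (hypothetical helper)."""
    if not sample:
        return True  # empty data is treated as text
    if b"\0" in sample:
        return False  # a null byte strongly suggests binary
    # Remove every text byte; what is left are the non-text bytes
    non_text = sample.translate(None, TEXT_BYTES)
    return len(non_text) / len(sample) <= threshold


def classify_directory(path, sample_size=512):
    """Return (text_files, binary_files) for regular files directly in path."""
    text_files, binary_files = [], []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories and other non-regular entries
        # Binary mode avoids any decoding errors on unknown content
        with open(full, "rb") as f:
            sample = f.read(sample_size)
        if looks_like_text(sample):
            text_files.append(name)
        else:
            binary_files.append(name)
    return text_files, binary_files
```

Calling classify_directory(".") would return two lists of file names for the current directory. Keep in mind that this heuristic counts every byte outside printable ASCII as non-text, so UTF-16 files or files with mostly non-ASCII UTF-8 content may be reported as binary.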