Searching/reading binary data in Python


Question

I'm reading in a binary file (a jpg in this case), and need to find some values in that file. For those interested, the binary file is a jpg and I'm attempting to pick out its dimensions by looking for the binary structure as detailed here.

I need to find FFC0 in the binary data, skip ahead some number of bytes, and then read 4 bytes (this should give me the image dimensions).

What's a good way of searching for the value in the binary data? Is there an equivalent of 'find', or something like re?

Recommended answer

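You can load the whole file into memory and search it for the byte sequence 0xFF 0xC0 using the find() method (bytes.find() in Python 3, str.find() in Python 2). It works for any byte sequence and returns the offset of the first match, or -1 if there is none.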

The code to do this depends on a couple of things. If you open the file in binary mode and you're using Python 3 (both of which are probably best practice for this scenario), you'll need to search for a byte string (as opposed to a character string), which means you have to prefix the string literal with b.

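# filename is the path to the JPEG file
with open(filename, 'rb') as f:
    s = f.read()  # the whole file as a bytes object
idx = s.find(b'\xff\xc0')  # offset of the first match, or -1 if absent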
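
To connect this back to the question, here is a minimal sketch of reading the dimensions once the marker has been found. It assumes the standard baseline layout, where the FFC0 marker is followed by a 2-byte segment length, a 1-byte sample precision, and then the height and width as big-endian 16-bit integers. The function name jpeg_size_from_bytes and the synthetic sample bytes are made up for illustration. A plain find() can also match FFC0 inside another segment (for example an embedded thumbnail), and progressive JPEGs use FFC2 instead, so treat this as a starting point rather than a robust parser.

```python
import struct

SOF0 = b'\xff\xc0'  # start-of-frame marker for baseline JPEG


def jpeg_size_from_bytes(data):
    """Return (width, height) from the first SOF0 segment, or None."""
    idx = data.find(SOF0)
    # Need the marker (2 bytes), length (2), precision (1) and 4 size bytes.
    if idx == -1 or idx + 9 > len(data):
        return None
    # Skip marker, segment length and precision, then read height and
    # width as two big-endian unsigned 16-bit integers.
    height, width = struct.unpack('>HH', data[idx + 5:idx + 9])
    return width, height


# Synthetic bytes standing in for a real file: SOI marker, then a
# truncated SOF0 segment describing a 640x480 image.
sample = (b'\xff\xd8' + SOF0 + b'\x00\x11\x08'
          + struct.pack('>HH', 480, 640) + b'\x03')
print(jpeg_size_from_bytes(sample))  # (640, 480)
```

With a real file, the bytes read by f.read() in binary mode would be passed in place of sample.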

If you open the file in text mode in Python 3, you'd have to search for a character string:

with open(filename, 'r') as f:
    s = f.read()
s.find('\xff\xc0')

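though there's no particular reason to do this. It doesn't get you any advantage over the previous way, and it can cause problems: in text mode Python 3 decodes the bytes and translates line endings, so reading a JPEG this way may raise UnicodeDecodeError or return altered data, and platforms that treat binary files and text files differently (e.g. Windows) add further differences.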

Python 2 doesn't make the distinction between byte strings and character strings, so if you're using that version, it doesn't matter whether you include or exclude the b in b'\xff\xc0'. And in Python 2, if your platform treats binary files and text files identically (e.g. Mac or Linux), it doesn't matter whether you use 'r' or 'rb' as the file mode either. But I'd still recommend using something like the first code sample above just for forward compatibility - in case you ever do switch to Python 3, it's one less thing to fix.
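
The question also asks about something like re. In Python 3 the re module accepts bytes patterns when the data being searched is bytes, so it can be used when more than a fixed sequence is needed. The sketch below is illustrative only: the data value is made up, and matching either FFC0 or FFC2 is just an example of a pattern that find() alone cannot express.

```python
import re

# Made-up bytes containing the marker at offsets 1 and 5.
data = b'\x00\xff\xc0\x01\x02\xff\xc0\x03'

# Fixed sequence: re.escape guards against bytes that happen to be
# regex metacharacters (not needed for FFC0, but a safe habit).
match = re.search(re.escape(b'\xff\xc0'), data)
first = match.start() if match else -1
print(first)  # 1

# Pattern with alternatives: FF followed by either C0 or C2.
offsets = [m.start() for m in re.finditer(rb'\xff[\xc0\xc2]', data)]
print(offsets)  # [1, 5]
```

For a single fixed byte sequence, find() is simpler and does the same job.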
