How to open and present raw binary data in Python?

Question

This seems to be the type of question that should have a lot of duplicates and plenty of answers, but my searches have led only to frustration and no usable solutions.

In Python (preferably 3.x), I would like to know how I can open a file of an arbitrary type, read the bytes that are stored on disk, and present those bytes in their most 'native', 'original', 'raw' form, before any encoding is done on them.

If the file is stored on disk as a stream of 00010100 10000100 ... then that's what I would like to have presented on the screen.

These sorts of questions usually elicit the responses 'why do you want to know' and 'what's the use case'. I'm curious; that's my use case.

Before you mark this as duplicate, please be sure that the answer you have in mind does indeed answer the question (rather than merely discuss encodings, etc.). Thank you!

Edit after the first three answers

Thanks to the three responders up to this point, and especially to J.F. Sebastian for the extended discussion. It appears from what has been said that my question boils down to how bytes in files are physically recorded to disk and how they can be read and presented. At this point it doesn't seem possible in Python to obtain a view onto the bytes in their raw form, but they are available in various representations: integers, hex values, ASCII, etc. As the matter isn't settled, I will leave the question open for more input.

Recommended answer

'rb' mode enables you to read raw binary data from a file in Python:

with open(filename, 'rb') as file:
    raw_binary_data = file.read()

type(raw_binary_data) == bytes. bytes is an immutable sequence of bytes in Python.
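
To present the bytes as the stream of binary digits the question asks for, one option is to format each byte as eight bits. This is a minimal sketch: the helper name to_bit_string is hypothetical, and a short in-memory bytes value stands in for the result of file.read().

```python
def to_bit_string(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    # Iterating over a bytes object yields integers in range(256).
    return ' '.join(format(byte, '08b') for byte in data)


# b'\x14\x84' stands in for data read from a file opened in 'rb' mode.
sample = b'\x14\x84'
print(to_bit_string(sample))  # 00010100 10000100
```

The digits printed are still a text representation chosen for display; the bytes object itself is unchanged.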

Don't confuse bytes with their text representation: print(raw_binary_data) shows you a text representation of the data. For example, the byte 127 (base 10: decimal), which you can also write as bin(127) == '0b1111111' (base 2: binary) or hex(127) == '0x7f' (base 16: hexadecimal), is shown as b'\x7f' (seven ASCII characters are printed). Bytes from the printable ASCII range are represented as the corresponding ASCII characters, e.g., b'\x41' is shown as b'A' (65 == 0x41 == 0b1000001).
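
The difference between one byte and its several text representations can be checked interactively. This is a small sketch using only built-ins; the variable names are illustrative.

```python
byte_value = 127

# Three text representations of the same integer value.
print(bin(byte_value))  # 0b1111111
print(hex(byte_value))  # 0x7f
print(str(byte_value))  # 127

# A bytes object holding that single byte, and its repr.
single = bytes([byte_value])
print(repr(single))       # b'\x7f'
print(len(repr(single)))  # 7 (characters of text)
print(len(single))        # 1 (byte of data)

# Printable ASCII bytes are shown as the character itself.
print(bytes([0x41]))      # b'A'
```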

The 0x7f byte is not stored on disk as the seven ASCII binary digits 1111111, nor as the two ASCII hex digits 7F, nor as the three literal decimal digits 127. b'\x7f' is a text representation of the byte that may be used to specify it in Python source code (you won't find the seven literal ASCII characters b'\x7f' on disk either). This code writes a single byte to disk:

with open('output.bin', 'wb') as file:
    file.write(b'\x7f')
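
To check that only one byte reaches the disk, the write can be followed by a size check and a read-back. This is a minimal sketch that assumes a temporary directory, so that no real file is overwritten.

```python
import os
import tempfile

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.bin')
    with open(path, 'wb') as file:
        file.write(b'\x7f')

    size_on_disk = os.path.getsize(path)
    with open(path, 'rb') as file:
        data = file.read()

print(size_on_disk)     # 1
print(data[0])          # 127
print(data == b'\x7f')  # True
```

The file is one byte long, not seven: the characters b'\x7f' exist only in the source code and in the printed representation.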

Some kind of characters must be used to represent the bytes, what are they?

OS interfaces (the way you access hardware such as disks) are defined in terms of bytes, e.g., POSIX read(2). That is, the byte is the fundamental unit here: you can read and write bytes directly, without any intermediate representation. Watch "Richard Feynman. Why."

How bytes are represented physically is a matter between the OS drivers and the hardware. It may be anything, and you don't need to worry about it: it is hidden behind the uniform OS interface. See "How is data physically written, read and stored inside hard drives?"

You could call os.read() directly in Python, but you don't need to; file.read() does it for you. (Python 3 file objects are implemented directly on top of the POSIX interface. Python 2 I/O uses the C stdio library, which in turn uses OS interfaces to implement its functionality.)
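
For comparison, a direct call through the low-level interface might look like the sketch below. The file contents and the 1024-byte read size are arbitrary choices for illustration; os.O_BINARY exists only on Windows, hence the getattr fallback.

```python
import os
import tempfile

# Create a small file to read back; the contents are arbitrary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x14\x84\x7f')
    path = tmp.name

# Open a raw file descriptor and read from it with os.read().
fd = os.open(path, os.O_RDONLY | getattr(os, 'O_BINARY', 0))
try:
    chunk = os.read(fd, 1024)  # read up to 1024 bytes
finally:
    os.close(fd)
    os.remove(path)

print(chunk)        # b'\x14\x84\x7f'
print(list(chunk))  # [20, 132, 127]
```

Either way the result is a bytes object; list(chunk) shows the same bytes as integers.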

As you point out, it's up to the OS drivers and hardware to establish how bytes are written, but the Python interpreter would then be able to read them. So it's reading something - what is that? It's not reading magnetic orientation of particles on the disk, is it? It's reading something symbolic, and I want access to it.

It's reading bytes. A hard disk is itself a small computer, so interesting things may happen inside it, but that does not change the fact that it's bytes all the way down (as far as the "symbolic" or software level is concerned).

The book "Code: The Hidden Language of Computer Hardware and Software" provides a very gentle introduction to how information is represented in computers; the word "byte" is not defined until page 180. To see through the abstraction levels used in computers, the course "From NAND to Tetris" can help.
