Binary input with an ASCII text header, read from stdin
Question
I want to read a binary PNM image file from stdin. The file contains a header which is encoded as ASCII text, and a payload which is binary. As a simplified example of reading the header, I have created the following snippet:
#! /usr/bin/env python3
import sys
header = sys.stdin.readline()
print("header=["+header.strip()+"]")
I run it as "test.py" (from a Bash shell), and it works fine in this case:
$ printf "P5 1 1 255\n\x41" |./test.py
header=[P5 1 1 255]
However, a small change in the binary payload breaks it:
$ printf "P5 1 1 255\n\x81" |./test.py
Traceback (most recent call last):
  File "./test.py", line 3, in <module>
    header = sys.stdin.readline()
  File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 11: invalid start byte
Is there an easy way to make this work in Python 3?
Recommended answer
From the docs, it is possible to read binary data (as type bytes) from stdin with sys.stdin.buffer.read():
To write or read binary data from/to the standard streams, use the underlying binary buffer object. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
So this is one direction that you can take: read the data in binary mode. readline() and various other functions still work. Once you have captured the ASCII string, it can be converted to text, using decode('ASCII'), for additional text-specific processing.
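As a rough illustration of the binary-mode route, the sketch below reads the header line as bytes, decodes it as ASCII, and splits it into fields. The function name read_pnm_header is hypothetical, io.BytesIO stands in for sys.stdin.buffer so the example runs without piped input, and it assumes the whole header sits on one line as in the question's example (real PNM files may spread the header over several lines and include comments).

```python
import io

def read_pnm_header(stream):
    """Read a one-line PNM header from a binary stream (hypothetical helper)."""
    line = stream.readline()             # bytes, up to and including b"\n"
    text = line.decode("ascii")          # the header is assumed to be pure ASCII
    magic, width, height, maxval = text.split()
    return magic, int(width), int(height), int(maxval)

# io.BytesIO stands in for sys.stdin.buffer; 0x81 is the byte that broke test.py.
stream = io.BytesIO(b"P5 1 1 255\n\x81")
header = read_pnm_header(stream)
payload = stream.read()                  # remaining bytes, never decoded
print(header, payload)
```

In a real script, passing sys.stdin.buffer instead of the BytesIO object should behave the same way, since both are binary streams with readline() and read().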
Alternatively, you can use io.TextIOWrapper() to indicate the use of the latin-1 character set on the input stream. With this, the implicit decode operation will essentially be a pass-through operation, so the data will be of type str (which represents text), but the data is represented with a 1-to-1 mapping from the binary (although it could be using more than one storage byte per input byte).
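The 1-to-1 mapping can be checked directly. The sketch below decodes all 256 byte values as latin-1 and confirms that each character's code point equals the original byte. It also shows one caveat that is easy to miss: with its default newline handling, TextIOWrapper translates '\r' into '\n' on input, so a payload byte of 0x0D would be altered unless newline='\n' is passed. The variable names are illustrative only, and io.BytesIO stands in for sys.stdin.buffer.

```python
import io

raw = bytes(range(256))                  # every possible byte value
text = raw.decode("latin-1")             # latin-1 accepts any byte
codes = [ord(ch) for ch in text]         # code point i corresponds to byte i
round_trip = text.encode("latin-1")      # encodes back to the original bytes

data = b"P5 1 1 255\n\r"                 # 1-pixel payload whose value is 13 (0x0D)

# Default newline handling: universal newlines, '\r' is translated to '\n'.
default = io.TextIOWrapper(io.BytesIO(data), encoding="latin-1")
default.readline()                       # consume the header line
translated = default.read()

# newline="\n": lines end only at '\n' and nothing is translated.
exact = io.TextIOWrapper(io.BytesIO(data), encoding="latin-1", newline="\n")
exact.readline()
preserved = exact.read()

print(codes == list(raw), round_trip == raw, repr(translated), repr(preserved))
```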
Here's code that works in either mode:
#! /usr/bin/python3
import sys, io

BINARY = True  # either way works

if BINARY:
    istream = sys.stdin.buffer  # yields bytes
else:
    # newline='\n' keeps TextIOWrapper from translating '\r' bytes into '\n'
    istream = io.TextIOWrapper(sys.stdin.buffer, encoding='latin-1', newline='\n')

header = istream.readline()
if BINARY:
    header = header.decode('ASCII')
print("header=[" + header.strip() + "]")

payload = istream.read()
print("len=" + str(len(payload)))
for i in payload:
    print(i if BINARY else ord(i))  # bytes iterate as ints; str needs ord()
Test every possible 1-pixel payload with the following Bash command:
for i in $(seq 0 255) ; do printf "P5 1 1 255\n\x$(printf %02x $i)" |./test.py ; done
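The same check can also be sketched in-process, without a shell. The hypothetical helper read_pixel below feeds each 1-pixel image through both stream modes, using io.BytesIO in place of sys.stdin.buffer, and collects any pixel value that does not survive the trip. The text-mode branch assumes newline='\n' so that carriage-return bytes are not translated.

```python
import io

def read_pixel(data, binary):
    """Return the single pixel value of a 1x1 P5 image (hypothetical helper)."""
    raw = io.BytesIO(data)               # stand-in for sys.stdin.buffer
    if binary:
        stream = raw
    else:
        stream = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")
    stream.readline()                    # skip the one-line header
    payload = stream.read()
    return payload[0] if binary else ord(payload[0])

# Try every byte value in both modes and record any that comes back changed.
mismatches = [
    (value, binary)
    for value in range(256)
    for binary in (True, False)
    if read_pixel(b"P5 1 1 255\n" + bytes([value]), binary) != value
]
print("mismatches:", mismatches)
```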