Chunking bytes (not strings) in Python 2 and 3


Question


This is turning out to be trickier than I expected. I have a byte string:

data = b'abcdefghijklmnopqrstuvwxyz'

I want to read this data in chunks of n bytes. Under Python 2, this is trivial using a minor modification to the grouper recipe from the itertools documentation:

from itertools import izip_longest  # Python 2 name; Python 3 calls it zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (''.join(x) for x in izip_longest(fillvalue=fillvalue, *args))

With this in place, I can call:

>>> list(grouper(data, 2))

And get:

['ab', 'cd', 'ef', 'gh', 'ij', 'kl', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']

Under Python 3, this gets trickier. The grouper function as written simply falls over:

>>> list(grouper(data, 2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in <genexpr>
TypeError: sequence item 0: expected str instance, int found

This is because in Python 3, iterating over a bytestring (like b'foo') yields integers rather than one-byte strings:

>>> list(b'foo')
[102, 111, 111]
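
The same split shows up when comparing indexing with slicing, which hints at a version-independent approach: indexing a bytes object gives an int in Python 3, while slicing gives back an object of the same type as the original. A minimal sketch (the variable names are only for illustration):

```python
data = b'foo'

# Indexing: an int in Python 3 (it would be a one-character str in Python 2).
first = data[0]

# Slicing: the result has the same type as the object being sliced.
first_slice = data[0:1]

print(type(first_slice) is type(data))
```

Because slicing preserves the type, code built on slices does not need to convert individual items back into bytes.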

The Python 3 bytes constructor helps here:

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (bytes(x) for x in izip_longest(fillvalue=fillvalue, *args))

Using that, I get what I want:

>>> list(grouper(data, 2))
[b'ab', b'cd', b'ef', b'gh', b'ij', b'kl', b'mn', b'op', b'qr', b'st', b'uv', b'wx', b'yz']

But (of course!) bytes under Python 2 does not behave the same way. It's just an alias for str, so each tuple is converted to its string representation, which results in:

>>> list(grouper(data, 2))
["('a', 'b')", "('c', 'd')", "('e', 'f')", "('g', 'h')", "('i', 'j')", "('k', 'l')", "('m', 'n')", "('o', 'p')", "('q', 'r')", "('s', 't')", "('u', 'v')", "('w', 'x')", "('y', 'z')"]
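
The effect is easy to reproduce on any version, since calling str on a tuple gives its printable representation rather than a concatenation of its items. A small illustration (the tuple below is made up for the example):

```python
pair = ('a', 'b')

# str() of a tuple returns its printable representation.
as_text = str(pair)

# Joining the items is what actually concatenates them.
joined = ''.join(pair)

print(as_text)
print(joined)
```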

...which is not at all helpful. I ended up writing the following:

import six
from six.moves import zip_longest as izip_longest  # resolves to the right name on either version

def to_bytes(s):
    if six.PY3:
        return bytes(s)
    else:
        return ''.encode('utf-8').join(list(s))

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return (to_bytes(x) for x in izip_longest(fillvalue=fillvalue, *args))

This seems to work, but is this really the way to do it?

Solution

Funcy (a library offering various useful utilities, supporting both Python 2 and 3) offers a chunks function that does exactly this:

>>> import funcy
>>> data = b'abcdefghijklmnopqrstuvwxyz'
>>> list(funcy.chunks(6, data))
[b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']   # Python 3
['abcdef', 'ghijkl', 'mnopqr', 'stuvwx', 'yz']        # Python 2.7

Alternatively, you could include a simple implementation of this in your program (compatible with both Python 2.7 and 3):

def chunked(size, source):
    for i in range(0, len(source), size):
        yield source[i:i+size]

It behaves the same (at least for your data; Funcy's chunks also works with iterators, whereas this version needs a sequence that supports len() and slicing):

>>> list(chunked(6, data))
[b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']   # Python 3
['abcdef', 'ghijkl', 'mnopqr', 'stuvwx', 'yz']        # Python 2.7
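
If the source is an iterator rather than a sliceable sequence, one possible approach is to pull items with itertools.islice. The sketch below is not Funcy's implementation; chunked_iter is a hypothetical helper name, and it yields lists of items rather than bytes, so callers working with byte data would still need to join each chunk themselves:

```python
from itertools import islice

def chunked_iter(size, iterable):
    """Yield lists of up to `size` items from any iterable (hypothetical helper)."""
    it = iter(iterable)
    while True:
        # islice consumes at most `size` items from the shared iterator.
        chunk = list(islice(it, size))
        if not chunk:
            return  # the iterator is exhausted
        yield chunk

letters = iter('abcdefg')
result = list(chunked_iter(3, letters))
print(result)
```

The last chunk is simply shorter when the input length is not a multiple of size, so no fill value is needed.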
