Stream/string/bytearray transformations in Python 3

Views: 742  Published: 2017/8/17 1:01:12  Tags: encoding, python-3.x


Question

Python 3 cleans up Python's handling of Unicode strings. I assume that, as part of this effort, the codecs in Python 3 have become more restrictive, judging by the Python 3 documentation compared to the Python 2 documentation.

For example, codecs that conceptually convert a bytestream to a different form of bytestream have been removed:

base64_codec
bz2_codec
hex_codec

And codecs that conceptually convert Unicode to a different form of Unicode have also been removed (in Python 2 it actually went between Unicode and bytestream, but conceptually it's really Unicode to Unicode I reckon):

rot_13

My main question is, what is the "right way" in Python 3 to do what these removed codecs used to do? They're not codecs in the strict sense, but "transformations". But the interface and implementation would be very similar to codecs.
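The codec interface the question alludes to can in fact be reused for a str-to-str transformation. The sketch below registers a custom codec with `codecs.register()` and `codecs.CodecInfo`; the codec name `unix_to_crlf` and the helper functions are hypothetical, chosen only for illustration, and the sketch assumes Python 3.4 or later, where `codecs.encode()`/`codecs.decode()` are documented.

```python
import codecs

# Hypothetical codec name for this sketch. Lowercase with underscores, so it
# is unchanged by the name normalisation codecs.lookup() applies.
_NAME = "unix_to_crlf"


def _lf_to_crlf(text, errors="strict"):
    # Collapse existing CRLFs first so "\r\n" does not become "\r\r\n".
    out = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # Codec functions return (output, number of input items consumed).
    return out, len(text)


def _crlf_to_lf(text, errors="strict"):
    return text.replace("\r\n", "\n"), len(text)


def _search(name):
    # A search function returns a CodecInfo for names it recognises,
    # or None so that other registered search functions get a chance.
    if name == _NAME:
        return codecs.CodecInfo(encode=_lf_to_crlf, decode=_crlf_to_lf, name=_NAME)
    return None


codecs.register(_search)

# str in, str out: the transform runs before any byte encoding happens.
windows_text = codecs.encode("one\ntwo\r\nthree", _NAME)
unix_text = codecs.decode(windows_text, _NAME)
```

Because the codec returns `str`, it has to be called through the `codecs` module functions; `str.encode()` would reject it, since it insists on a `bytes` result.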

I don't care about rot_13, but I'm interested to know what would be the "best way" to implement a transformation of line ending styles (Unix line endings vs Windows line endings), which should really be a Unicode-to-Unicode transformation done before encoding to a byte stream, especially when UTF-16 is being used, as discussed in this other SO question.
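Why the order matters with UTF-16 can be shown in a few lines. This is a minimal sketch assuming UTF-16-LE (chosen to avoid a BOM); it contrasts a str-level transform, a naive byte-level patch, and the `newline` argument of `io.TextIOWrapper`, which translates `"\n"` on write before the text is encoded.

```python
import io

text = "first line\nsecond line\n"

# Correct order: str -> str line-ending transform, then encode.
crlf_text = text.replace("\r\n", "\n").replace("\n", "\r\n")
good = crlf_text.encode("utf-16-le")

# Wrong order: patching bytes after encoding. In UTF-16-LE "\n" is b"\n\x00",
# so inserting a single b"\r" byte misaligns every following code unit.
bad = text.encode("utf-16-le").replace(b"\n", b"\r\n")

# The text I/O layer can do the translation: with newline="\r\n", every
# "\n" written is turned into "\r\n" before the encoder sees it.
buf = io.BytesIO()
writer = io.TextIOWrapper(buf, encoding="utf-16-le", newline="\r\n")
writer.write(text)
writer.flush()
via_io = buf.getvalue()
```

Note that `TextIOWrapper` only rewrites `"\n"`; text that already contains `"\r\n"` would gain an extra `"\r"`, so it should be normalised first.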

Answer

It looks as though all these non-codec modules are being handled on a case-by-case basis. Here's what I've found so far:

base64 is now available via the base64 module
bz2 can now be done using the bz2 module
hex string encoding/decoding can be done with the hexlify and unhexlify functions of the binascii module (a bit of a hidden feature)
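A short sketch of the module-by-module replacements listed above, using only standard-library functions (`base64.b64encode`/`b64decode`, `bz2.compress`/`decompress`, `binascii.hexlify`/`unhexlify`); the sample data is arbitrary.

```python
import base64
import binascii
import bz2

data = b"hello world"

# base64_codec -> the base64 module
b64 = base64.b64encode(data)            # b'aGVsbG8gd29ybGQ='
b64_back = base64.b64decode(b64)

# bz2_codec -> the bz2 module (compressed bytes are not shown, only round-tripped)
packed = bz2.compress(data)
unpacked = bz2.decompress(packed)

# hex_codec -> binascii.hexlify / binascii.unhexlify
hexed = binascii.hexlify(data)          # b'68656c6c6f20776f726c64'
hexed_back = binascii.unhexlify(hexed)
```

On Python 3.5 and later, `data.hex()` and `bytes.fromhex()` cover the hex case without an import (`hex()` returns `str` rather than `bytes`).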

I guess that means there's no standard framework for creating such string/bytearray transformation modules, but they're being done on a case-by-case basis in Python 3.

Update for Python 3.2

A comment on a blog post "Compressing text using Python’s unicode support" alerted me to the fact that these codecs are back in Python 3.2.

Quoting the comment:

Since these are "text-to-text" or "binary-to-binary" transforms, though, the encode()/decode() methods in Python 3.x don’t support this style of usage; it’s a Python 2.x-only feature.

The codecs themselves are back in 3.2, but you need to go through the codecs module API in order to use them – they aren’t available via the object method shorthand.

See the "Binary Transforms" section of the Python 3 documentation for the codecs module.
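A minimal sketch of going through the codecs module API instead of the method shorthand, assuming Python 3.4 or later (where `codecs.encode()`/`codecs.decode()` are documented and the method shorthand raises `LookupError` for non-text codecs).

```python
import codecs

data = b"hello world"

# The restored bytes-to-bytes transforms, reached via the codecs module.
b64 = codecs.encode(data, "base64_codec")   # b'aGVsbG8gd29ybGQ=\n' (trailing newline)
hexed = codecs.encode(data, "hex_codec")    # b'68656c6c6f20776f726c64'
packed = codecs.encode(data, "bz2_codec")

round_trips = [
    codecs.decode(b64, "base64_codec"),
    codecs.decode(hexed, "hex_codec"),
    codecs.decode(packed, "bz2_codec"),
]

# The object-method shorthand refuses codecs that are not text encodings.
try:
    data.decode("hex_codec")
    shorthand_works = True
except LookupError:
    shorthand_works = False
```

Unlike `base64.b64encode()`, `base64_codec` appends a newline, because it is built on `base64.encodebytes()`.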

From a blog post by Barry Warsaw:

Did you know that Python 2 provides some codecs for doing interesting conversions such as Caesar rotation (i.e. rot13)? Thus, you can do things like:
>>> 'foo'.encode('rot-13')
'sbb'
This doesn't work in Python 3 though, because even though certain str-to-str codecs like rot-13 still exist, the str.encode() interface requires that the codec return a bytes object. In order to use str-to-str codecs in both Python 2 and Python 3, you'll have to pop the hood and use a lower-level API, getting and calling the codec directly:
>>> from codecs import getencoder
>>> mystring = 'foo'
>>> encoder = getencoder('rot-13')
>>> rot13string = encoder(mystring)[0]
You have to take the zeroth element of the encoder's return value because of the codecs API. A bit ugly, but it works in both versions of Python.
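What the "zeroth element" refers to can be seen directly: codec encoder functions return a pair of (output, number of input characters consumed). The sketch below assumes Python 3.2 or later, where the `rot_13` codec is available again; `codecs.encode()` unwraps the pair for you but does not exist in Python 2's documented API in the same form, so the `getencoder()` route remains the portable one.

```python
import codecs

mystring = "foo"

# Portable form from the quote: the encoder returns (output, length consumed).
encoder = codecs.getencoder("rot-13")
result = encoder(mystring)                      # ('sbb', 3)
rot13string = result[0]                         # 'sbb'

# Python 3 convenience: codecs.encode() returns only the output.
also_rot13 = codecs.encode(mystring, "rot_13")  # 'sbb'
```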
