What is a Python bytestring?


Question


What's a Python bytestring?

All I can find are topics on how to encode to a bytestring or decode to ASCII or UTF-8. I'm trying to understand how it works under the hood. In a normal ASCII string, it's an array or list of characters, and each character represents an ASCII value from 0-255, so that's how you know what character is represented by the number. In Unicode, it's the 8- or 16-byte representation for the character that tells you what character it is.

So what is a bytestring? How does Python know which characters to represent as what? How does it work under the hood? Since you can print or even return these strings and it shows you the string representation, I don't quite get it...

Ok, so my point is definitely getting missed here. I've been told that it's an immutable sequence of bytes without any particular interpretation.

A sequence of bytes... Okay, let's say one byte:
'a'.encode() returns b'a'.

Simple enough. Why can I read the a?

Say I get the ASCII value for a, by doing this:
printf "%d" "'a"

It returns 97. Okay, good, the integer value for the ASCII character a. If we interpret 97 as ASCII, say in a C char, then we get the letter a. Fair enough. If we convert the byte representation to bits, we get this:

01100001

2^0 + 2^5 + 2^6 = 97. Cool.

So why is 'a'.encode() returning b'a' instead of 01100001??
If it's without a particular interpretation, shouldn't it be returning something like b'01100001'?
It seems like it's interpreting it like ASCII.

Someone mentioned that it's calling __repr__ on the bytestring, so it's displayed in human-readable form. However, even if I do something like:

with open('testbytestring.txt', 'wb') as f:
    f.write(b'helloworld')

It will still insert helloworld as a regular string into the file, not as a sequence of bytes... So is a bytestring in ASCII?

Solution

Python does not know how to represent a bytestring. That's the point.

When you output a byte with value 97 to pretty much any output window, you'll see the character 'a', but that's not part of the implementation; it's just something that happens to be true locally. If you want an encoding, you don't use a bytestring. If you use a bytestring, you don't have an encoding.
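A minimal sketch of what that means in practice, assuming Python 3 (the variable names are only for illustration). A bytes object stores integers from 0 to 255. The letters that repr() shows are a display convention for bytes in the printable ASCII range, not an encoding attached to the data:

```python
data = 'a'.encode()          # str.encode() defaults to UTF-8 in Python 3

# Indexing a bytes object yields an integer, not a character.
first = data[0]              # 97
bits = format(first, '08b')  # '01100001'

# repr() shows printable ASCII bytes as letters and other bytes as \x escapes.
printable = repr(bytes([97]))    # b'a'
escaped = repr(bytes([233]))     # b'\xe9'

# A byte only becomes text once you choose an encoding yourself,
# and different choices give different characters for the same byte.
as_latin1 = bytes([233]).decode('latin-1')
as_cp437 = bytes([233]).decode('cp437')

print(first, bits, printable, escaped, as_latin1 != as_cp437)
```

So b'a' and the bit pattern 01100001 are two ways of displaying the same stored value, 97. Python simply picks the more compact display when the byte happens to be printable ASCII.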

Your point about .txt files shows you have misunderstood what is happening. Plain text files don't have an encoding either. They're just a series of bytes. The text editor translates these bytes into letters, but if you stray outside the common set of ASCII characters, there is no guarantee at all that someone else opening your file will see the same thing as you.
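The file case can be sketched the same way. This is an illustration under assumptions rather than anything from the question: the temporary directory and the extra 0xE9 byte are added here to make the point visible. The file stores exactly the bytes that were written, and whoever reads it has to pick an encoding:

```python
import os
import tempfile

# b'hello' plus one byte outside the ASCII range (0xE9 is an arbitrary choice).
payload = bytes([104, 101, 108, 108, 111, 0xE9])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'testbytestring.txt')

    with open(path, 'wb') as f:   # binary mode: bytes are written unchanged
        f.write(payload)

    with open(path, 'rb') as f:
        raw = f.read()

# The file held exactly the bytes that were written; no encoding is stored with it.
same_bytes = raw == payload

# The ASCII-range bytes read the same under most common encodings...
ascii_part = raw[:5].decode('ascii')

# ...but the last byte depends entirely on the encoding the reader picks.
text_latin1 = raw.decode('latin-1')
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False            # a lone 0xE9 byte is not valid UTF-8

print(same_bytes, ascii_part, valid_utf8, ascii(text_latin1))
```

This is why helloworld looks like "a regular string" in an editor: every byte in it falls in the ASCII range, which most encodings agree on.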
