Why do I get an int when I index bytes?
Question
I'm trying to get the first character of a byte string in Python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect by slicing, but it feels like a hack to me:
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a binary sequence type, and it is explicitly documented as containing a sequence of integers in the range 0 to 255.
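A minimal sketch of what "a sequence of integers in the range 0 to 255" means in practice, using only built-in behaviour (the variable names data and codes are illustrative, not from the question). Iteration yields ints just like indexing does, and building a bytes object from ints outside range(256) raises ValueError:

```python
data = b'just a byte string'

# Every element of a bytes object is a plain int between 0 and 255.
assert all(isinstance(x, int) and 0 <= x < 256 for x in data)

# Iterating (here via list()) also yields ints, not length-1 bytes objects.
codes = list(data[:4])
print(codes)  # [106, 117, 115, 116]

# Going the other way: bytes() accepts an iterable of ints in range(256).
print(bytes(codes))  # b'just'

# Values outside 0..255 are rejected.
try:
    bytes([256])
except ValueError as exc:
    print("rejected:", exc)
```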
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1.)
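The documented contrast between indexing and slicing can be checked directly. The helper byte_at below is hypothetical (not part of Python), a sketch of one way to get a length-1 bytes object for any index. It uses bytes([data[i]]) rather than data[i:i+1] because the slice form silently returns b'' for i == -1 (since -1 + 1 == 0), while the bytes([...]) form handles negative indices and raises IndexError when out of range:

```python
b = b'just a byte string'
s = 'just a text string'

# bytes: indexing gives an int, slicing gives a bytes object.
assert b[0] == 106 and type(b[0]) is int
assert b[0:1] == b'j' and type(b[0:1]) is bytes

# str: indexing and slicing both give a length-1 str.
assert s[0] == s[0:1] == 'j'

# The naive slice breaks for the last element: b[-1:0] is empty.
assert b[-1:0] == b''


def byte_at(data: bytes, i: int) -> bytes:
    """Hypothetical helper: return data[i] as a length-1 bytes object."""
    # data[i] is an int in range(256); wrap it back into a bytes object.
    return bytes([data[i]])


print(byte_at(b, 0))   # b'j'
print(byte_at(b, -1))  # b'g'
```

Whether this reads better than a plain slice is a matter of taste; for non-negative indices, b[i:i+1] is the usual idiom.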
Note that indexing a string is a bit of an exception among the sequence types: 'abc'[0] gives you a str object of length one; str is the only sequence type that always contains elements of its own type.
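To illustrate that point, this sketch indexes a few built-in sequence types and reports whether the element has the same type as the container (the samples dictionary is just illustrative data). Only str answers True; bytes and bytearray behave like a tuple or list of ints:

```python
samples = {
    'str': 'abc',
    'bytes': b'abc',
    'bytearray': bytearray(b'abc'),
    'tuple': (97, 98, 99),
    'list': [97, 98, 99],
}

for name, seq in samples.items():
    first = seq[0]
    # Prints the container name, the element's type, and whether they match.
    print(name, type(first).__name__, type(first) is type(seq))

# Because each element of a str is itself a str, indexing can be repeated.
assert 'abc'[0][0][0] == 'a'
```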
This echoes how other languages treat string data: in C, the unsigned char type is also effectively an integer in the range 0-255, and text is modelled as a char[] array. Whether an unqualified char is signed or unsigned is implementation-defined, and on some platforms compilers treat it as unsigned by default.
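The signed/unsigned distinction can be seen from Python through the struct module, whose format code 'b' corresponds to C signed char and 'B' to C unsigned char. This sketch (the byte values in raw are arbitrary) decodes the same two bytes both ways; Python's bytes indexing always matches the unsigned interpretation:

```python
import struct

raw = b'\xff\x41'

# 'B' = C unsigned char (0..255), 'b' = C signed char (-128..127).
unsigned = struct.unpack('2B', raw)
signed = struct.unpack('2b', raw)
print(unsigned)  # (255, 65)
print(signed)    # (-1, 65)

# Indexing or iterating a bytes object gives the unsigned values.
assert tuple(raw) == unsigned
assert raw[0] == 255
```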