Why do I get an int when I index bytes?


Question

I'm trying to get the first character of a byte string in Python 3.4, but when I index it, I get an int:

>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>

This seems unintuitive to me, as I was expecting to get b'j'.

I have discovered that I can get the value I expect, but it feels like a hack to me.

>>> my_bytes[0:1]
b'j'

Can someone please explain why this happens?

Solution

The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
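As a quick illustration of the "sequence of integers in the range 0 to 255" description, here is a minimal sketch using only built-ins (the variable names are illustrative). Iterating a bytes object yields ints, and the bytes() constructor only accepts ints inside that range.

```python
# Minimal sketch: the elements of a bytes object are plain ints in range(256).
data = b'just a byte string'

# Iterating (or calling list()) yields ints, not length-1 bytes objects.
values = list(data)
print(values[:4])  # [106, 117, 115, 116]

# Going the other way, bytes() accepts an iterable of ints in 0..255 ...
rebuilt = bytes(values[:4])
print(rebuilt)  # b'just'

# ... and rejects any value outside that range with a ValueError.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
print(out_of_range_rejected)  # True
```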

From the documentation:

Bytes objects are immutable sequences of single bytes.

[...]

While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]

[...]

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
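The indexing-versus-slicing distinction from the quoted documentation can be checked directly. The sketch below reuses the question's my_bytes value; the other names are illustrative. It also shows two common alternatives to slicing for turning a single byte value back into a length-1 bytes object. The chr() call assumes the data is ASCII text, which holds for this example but not for arbitrary binary data.

```python
my_bytes = b'just a byte string'

first_int = my_bytes[0]      # indexing returns an int
first_bytes = my_bytes[0:1]  # slicing returns a bytes object of length 1
print(type(first_int).__name__, first_int)      # int 106
print(type(first_bytes).__name__, first_bytes)  # bytes b'j'

# Two other ways to wrap a single byte value back into a bytes object:
via_list = bytes([first_int])
via_method = first_int.to_bytes(1, 'big')  # one byte, so byte order is irrelevant
print(via_list == first_bytes == via_method)  # True

# Assuming the data is ASCII text, chr() gives the character as a str.
print(chr(first_int))  # j
```

Slicing with [0:1] is therefore not a hack; it is the documented way to get a one-byte bytes object.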

Note that indexing a string is something of an exception among the sequence types: 'abc'[0] gives you a str object of length one. str is the only built-in sequence type whose elements are always of its own type.
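To see that exception side by side with the other sequence types, this sketch records the type returned by indexing and by slicing for a few built-in sequences. The sample values are arbitrary choices for illustration.

```python
# Compare what indexing and slicing return for several built-in sequence types.
samples = [
    ('str', 'abc'),
    ('tuple', (97, 98, 99)),
    ('bytes', b'abc'),
    ('bytearray', bytearray(b'abc')),
]

results = []
for name, seq in samples:
    item_type = type(seq[0]).__name__    # type of a single element
    slice_type = type(seq[0:1]).__name__  # type of a length-1 slice
    results.append((name, item_type, slice_type))
    print(name, 'index ->', item_type, '| slice ->', slice_type)
```

Only str reports the same type for an element and for a slice; bytes and bytearray behave like tuple, returning int elements.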

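This echoes how other languages treat string data. In C, the unsigned char type is likewise just an integer, in the range 0-255 on platforms with 8-bit bytes, and text is modelled as a char[] array. Whether an unqualified char is signed or unsigned is implementation-defined: some platforms (ARM Linux, for example) make it unsigned, while most x86 compilers make it signed.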
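The C comparison can be made concrete from Python with the standard-library struct module, whose format characters 'B' and 'b' correspond to C unsigned char and signed char. In this sketch the sample bytes are arbitrary; it shows that indexing a bytes object matches the unsigned char reading of each byte.

```python
import struct

raw = b'\xff\x7f\x80'

# Indexing a bytes object always gives the unsigned value, 0..255.
indexed = [raw[i] for i in range(len(raw))]
print(indexed)  # [255, 127, 128]

# struct format 'B' = C unsigned char, 'b' = C signed char.
as_unsigned = struct.unpack('3B', raw)
as_signed = struct.unpack('3b', raw)
print(as_unsigned)  # (255, 127, 128)
print(as_signed)    # (-1, 127, -128)
```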
