Slice endpoints invisibly truncated

Problem description

>>> import sys
>>> class Potato(object):
...     def __getslice__(self, start, stop):
...         print start, stop
...
>>> sys.maxint
9223372036854775807
>>> x = sys.maxint + 69
>>> print x
9223372036854775876
>>> Potato()[123:x]
123 9223372036854775807

Why doesn't the call to __getslice__ respect the stop I sent in, instead of silently substituting 2^63 - 1? Does this mean that implementing __getslice__ for your own syntax is generally unsafe with longs?

I can do whatever I need with __getitem__ anyway; I'm just wondering why __getslice__ is apparently broken.

Edit: Where is the code in CPython that truncates the slice? Is this part of the Python (language) spec or just a "feature" of CPython (the implementation)?

Solution

The CPython C code that handles slicing for objects implementing the sq_slice slot cannot handle integers larger than the maximum Py_ssize_t value (PY_SSIZE_T_MAX, exposed as sys.maxsize). The sq_slice slot is the C-API equivalent of the __getslice__ special method.

For a two-element slice, Python 2 uses one of the SLICE+* opcodes, which the apply_slice() function then handles. That function uses _PyEval_SliceIndex to convert the Python index objects (int, long, or anything implementing the __index__ method) to a Py_ssize_t integer. _PyEval_SliceIndex carries the following comment:

/* Extract a slice index from a PyInt or PyLong or an object with the
   nb_index slot defined, and store in *pi.
   Silently reduce values larger than PY_SSIZE_T_MAX to PY_SSIZE_T_MAX,
   and silently boost values less than -PY_SSIZE_T_MAX-1 to -PY_SSIZE_T_MAX-1.
   Return 0 on error, 1 on success.
*/
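The clamping that this comment describes can be modelled in a few lines of Python. This is only a rough sketch of the documented behaviour, not the CPython source; the name clamp_slice_index is made up for illustration, and it assumes sys.maxsize equals PY_SSIZE_T_MAX, as it does on CPython.

```python
import operator
import sys

# Assumption: sys.maxsize mirrors PY_SSIZE_T_MAX on CPython.
SSIZE_T_MAX = sys.maxsize
SSIZE_T_MIN = -sys.maxsize - 1


def clamp_slice_index(value):
    """Hypothetical model of the clamping described in the C comment."""
    # operator.index() accepts ints, longs, and objects defining __index__.
    n = operator.index(value)
    # Values outside the Py_ssize_t range are silently pulled to the limits.
    return max(SSIZE_T_MIN, min(SSIZE_T_MAX, n))


x = sys.maxsize + 69
clamped = clamp_slice_index(x)  # the stop value from the question
```

Under this model the stop value from the question collapses to sys.maxsize, which matches the 9223372036854775807 printed by __getslice__.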

This means that any slicing in Python 2 using the two-value syntax is limited to values in the sys.maxsize range when an sq_slice slot is provided.

Slicing with the three-value form (item[start:stop:stride]) uses the BUILD_SLICE opcode instead (followed by BINARY_SUBSCR), which creates a slice() object without limiting the values to sys.maxsize.

If the object doesn't implement the sq_slice slot (so no __getslice__ is present), the apply_slice() function also falls back to using a slice() object.

As for whether this is an implementation detail or part of the language: the Slicings expression documentation distinguishes between simple_slicing and extended_slicing; the former only permits the short_slice form. For simple slicing, the indices must be plain integers:

The lower and upper bound expressions, if present, must evaluate to plain integers; defaults are zero and the sys.maxint, respectively.

This suggests that the Python 2 language itself limits the indices to sys.maxint values, disallowing long integers. In Python 3, simple slicing has been removed from the language altogether.

If your code has to support slicing with values beyond sys.maxsize and you have to inherit from a type that implements __getslice__, then your options are to:

  • use the three-value syntax, with None for the stride:

    Potato()[123:x:None]
    

  • create slice() objects explicitly:

    Potato()[slice(123, x)]
    
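Both workarounds can be checked with a small variant of the Potato class from the question that returns what it receives instead of printing it. This is a sketch under the assumption that the class defines both __getslice__ and __getitem__; the tuple return values are there purely for illustration. Python 2 only calls __getslice__ for the two-value form, and Python 3 never calls it at all.

```python
import sys


class Potato(object):
    def __getslice__(self, start, stop):
        # Only used by Python 2, and only for the two-value form obj[a:b].
        return ("getslice", start, stop)

    def __getitem__(self, key):
        # Receives a slice object whose attributes are not clamped.
        return ("getitem", key.start, key.stop)


x = sys.maxsize + 69

# Three-value syntax with None as the stride bypasses __getslice__.
three_value = Potato()[123:x:None]

# An explicit slice() object is passed straight to __getitem__.
explicit = Potato()[slice(123, x)]
```

In both cases __getitem__ sees the original stop value rather than sys.maxsize.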

slice() objects can handle long integers just fine; however, the slice.indices() method still cannot handle lengths over sys.maxsize:

>>> import sys
>>> s = slice(0, sys.maxsize + 1)
>>> s
slice(0, 9223372036854775808L, None)
>>> s.stop
9223372036854775808L
>>> s.indices(sys.maxsize + 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot fit 'long' into an index-sized integer
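If indices are needed for such a length, one possible way around the overflow is to do the normalisation in pure Python, where integers have arbitrary precision. The helper below is a hypothetical sketch (big_indices is not a standard-library function) that follows the documented semantics of slice.indices(); it assumes start, stop, and step are plain integers or None and does not call __index__ on other objects.

```python
import sys


def big_indices(s, length):
    """Hypothetical arbitrary-precision stand-in for s.indices(length)."""
    step = 1 if s.step is None else s.step
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if length < 0:
        raise ValueError("length should not be negative")

    # Bounds that start and stop are clipped to, depending on direction.
    if step < 0:
        lower, upper = -1, length - 1
    else:
        lower, upper = 0, length

    def clip(value, default):
        if value is None:
            return default
        if value < 0:
            # Negative indices count from the end of the sequence.
            return max(value + length, lower)
        return min(value, upper)

    start = clip(s.start, upper if step < 0 else lower)
    stop = clip(s.stop, lower if step < 0 else upper)
    return start, stop, step


s = slice(0, sys.maxsize + 1)
result = big_indices(s, sys.maxsize + 2)
```

For lengths that fit in sys.maxsize, the helper is intended to return the same triple as slice.indices().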
