Why is it slower to iterate over a small string than a small list?


Question

I was playing around with timeit and noticed that doing a simple list comprehension over a small string took longer than doing the same operation on a list of small single character strings. Any explanation? It's almost 1.35 times as much time.

>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861

What's happening on a lower level that's causing this?

Answer

    TL;DR

    • The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.

      Object creation is not at fault. Neither method creates a new object, as one-character strings are cached.

      The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regards to the type and well-formedness. It is also quite likely thanks to the need to check what to return.

      List indexing is very fast.
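
      The caching point above can be checked directly. The sketch below is a CPython implementation detail, not a language guarantee:

      ```python
      # Neither iteration method allocates per element: in CPython,
      # one-character latin-1 strings are cached, so both comprehensions
      # yield the very same objects.
      from_string = [x for x in "abc"]
      from_list = [x for x in ["a", "b", "c"]]

      # Every corresponding pair is the *same* object, not an equal copy.
      print(all(a is b for a, b in zip(from_string, from_list)))  # True
      ```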

      >>> python3 -m timeit '[x for x in "abc"]'
      1000000 loops, best of 3: 0.388 usec per loop
      
      >>> python3 -m timeit '[x for x in ["a", "b", "c"]]'
      1000000 loops, best of 3: 0.436 usec per loop
      

      This disagrees with what you've found...

      You must be using Python 2, then.

      >>> python2 -m timeit '[x for x in "abc"]'
      1000000 loops, best of 3: 0.309 usec per loop
      
      >>> python2 -m timeit '[x for x in ["a", "b", "c"]]'
      1000000 loops, best of 3: 0.212 usec per loop
      

      Let's explain the difference between the versions. I'll examine the compiled code.

      For Python 3:

      import dis
      
      def list_iterate():
          [item for item in ["a", "b", "c"]]
      
      dis.dis(list_iterate)
      #>>>   4           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b118a0, file "", line 4>)
      #>>>               3 LOAD_CONST               2 ('list_iterate.<locals>.<listcomp>')
      #>>>               6 MAKE_FUNCTION            0
      #>>>               9 LOAD_CONST               3 ('a')
      #>>>              12 LOAD_CONST               4 ('b')
      #>>>              15 LOAD_CONST               5 ('c')
      #>>>              18 BUILD_LIST               3
      #>>>              21 GET_ITER
      #>>>              22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
      #>>>              25 POP_TOP
      #>>>              26 LOAD_CONST               0 (None)
      #>>>              29 RETURN_VALUE
      
      def string_iterate():
          [item for item in "abc"]
      
      dis.dis(string_iterate)
      #>>>  21           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d06b17150, file "", line 21>)
      #>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
      #>>>               6 MAKE_FUNCTION            0
      #>>>               9 LOAD_CONST               3 ('abc')
      #>>>              12 GET_ITER
      #>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
      #>>>              16 POP_TOP
      #>>>              17 LOAD_CONST               0 (None)
      #>>>              20 RETURN_VALUE
      

      You see here that the list variant is likely to be slower due to the building of the list each time.

      This is the

       9 LOAD_CONST   3 ('a')
      12 LOAD_CONST   4 ('b')
      15 LOAD_CONST   5 ('c')
      18 BUILD_LIST   3
      

      part. The string variant only has

       9 LOAD_CONST   3 ('abc')
      

      You can check that this does seem to make a difference:

      def string_iterate():
          [item for item in ("a", "b", "c")]
      
      dis.dis(string_iterate)
      #>>>  35           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f4d068be660, file "", line 35>)
      #>>>               3 LOAD_CONST               2 ('string_iterate.<locals>.<listcomp>')
      #>>>               6 MAKE_FUNCTION            0
      #>>>               9 LOAD_CONST               6 (('a', 'b', 'c'))
      #>>>              12 GET_ITER
      #>>>              13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
      #>>>              16 POP_TOP
      #>>>              17 LOAD_CONST               0 (None)
      #>>>              20 RETURN_VALUE
      

      This only produces

       9 LOAD_CONST               6 (('a', 'b', 'c'))
      

      as tuples are immutable. Test:

      >>> python3 -m timeit '[x for x in ("a", "b", "c")]'
      1000000 loops, best of 3: 0.369 usec per loop
      

      Great, that's back up to speed.

      For Python 2:

      def list_iterate():
          [item for item in ["a", "b", "c"]]
      
      dis.dis(list_iterate)
      #>>>   2           0 BUILD_LIST               0
      #>>>               3 LOAD_CONST               1 ('a')
      #>>>               6 LOAD_CONST               2 ('b')
      #>>>               9 LOAD_CONST               3 ('c')
      #>>>              12 BUILD_LIST               3
      #>>>              15 GET_ITER            
      #>>>         >>   16 FOR_ITER                12 (to 31)
      #>>>              19 STORE_FAST               0 (item)
      #>>>              22 LOAD_FAST                0 (item)
      #>>>              25 LIST_APPEND              2
      #>>>              28 JUMP_ABSOLUTE           16
      #>>>         >>   31 POP_TOP             
      #>>>              32 LOAD_CONST               0 (None)
      #>>>              35 RETURN_VALUE        
      
      def string_iterate():
          [item for item in "abc"]
      
      dis.dis(string_iterate)
      #>>>   2           0 BUILD_LIST               0
      #>>>               3 LOAD_CONST               1 ('abc')
      #>>>               6 GET_ITER            
      #>>>         >>    7 FOR_ITER                12 (to 22)
      #>>>              10 STORE_FAST               0 (item)
      #>>>              13 LOAD_FAST                0 (item)
      #>>>              16 LIST_APPEND              2
      #>>>              19 JUMP_ABSOLUTE            7
      #>>>         >>   22 POP_TOP             
      #>>>              23 LOAD_CONST               0 (None)
      #>>>              26 RETURN_VALUE        
      

      The odd thing is that we have the same building of the list, but it's still faster for this. Python 2 is acting strangely fast.

      Let's remove the comprehensions and re-time. The _ = is to prevent it getting optimised out.

      >>> python3 -m timeit '_ = ["a", "b", "c"]'
      10000000 loops, best of 3: 0.0707 usec per loop
      
      >>> python3 -m timeit '_ = "abc"'
      100000000 loops, best of 3: 0.0171 usec per loop
      

      We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense as Python 3 changed comprehensions to have safer scoping.
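
      The scoping change is easy to observe directly. A minimal sketch (Python 3 semantics):

      ```python
      # In Python 3 the comprehension runs in its own function scope,
      # so its loop variable no longer leaks into the enclosing scope.
      x = "outer"
      squares = [x for x in range(3)]
      print(x)  # Python 3 prints "outer"; Python 2 would print 2
      ```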

      Well, now let's improve the benchmark (I'm just removing overhead that isn't iteration). This removes the building of the iterable by pre-assigning it:

      >>> python3 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
      1000000 loops, best of 3: 0.387 usec per loop
      
      >>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
      1000000 loops, best of 3: 0.368 usec per loop
      

      >>> python2 -m timeit -s 'iterable = "abc"'           '[x for x in iterable]'
      1000000 loops, best of 3: 0.309 usec per loop
      
      >>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
      10000000 loops, best of 3: 0.164 usec per loop
      

      We can check if calling iter is the overhead:

      >>> python3 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
      10000000 loops, best of 3: 0.099 usec per loop
      
      >>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
      10000000 loops, best of 3: 0.1 usec per loop
      

      >>> python2 -m timeit -s 'iterable = "abc"'           'iter(iterable)'
      10000000 loops, best of 3: 0.0913 usec per loop
      
      >>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
      10000000 loops, best of 3: 0.0854 usec per loop
      

      No. No it is not. The difference is too small, especially for Python 3.

      So let's remove yet more unwanted overhead... by making the whole thing slower! The aim is just to have a longer iteration so the time hides overhead.

      >>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
      100 loops, best of 3: 3.12 msec per loop
      
      >>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
      100 loops, best of 3: 2.77 msec per loop
      

      >>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
      100 loops, best of 3: 2.32 msec per loop
      
      >>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
      100 loops, best of 3: 2.09 msec per loop
      

      This hasn't actually changed much, but it's helped a little.

      So remove the comprehension. It's overhead that's not part of the question:

      >>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
      1000 loops, best of 3: 1.71 msec per loop
      
      >>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
      1000 loops, best of 3: 1.36 msec per loop
      

      >>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
      1000 loops, best of 3: 1.27 msec per loop
      
      >>> python2 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
      1000 loops, best of 3: 935 usec per loop
      

      That's more like it! We can get slightly faster still by using deque to iterate. It's basically the same, but it's faster:

      >>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 777 usec per loop
      
      >>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 405 usec per loop
      

      >>> python2 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 805 usec per loop
      
      >>> python2 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 438 usec per loop
      

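      The deque call used in these timings is the standard "consume an iterable at C speed" idiom: with maxlen=0 it iterates fully but stores nothing. A minimal sketch:

      ```python
      # deque(iterable, maxlen=0) exhausts an iterable at C speed while
      # discarding every element; the side effects below prove it ran.
      from collections import deque

      seen = []
      deque((seen.append(c) for c in "abc"), maxlen=0)
      print(seen)  # ['a', 'b', 'c']
      ```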
      What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying bytes and unicode in both:

      • bytes

      >>> python3 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)).encode("ascii") for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 571 usec per loop
      
      >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127)).encode("ascii") for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 394 usec per loop
      

      >>> python2 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127))                 for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 757 usec per loop
      
      >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [chr(random.randint(0, 127))                 for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 438 usec per loop
      

      Here you see Python 3 actually faster than Python 2.

      • unicode

      >>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join(   chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 800 usec per loop
      
      >>> python3 -m timeit -s 'import random; from collections import deque; iterable =         [   chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 394 usec per loop
      

      >>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 1.07 msec per loop
      
      >>> python2 -m timeit -s 'import random; from collections import deque; iterable =         [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 469 usec per loop
      

      Again, Python 3 is faster, although this is to be expected (str has had a lot of attention in Python 3).

      In fact, this unicode-bytes difference is very small, which is impressive.

      So let's analyse this one case, seeing as it's fast and convenient for me:

      >>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 777 usec per loop
      
      >>> python3 -m timeit -s 'import random; from collections import deque; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
      1000 loops, best of 3: 405 usec per loop
      

      Actually, we can rule out Tim Peters' 10-times-upvoted answer!

      >>> foo = iterable[123]
      >>> iterable[36] is foo
      True
      

      These are not new objects!

      But this is worth mentioning: indexing costs. The difference will likely be in the indexing, so remove the iteration and just index:

      >>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
      10000000 loops, best of 3: 0.0397 usec per loop
      
      >>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
      10000000 loops, best of 3: 0.0374 usec per loop
      

      The difference seems small, but at least half of the cost is overhead:

      >>> python3 -m timeit -s 'import random; iterable =        [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
      100000000 loops, best of 3: 0.0173 usec per loop
      

      so the speed difference is sufficient to decide to blame it. I think.

      So why is indexing a list so much faster?

      Well, I'll come back to you on that, but my guess is that it's down to the check for interned strings (or cached characters, if it's a separate mechanism). This will be less fast than optimal. But I'll go check the source (although I'm not comfortable in C...) :).

      So here's the source:

      static PyObject *
      unicode_getitem(PyObject *self, Py_ssize_t index)
      {
          void *data;
          enum PyUnicode_Kind kind;
          Py_UCS4 ch;
          PyObject *res;
      
          if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
              PyErr_BadArgument();
              return NULL;
          }
          if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
              PyErr_SetString(PyExc_IndexError, "string index out of range");
              return NULL;
          }
          kind = PyUnicode_KIND(self);
          data = PyUnicode_DATA(self);
          ch = PyUnicode_READ(kind, data, index);
          if (ch < 256)
              return get_latin1_char(ch);
      
          res = PyUnicode_New(1, ch);
          if (res == NULL)
              return NULL;
          kind = PyUnicode_KIND(res);
          data = PyUnicode_DATA(res);
          PyUnicode_WRITE(kind, data, 0, ch);
          assert(_PyUnicode_CheckConsistency(res, 1));
          return res;
      }
      

      Walking from the top, we'll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is

      ch = PyUnicode_READ(kind, data, index);
      

      but we'd hope that is fast, as we're reading from a contiguous C array by indexing it. The result, ch, will be less than 256 so we'll return the cached character in get_latin1_char(ch).

      So we'll run (dropping the first checks)

      kind = PyUnicode_KIND(self);
      data = PyUnicode_DATA(self);
      ch = PyUnicode_READ(kind, data, index);
      return get_latin1_char(ch);
      

      where

      #define PyUnicode_KIND(op) \
          (assert(PyUnicode_Check(op)), \
           assert(PyUnicode_IS_READY(op)),            \
           ((PyASCIIObject *)(op))->state.kind)
      

      (which is boring because asserts get ignored in debug [so I can check that they're fast] and ((PyASCIIObject *)(op))->state.kind) is (I think) an indirection and a C-level cast);

      #define PyUnicode_DATA(op) \
          (assert(PyUnicode_Check(op)), \
           PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) :   \
           _PyUnicode_NONCOMPACT_DATA(op))
      

      (which is also boring for similar reasons, assuming the macros (Something_CAPITALIZED) are all fast),

      #define PyUnicode_READ(kind, data, index) \
          ((Py_UCS4) \
          ((kind) == PyUnicode_1BYTE_KIND ? \
              ((const Py_UCS1 *)(data))[(index)] : \
              ((kind) == PyUnicode_2BYTE_KIND ? \
                  ((const Py_UCS2 *)(data))[(index)] : \
                  ((const Py_UCS4 *)(data))[(index)] \
              ) \
          ))
      

      (which involves indexes but really isn't slow at all) and

      static PyObject*
      get_latin1_char(unsigned char ch)
      {
          PyObject *unicode = unicode_latin1[ch];
          if (!unicode) {
              unicode = PyUnicode_New(1, ch);
              if (!unicode)
                  return NULL;
              PyUnicode_1BYTE_DATA(unicode)[0] = ch;
              assert(_PyUnicode_CheckConsistency(unicode, 1));
              unicode_latin1[ch] = unicode;
          }
          Py_INCREF(unicode);
          return unicode;
      }
      

      This confirms my suspicion:

      • This is cached:

      PyObject *unicode = unicode_latin1[ch];
      

    • This should be fast. The if (!unicode) is not run, so it's literally equivalent in this case to

      PyObject *unicode = unicode_latin1[ch];
      Py_INCREF(unicode);
      return unicode;
      

    • Honestly, after testing the asserts are fast (by disabling them [I think it works on the C-level asserts...]), the only plausibly-slow parts are:

      PyUnicode_IS_COMPACT(op)
      _PyUnicode_COMPACT_DATA(op)
      _PyUnicode_NONCOMPACT_DATA(op)
      

      where:

      #define PyUnicode_IS_COMPACT(op) \
          (((PyASCIIObject*)(op))->state.compact)
      

      (fast, as before)

      #define _PyUnicode_COMPACT_DATA(op)                     \
          (PyUnicode_IS_ASCII(op) ?                   \
           ((void*)((PyASCIIObject*)(op) + 1)) :              \
           ((void*)((PyCompactUnicodeObject*)(op) + 1)))
      

      (fast if the macro IS_ASCII is fast), and

      #define _PyUnicode_NONCOMPACT_DATA(op)                  \
          (assert(((PyUnicodeObject*)(op))->data.any),        \
           ((((PyUnicodeObject *)(op))->data.any)))
      

      (also fast as it's an assert plus an indirection plus a cast).

      So we're down (the rabbit hole) to:

      PyUnicode_IS_ASCII
      

      which is

      #define PyUnicode_IS_ASCII(op)                   \
          (assert(PyUnicode_Check(op)),                \
           assert(PyUnicode_IS_READY(op)),             \
           ((PyASCIIObject*)op)->state.ascii)
      

      Hmm... that seems fast too...
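
      The cached-vs-uncached branch in get_latin1_char can be observed from Python. This is a CPython implementation detail, so the sketch is hedged to CPython:

      ```python
      # Indexing a character below 256 hits the latin-1 cache, so repeated
      # accesses return the identical object; a character above U+00FF is
      # built fresh on every access (per unicode_getitem in CPython).
      s = "a\u1234"
      print(s[0] is s[0])  # True: 'a' comes from the latin-1 cache
      print(s[1] is s[1])  # False: a new object per access
      ```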

      Well, OK, but let's compare it to PyList_GetItem. (Yeah, thanks Tim Peters for giving me more work to do :P.)

      PyObject *
      PyList_GetItem(PyObject *op, Py_ssize_t i)
      {
          if (!PyList_Check(op)) {
              PyErr_BadInternalCall();
              return NULL;
          }
          if (i < 0 || i >= Py_SIZE(op)) {
              if (indexerr == NULL) {
                  indexerr = PyUnicode_FromString(
                      "list index out of range");
                  if (indexerr == NULL)
                      return NULL;
              }
              PyErr_SetObject(PyExc_IndexError, indexerr);
              return NULL;
          }
          return ((PyListObject *)op) -> ob_item[i];
      }
      

      We can see that on non-error cases this is just going to run:

      PyList_Check(op)
      Py_SIZE(op)
      ((PyListObject *)op) -> ob_item[i]
      

      where PyList_Check is

      #define PyList_Check(op) \
           PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)
      

      (TABS! TABS!!!) (issue21587) That got fixed and merged in 5 minutes. Like... yeah. Damn. They put Skeet to shame.

      #define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)
      

      #define PyType_FastSubclass(t,f)  PyType_HasFeature(t,f)
      

      #ifdef Py_LIMITED_API
      #define PyType_HasFeature(t,f)  ((PyType_GetFlags(t) & (f)) != 0)
      #else
      #define PyType_HasFeature(t,f)  (((t)->tp_flags & (f)) != 0)
      #endif
      

      So this is normally really trivial (two indirections and a couple of boolean checks) unless Py_LIMITED_API is on, in which case... ???

      Then there's the indexing and a cast (((PyListObject *)op) -> ob_item[i]) and we're done.

      So there are definitely fewer checks for lists, and the small speed differences certainly imply that it could be relevant.

      I think in general, there's just more type-checking and indirection (->) for Unicode. It seems I'm missing a point, but what?
