Why is string's startswith slower than in?

Question

Surprisingly, I find startswith is slower than in:

In [10]: s="ABCD"*10

In [11]: %timeit s.startswith("XYZ")
1000000 loops, best of 3: 307 ns per loop

In [12]: %timeit "XYZ" in s
10000000 loops, best of 3: 81.7 ns per loop

As we all know, the in operation needs to search the whole string and startswith just needs to check the first few characters, so startswith should be more efficient.

When s is large enough, startswith is faster:

In [13]: s="ABCD"*200

In [14]: %timeit s.startswith("XYZ")
1000000 loops, best of 3: 306 ns per loop

In [15]: %timeit "XYZ" in s
1000000 loops, best of 3: 666 ns per loop

So it seems that calling startswith has some overhead which makes it slower when the string is small.

And then I tried to figure out what the overhead of the startswith call is.

First, I used an f variable to reduce the cost of the dot operation - as mentioned in this answer - here we can see startswith is still slower:

In [16]: f=s.startswith

In [17]: %timeit f("XYZ")
1000000 loops, best of 3: 270 ns per loop

Further, I tested the cost of an empty function call:

In [18]: def func(a): pass

In [19]: %timeit func("XYZ")
10000000 loops, best of 3: 106 ns per loop

Even after subtracting the cost of the dot operation and the function call, startswith takes about (270-106)=164ns, while the in operation takes only 81.7ns. It seems startswith still has some overhead. What is it?
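The IPython measurements above can be reproduced outside IPython with the standard timeit module. The following is only a sketch: the helper name per_call_ns and the loop counts are my own choices, and the absolute numbers depend on the interpreter version and the machine, so they will not match the figures quoted in the question.

```python
import timeit

# Setup shared by all statements: the short test string, a pre-bound
# method (to skip the attribute lookup) and an empty function.
SETUP = '''
s = "ABCD" * 10
f = s.startswith
def func(a):
    pass
'''

STATEMENTS = [
    's.startswith("XYZ")',
    '"XYZ" in s',
    's.__contains__("XYZ")',
    'f("XYZ")',
    'func("XYZ")',
]


def per_call_ns(stmt, setup=SETUP, number=200_000, repeat=5):
    """Return the best observed time per execution of stmt, in nanoseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9


results = {stmt: per_call_ns(stmt) for stmt in STATEMENTS}
for stmt, ns in results.items():
    print(f"{stmt:<24} {ns:8.1f} ns")
```

Taking the minimum over several repeats mirrors the "best of 3" that %timeit reports in the output above.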

Added a comparison between startswith and __contains__, as suggested by poke and lvc:

In [28]: %timeit s.startswith("XYZ")
1000000 loops, best of 3: 314 ns per loop

In [29]: %timeit s.__contains__("XYZ")
1000000 loops, best of 3: 192 ns per loop

Recommended answer

As already mentioned in the comments, if you use s.__contains__("XYZ") you get a result that is more similar to s.startswith("XYZ"), because it has to take the same route: a member lookup on the string object, followed by a function call. This is usually somewhat expensive (though not enough that you should worry about it, of course). On the other hand, when you do "XYZ" in s, the operator is compiled to a dedicated bytecode instruction, which lets the interpreter short-cut the member access to __contains__ (or rather to the implementation behind it, because __contains__ itself is just one way to access that implementation).

You can get an idea about this by looking at the bytecode:

>>> dis.dis('"XYZ" in s')
  1           0 LOAD_CONST               0 ('XYZ')
              3 LOAD_NAME                0 (s)
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
>>> dis.dis('s.__contains__("XYZ")')
  1           0 LOAD_NAME                0 (s)
              3 LOAD_ATTR                1 (__contains__)
              6 LOAD_CONST               0 ('XYZ')
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 RETURN_VALUE
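Opcode names and offsets differ between CPython versions, so the listing above will not look identical everywhere (newer releases, for example, use a dedicated CONTAINS_OP instead of COMPARE_OP for in). As a rough, version-tolerant sketch, the following only checks whether each expression compiles to some call instruction at all; the helper name opnames is my own.

```python
import dis


def opnames(expression):
    """Compile an expression and return the names of its bytecode instructions."""
    code = compile(expression, "<expr>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]


in_ops = opnames('"XYZ" in s')
method_ops = opnames('s.__contains__("XYZ")')

print("in operator:", in_ops)
print("method call:", method_ops)

# Exact opcode names vary by version, so only look for a CALL* instruction.
in_has_call = any(name.startswith("CALL") for name in in_ops)
method_has_call = any(name.startswith("CALL") for name in method_ops)
print("in uses a call instruction:    ", in_has_call)
print("method uses a call instruction:", method_has_call)
```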

So comparing s.__contains__("XYZ") with s.startswith("XYZ") will produce a more similar result, however for your example string s, the startswith will still be slower.

To get to the bottom of that, you can check the implementation of both. What is interesting about the contains implementation is that it is statically typed and simply assumes that the argument is a unicode object itself. So it is quite efficient.

The startswith implementation, however, is a "dynamic" Python method, which requires the implementation to actually parse the arguments. startswith also supports a tuple as an argument, which makes the whole start-up of the method a bit slower (shortened by me, with my comments):

static PyObject * unicode_startswith(PyObject *self, PyObject *args)
{
    // argument parsing
    PyObject *subobj;
    PyObject *substring;
    Py_ssize_t start = 0;
    Py_ssize_t end = PY_SSIZE_T_MAX;
    int result;
    if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end))
        return NULL;

    // tuple handling
    if (PyTuple_Check(subobj)) {}

    // unicode conversion
    substring = PyUnicode_FromObject(subobj);
    if (substring == NULL) {}

    // actual implementation
    result = tailmatch(self, substring, start, end, -1);
    Py_DECREF(substring);
    if (result == -1)
        return NULL;
    return PyBool_FromLong(result);
}
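Seen from the Python side, the extra work in the C function corresponds to the more flexible call signature of startswith. The following short sketch (variable names are mine) shows the argument forms that the method has to sort out on every call, next to the much narrower contract of in:

```python
s = "ABCD" * 10

# startswith accepts a single prefix, a tuple of prefixes,
# and optional start/end offsets, all of which must be parsed per call.
single = s.startswith("AB")            # plain prefix
any_of = s.startswith(("XYZ", "AB"))   # tuple: true if any prefix matches
offset = s.startswith("CD", 2)         # match starting at index 2
bounded = s.startswith("ABCD", 0, 3)   # window s[0:3] is shorter than the prefix

# The in operator has exactly two operands and, for str,
# rejects a non-string left operand outright.
try:
    1 in s
    in_rejects_non_str = False
except TypeError:
    in_rejects_non_str = True

print(single, any_of, offset, bounded, in_rejects_non_str)
```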

This is likely a big reason why startswith is slower on strings for which contains is fast thanks to its simplicity.
