Why is 'x' in ('x',) faster than 'x' == 'x'?


Question


>>> timeit.timeit("'x' in ('x',)")
0.04869917374131205
>>> timeit.timeit("'x' == 'x'")
0.06144205736110564

Also works for tuples with multiple elements, both versions seem to grow linearly:

>>> timeit.timeit("'x' in ('x', 'y')")
0.04866674801541748
>>> timeit.timeit("'x' == 'x' or 'x' == 'y'")
0.06565782838087131
>>> timeit.timeit("'x' in ('y', 'x')")
0.08975995576448526
>>> timeit.timeit("'x' == 'y' or 'x' == 'y'")
0.12992391047427532

Based on this, I think I should totally start using in everywhere instead of ==!

Solution

As I mentioned to David Wolever, there's more to this than meets the eye; both methods dispatch to is; you can prove this by doing

min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
#>>> 0.00045456900261342525

min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
#>>> 0.5256857610074803

The first can only be so fast because it checks by identity.
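You can see from pure Python why the second Timer is so much slower: building the strings at runtime (rather than from literals) keeps CPython from constant-folding or interning them, so x and y are equal in value but not identical, and == has to compare them character by character:

```python
# Build the strings at runtime so CPython cannot constant-fold or
# intern them: x and y end up equal in value but distinct in identity.
n = 1000
x = 'a' * n
y = 'a' * n
print(x == y)   # True: same contents
print(x is y)   # False: different objects, so == had real work to do
```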

To find out why one would take longer than the other, let's trace through execution.

They both start in ceval.c, from COMPARE_OP since that is the bytecode involved

TARGET(COMPARE_OP) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *res = cmp_outcome(oparg, left, right);
    Py_DECREF(left);
    Py_DECREF(right);
    SET_TOP(res);
    if (res == NULL)
        goto error;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
}
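On the Python side you can confirm which opcodes are involved with the dis module. Note this is version-dependent: on the 3.4-era CPython traced here both expressions go through COMPARE_OP, while CPython 3.9+ gives in its own CONTAINS_OP opcode:

```python
import dis

# Print the opcode sequence each expression compiles to.  Expect
# COMPARE_OP for ==; `in` is COMPARE_OP on older CPythons and a
# dedicated CONTAINS_OP on 3.9+.
for expr in ("'x' == 'x'", "'x' in ('x',)"):
    ops = [i.opname for i in dis.get_instructions(compile(expr, "<demo>", "eval"))]
    print(expr, "->", ops)
```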

This pops the values from the stack (technically it only pops one)

PyObject *right = POP();
PyObject *left = TOP();

and runs the compare:

PyObject *res = cmp_outcome(oparg, left, right);

cmp_outcome is this:

static PyObject *
cmp_outcome(int op, PyObject *v, PyObject *w)
{
    int res = 0;
    switch (op) {
    case PyCmp_IS: ...
    case PyCmp_IS_NOT: ...
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
    case PyCmp_NOT_IN: ...
    case PyCmp_EXC_MATCH: ...
    default:
        return PyObject_RichCompare(v, w, op);
    }
    v = res ? Py_True : Py_False;
    Py_INCREF(v);
    return v;
}

This is where the paths split. The PyCmp_IN branch does

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

Note that a tuple is defined as

static PySequenceMethods tuple_as_sequence = {
    ...
    (objobjproc)tuplecontains,                  /* sq_contains */
};

PyTypeObject PyTuple_Type = {
    ...
    &tuple_as_sequence,                         /* tp_as_sequence */
    ...
};

So the branch

if (sqm != NULL && sqm->sq_contains != NULL)

will be taken, and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be called.
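That sq_contains slot is also what backs tuple.__contains__ at the Python level, so you can invoke the same C function directly:

```python
# tuple.__contains__ is the Python-level view of the sq_contains slot;
# `'x' in t` dispatches to this same C function (tuplecontains).
t = ('x', 'y')
print(t.__contains__('x'))                 # True
print(('z' in t) == t.__contains__('z'))   # True: identical dispatch
```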

This does

static int
tuplecontains(PyTupleObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}
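In rough Python terms, tuplecontains amounts to the sketch below (a hypothetical rendering, not the real implementation): PyObject_RichCompareBool checks identity before falling back to ==.

```python
# Hypothetical Python rendering of tuplecontains.  The `el is item`
# part mirrors the identity fast path inside PyObject_RichCompareBool.
def tuple_contains(a, el):
    return any(el is item or el == item for item in a)

print(tuple_contains(('x', 'y'), 'x'))   # True
print(tuple_contains(('y',), 'x'))       # False
```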

...Wait, wasn't that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

That code path was short so it likely just comes down to the speed of these two. Let's compare.

int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    ...
}
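That identity-implies-equality shortcut is observable from pure Python; the classic demonstration uses NaN, which is never == to anything, including itself:

```python
# Containment goes through PyObject_RichCompareBool, whose identity
# check fires before == is ever consulted.
nan = float('nan')
print(nan == nan)      # False: IEEE 754 NaN is unequal to itself
print(nan in (nan,))   # True: found by identity, not by ==
```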

The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

PyObject *
PyObject_RichCompare(PyObject *v, PyObject *w, int op)
{
    PyObject *res;

    assert(Py_LT <= op && op <= Py_GE);
    if (v == NULL || w == NULL) { ... }
    if (Py_EnterRecursiveCall(" in comparison"))
        return NULL;
    res = do_richcompare(v, w, op);
    Py_LeaveRecursiveCall();
    return res;
}

The Py_EnterRecursiveCall/Py_LeaveRecursiveCall pair is not used in the previous path, but these are relatively quick macros that short-circuit after incrementing and decrementing some globals.

do_richcompare does:

static PyObject *
do_richcompare(PyObject *v, PyObject *w, int op)
{
    richcmpfunc f;
    PyObject *res;
    int checked_reverse_op = 0;

    if (v->ob_type != w->ob_type && ...) { ... }
    if ((f = v->ob_type->tp_richcompare) != NULL) {
        res = (*f)(v, w, op);
        if (res != Py_NotImplemented)
            return res;
        ...
    }
    ...
}

This does some quick checks to call v->ob_type->tp_richcompare which is

PyTypeObject PyUnicode_Type = {
    ...
    PyUnicode_RichCompare,      /* tp_richcompare */
    ...
};

which does

PyObject *
PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
{
    int result;
    PyObject *v;

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
        Py_RETURN_NOTIMPLEMENTED;

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)
        return NULL;

    if (left == right) {
        switch (op) {
        case Py_EQ:
        case Py_LE:
        case Py_GE:
            /* a string is equal to itself */
            v = Py_True;
            break;
        case Py_NE:
        case Py_LT:
        case Py_GT:
            v = Py_False;
            break;
        default:
            ...
        }
    }
    else if (...) { ... }
    else { ...}
    Py_INCREF(v);
    return v;
}

Namely, this shortcuts on left == right... but only after doing

    if (!PyUnicode_Check(left) || !PyUnicode_Check(right))

    if (PyUnicode_READY(left) == -1 ||
        PyUnicode_READY(right) == -1)

All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

POP()                           # Stack stuff
TOP()                           #
                                #
case PyCmp_IN:                  # Dispatch on operation
                                #
sqm != NULL                     # Dispatch to builtin op
sqm->sq_contains != NULL        #
*sqm->sq_contains               #
                                #
cmp == 0                        # Do comparison in loop
i < Py_SIZE(a)                  #
v == w                          #
op == Py_EQ                     #
++i                             # 
cmp == 0                        #
                                #
res < 0                         # Convert to Python-space
res ? Py_True : Py_False        #
Py_INCREF(v)                    #
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

vs

POP()                           # Stack stuff
TOP()                           #
                                #
default:                        # Dispatch on operation
                                #
Py_LT <= op                     # Checking operation
op <= Py_GE                     #
v == NULL                       #
w == NULL                       #
Py_EnterRecursiveCall(...)      # Recursive check
                                #
v->ob_type != w->ob_type        # More operation checks
f = v->ob_type->tp_richcompare  # Dispatch to builtin op
f != NULL                       #
                                #
!PyUnicode_Check(left)          # ...More checks
!PyUnicode_Check(right))        #
PyUnicode_READY(left) == -1     #
PyUnicode_READY(right) == -1    #
left == right                   # Finally, doing comparison
case Py_EQ:                     # Immediately short circuit
Py_INCREF(v);                   #
                                #
res != Py_NotImplemented        #
                                #
Py_LeaveRecursiveCall()         # Recursive check
                                #
Py_DECREF(left)                 # Stack stuff
Py_DECREF(right)                #
SET_TOP(res)                    #
res == NULL                     #
DISPATCH()                      #

Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path, it has fewer function calls, only one switch statement and is just a bit thinner.

TL;DR:

Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.
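If you want to reproduce the measurement, a minimal harness follows; the absolute numbers depend on your machine and CPython version, and on 3.9+ (where in compiles to CONTAINS_OP and the interpreter gained adaptive specialization) the gap may shrink or even reverse:

```python
import timeit

# Time both expressions.  Assumes nothing beyond the stdlib; the
# printed numbers are machine- and version-dependent.
t_in = min(timeit.repeat("'x' in ('x',)", repeat=5, number=1_000_000))
t_eq = min(timeit.repeat("'x' == 'x'", repeat=5, number=1_000_000))
print(f"in: {t_in:.4f}s   ==: {t_eq:.4f}s")
```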
