为什么有些浮子<整数比较慢四倍？ [英] Why are some float < integer comparisons four times slower than others?

查看：70 发布时间：2020/10/11 0:54:08 python performance floating-point cpython python-internals

本文介绍了为什么有些浮子<整数比较慢四倍？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

将浮点数与整数进行比较时，某些值对的计算时间要比其他类似数量级的值要长得多。

例如：

 >>导入时间
>> timeit.timeit（ 562949953420000.7< 562949953421000）＃运行一百万次
 0.5387085462592742

但是，如果将float或integer变小或变大一定数量，则比较将运行得更快：

 >> ;> timeit.timeit（ 562949953420000.7< 562949953422000）＃整数增加了1000 
 0.1481498428446173 
> timeit.timeit（ 562949953423001.8< 562949953421000）＃浮点数增加了3001.1 
 0.1459577925548956

更改比较运算符（例如，使用 == 或> ）不会以任何明显的方式影响时间。

这与幅度不是唯一的，因为选择较大或较小的值可以更快地进行比较，因此我怀疑这是不幸的位排列的方式。

很明显，比较这些值对于大多数用例而言已经足够快了。我只是对为什么Python似乎在某些对值上比在其他值上更费力而感到好奇。

解决方案

浮点对象的Python源代码承认：

比较几乎是一场噩梦

这在将浮点数与整数进行比较时尤其如此，因为与浮点数不同，Python中的整数可以任意大，并且总是精确的。尝试将整数强制转换为浮点数可能会失去精度，并使比较不准确。尝试将浮点数转换为整数也不会起作用，因为任何小数部分都会丢失。

为解决此问题，Python执行了一系列检查，如果其中一项检查成功，则返回结果。它比较两个值的符号，然后比较整数是否太大而不能成为浮点数，然后将浮点的指数与整数长度进行比较。如果所有这些检查均失败，则有必要构造两个新的Python对象进行比较以获得结果。

比较浮点数 v 到整数/长 w ，最坏的情况是：

v 和 w 具有相同的符号（正号或负号），

整数 w 的位数很少，因此可以保存在 size_t 类型（通常为32或64位），

整数 w 至少有49位，

浮点数 v 的指数与 w 中的位数。

这正是我们所需要的问题中的值：

 >>导入数学
>>> math.frexp（562949953420000.7）＃给出浮点数（有效数，指数）对
（0.9999999999976706，49）
> （562949953421000）.bit_length（）
 49

我们看到49都是浮点数和整数位数。这两个数字都是正数，因此满足了以上四个条件。

选择其中一个值较大（或更小）可以更改整数的位数，或指数的值，因此Python能够确定比较结果而无需执行昂贵的最终检查。

这特定于CPython的CPython实现

更详细的比较

The < a href = https://hg.python.org/cpython/file/ea33b61cac74/Objects/floatobject.c#l301 rel = noreferrer> float_richcompare 函数处理两个值 v 和 w 之间的比较。

下面是该功能执行的检查的分步说明。在尝试了解该函数的功能时，Python源代码中的注释实际上非常有帮助，因此我将其放在相关的地方。我还将这些检查总结在答案的底部。

主要思想是映射Python对象 v 和 w 到两个适当的C双打， i 和 j ，然后可以轻松将其进行比较以给出正确的结果。 Python 2和Python 3都使用相同的思想来做到这一点（前者只处理 int 和 long 类型）

要做的第一件事是检查 v 绝对是Python浮点数并将其映射到C double i 。接下来，该函数查看 w 是否也是浮点数，并将其映射到C双精度 j 。这是该功能的最佳情况，因为可以跳过所有其他检查。该函数还会检查 v 是 inf 还是 nan ：

 静态PyObject * 
 float_richcompare（PyObject * v，PyObject * w， int op）
 {
 double i，j; 
 int r = 0; 
 assert（PyFloat_Check（v））; 
 i = PyFloat_AS_DOUBLE（v）; 
 
如果（PyFloat_Check（w））
 j = PyFloat_AS_DOUBLE（w）; 
 
 else if（！Py_IS_FINITE（i））{
 if（PyLong_Check（w））
 j = 0.0; 
 else 
 goto未实现； 
}

现在我们知道如果 w 没有通过这些检查，它不是Python浮点数。现在，该函数将检查它是否为Python整数。如果是这种情况，最简单的测试是提取 v 的符号和 w 的符号（返回 0 如果为零， -1 如果为负， 1 如果为负）。如果符号不同，则这是返回比较结果所需的全部信息：

 否则（PyLong_Check（w））{
 int vsign = i == 0.0吗？ 0：我＜ 0.0？ -1：1; 
 int wsign = _PyLong_Sign（w）; 
 size_t nbits; 
 int指数； 
 
 if（vsign！= wsign）{
 / *幅度无关紧要-仅由符号
 *决定结果。 
 * / 
 i =（双）vsign; 
 j =（双）wsign; 
 goto比较； 
} 
}

如果此检查失败，则 v 和 w 具有相同的符号。

下一个检查将计算整数 w 中的位数。如果它有太多位，那么它就不可能作为浮点数保存，因此其大小必须大于浮点数 v ：

  nbits = _PyLong_NumBits（w）; 
 if（nbits ==（size_t）-1&& PyErr_Occurred（））{
 / *这个长度太大，以致size_t不够大
 *以容纳＃的位。换成给出相同结果的小双倍
 *-w大到
 *其大小必须超过任何
 *有限浮点的大小。 
 * / 
 PyErr_Clear（）; 
 i =（双精度）vsign; 
 assert（wsign！= 0）; 
 j = wsign * 2.0; 
 goto比较； 
}

另一方面，如果整数 w 具有48个或更少的位，可以安全地将C double j 上交并进行比较：

< pre class = lang-C prettyprint-override>

 if（nbits <= 48）{
j = PyLong_AsDouble（w）; 
 / *不可能< = 48位溢出。 * / 
 assert（j！= -1.0 ||！PyErr_Occurred（））; 
 goto比较； 
}

从现在开始，我们知道 w 具有49位或更多位。将 w 视为正整数会很方便，因此可以根据需要更改符号和比较运算符：

  if（nbits == 48）{
 / *将两边都乘以-1；这也会交换
 *比较器。 
 * / 
 i = -i; 
 op = _Py_SwappedOp [op]; 
}

现在，函数将查看浮点数的指数。回想一下，可以将浮点数写为有效位数* 2 ^指数，并且有效位数表示0.5到1之间的数字。

 （void）frexp（i，& exponent）; 
 if（指数< 0 ||（size_t）指数< nbits）{
 i = 1.0; 
 j = 2.0; 
 goto比较； 
}

这将检查两件事。如果指数小于0，则浮点数小于1（因此，其大小小于任何整数）。或者，如果指数小于 w 中的位数，则我们有 v< | w | ，因为有效位数* 2 ^指数小于2 ^nbits。

这两项检查均失败，该函数将查看该指数是否大于 w 中的位数。这表明有效位数* 2 ^指数大于2 ^nbits，因此 v> | w | ：

  if（（size_t）exponent> nbits）{
i = 2.0; 
 j = 1.0; 
 goto比较； 
}

如果此检查未成功，我们知道float的指数 v 与整数 w 中的位数相同。

现在可以比较两个值的唯一方法是从 v 和 w 。这个想法是丢弃 v 的小数部分，将整数部分加倍，然后再加一个。 w 也会加倍，并且可以将这两个新的Python对象进行比较以提供正确的返回值。使用具有较小值的示例， 4.65< 4 将通过比较（2 * 4）+1 == 9< 8 ==（2 * 4）（返回假）。

  {
个双分数； 
 double intpart; 
 PyObject *结果= NULL; 
 PyObject *一个= NULL; 
 PyObject * vv = NULL; 
 PyObject * ww = w; 
 
 //剪切
 
 fracpart = modf（i，& intpart）; //分割i（v映射到的double）
 vv = PyLong_FromDouble（intpart）; 
 
 //剪切
 
 if（fracpart！= 0.0）{
 / *向左移，或将1位移入vv 
 *以表示损失的分数。 
 * / 
 PyObject * temp; 
 
一个= PyLong_FromLong（1）; 
 
 temp = PyNumber_Lshift（ww，one）; //左移将整数
 ww = temp加倍； 
 
 temp = PyNumber_Lshift（vv，one）; 
 vv = temp; 
 
 temp = PyNumber_Or（vv，one）; //一个双精度整数是偶数，因此它加了1 
 vv = temp; 
} 
 //剪断
} 
}

 
 
 下面是比较功能执行的检查的摘要。
 
 
 让 v 为浮点型，将其转换为C double。现在，如果 w 也是浮点数：
 
 
  
  检查是否 w 是 nan 或 inf 。如果是这样，请根据 w 的类型单独处理此特殊情况。
 
 
  如果不是，则进行比较 v 和 w 直接以C的倍数表示。
 
 
 
 
 
 如果 w 是整数：
 
 
  
  提取 v 和 w 的符号。如果它们不同，那么我们就知道 v 和 w 是不同的，那是更大的值。
 
 
  （符号相同。）检查 w 是否有太多位浮点数（大于 size_t ）。如果是这样， w 的幅度大于 v 的幅度。
 
 
  检查 w 是否具有48位或更少的位。如果是这样，可以安全地将其强制转换为C double而不损失其精度，并与 v 进行比较。
 
 
  （  w 具有超过48位。我们现在将 w 视为已更改的正整数）
 
 
  考虑浮点数 v 的指数。如果指数为负，则 v 小于 1 ，因此小于任何正整数。否则，如果指数小于 w 中的位数，则它必须小于 w 。 p> 
 
  如果 v 的指数大于 w <中的位数/ code>，则 v 大于 w 。
 
 
  （指数与 w 中的位数相同。）
 
 
  最终检查。将 v 分成整数和小数部分。将整数部分加倍并加1以补偿小数部分。现在将整数 w 加倍。比较这两个新的整数以获得结果。
 
 
 
 
When comparing floats to integers, some pairs of values take much longer to be evaluated than other values of a similar magnitude.

For example:
>>> import timeit
>>> timeit.timeit("562949953420000.7 < 562949953421000") # run 1 million times
0.5387085462592742
But if the float or integer is made smaller or larger by a certain amount, the comparison runs much more quickly:
>>> timeit.timeit("562949953420000.7 < 562949953422000") # integer increased by 1000
0.1481498428446173
>>> timeit.timeit("562949953423001.8 < 562949953421000") # float increased by 3001.1
0.1459577925548956
Changing the comparison operator (e.g. using == or > instead) does not affect the times in any noticeable way. 

This is not solely related to magnitude because picking larger or smaller values can result in faster comparisons, so I suspect it is down to some unfortunate way the bits line up. 

Clearly, comparing these values is more than fast enough for most use cases. I am simply curious as to why Python seems to struggle more with some pairs of values than with others.
 解决方案 
A comment in the Python source code for float objects acknowledges that:

  Comparison is pretty much a nightmare
This is especially true when comparing a float to an integer, because, unlike floats, integers in Python can be arbitrarily large and are always exact. Trying to cast the integer to a float might lose precision and make the comparison inaccurate. Trying to cast the float to an integer is not going to work either because any fractional part will be lost.

To get around this problem, Python performs a series of checks, returning the result if one of the checks succeeds. It compares the signs of the two values, then whether the integer is "too big" to be a float, then compares the exponent of the float to the length of the integer. If all of these checks fail, it is necessary to construct two new Python objects to compare in order to obtain the result.

When comparing a float v to an integer/long w, the worst case is that:


v and w have the same sign (both positive or both negative),
the integer w has few enough bits that it can be held in the size_t type (typically 32 or 64 bits),
the integer w has at least 49 bits,
the exponent of the float v is the same as the number of bits in w.


And this is exactly what we have for the values in the question:
>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49
We see that 49 is both the exponent of the float and the number of bits in the integer. Both numbers are positive and so the four criteria above are met.

Choosing one of the values to be larger (or smaller) can change the number of bits of the integer, or the value of the exponent, and so Python is able to determine the result of the comparison without performing the expensive final check.

This is specific to the CPython implementation of the language.



The comparison in more detail

The float_richcompare function handles the comparison between two values v and w.

Below is a step-by-step description of the checks that the function performs. The comments in the Python source are actually very helpful when trying to understand what the function does, so I've left them in where relevant. I've also summarised these checks in a list at the foot of the answer.

The main idea is to map the Python objects v and w to two appropriate C doubles, i and j, which can then be easily compared to give the correct result. Both Python 2 and Python 3 use the same ideas to do this (the former just handles int and long types separately).

The first thing to do is check that v is definitely a Python float and map it to a C double i. Next the function looks at whether w is also a float and maps it to a C double j. This is the best case scenario for the function as all the other checks can be skipped. The function also checks to see whether v is inf or nan: 
static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;
    assert(PyFloat_Check(v));       
    i = PyFloat_AS_DOUBLE(v);       

    if (PyFloat_Check(w))           
        j = PyFloat_AS_DOUBLE(w);   

    else if (!Py_IS_FINITE(i)) {
        if (PyLong_Check(w))
            j = 0.0;
        else
            goto Unimplemented;
    }
Now we know that if w failed these checks, it is not a Python float. Now the function checks if it's a Python integer. If this is the case, the easiest test is to extract the sign of v and the sign of w (return 0 if zero, -1 if negative, 1 if positive). If the signs are different, this is all the information needed to return the result of the comparison:
    else if (PyLong_Check(w)) {
        int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
        int wsign = _PyLong_Sign(w);
        size_t nbits;
        int exponent;

        if (vsign != wsign) {
            /* Magnitudes are irrelevant -- the signs alone
             * determine the outcome.
             */
            i = (double)vsign;
            j = (double)wsign;
            goto Compare;
        }
    }   
If this check failed, then v and w have the same sign. 

The next check counts the number of bits in the integer w. If it has too many bits then it can't possibly be held as a float and so must be larger in magnitude than the float v:
    nbits = _PyLong_NumBits(w);
    if (nbits == (size_t)-1 && PyErr_Occurred()) {
        /* This long is so large that size_t isn't big enough
         * to hold the # of bits.  Replace with little doubles
         * that give the same outcome -- w is so large that
         * its magnitude must exceed the magnitude of any
         * finite float.
         */
        PyErr_Clear();
        i = (double)vsign;
        assert(wsign != 0);
        j = wsign * 2.0;
        goto Compare;
    }
On the other hand, if the integer w has 48 or fewer bits, it can safely turned in a C double j and compared:
    if (nbits <= 48) {
        j = PyLong_AsDouble(w);
        /* It's impossible that <= 48 bits overflowed. */
        assert(j != -1.0 || ! PyErr_Occurred());
        goto Compare;
    }
From this point onwards, we know that w has 49 or more bits. It will be convenient to treat w as a positive integer, so change the sign and the comparison operator as necessary:
    if (nbits <= 48) {
        /* "Multiply both sides" by -1; this also swaps the
         * comparator.
         */
        i = -i;
        op = _Py_SwappedOp[op];
    }
Now the function looks at the exponent of the float. Recall that a float can be written (ignoring sign) as significand * 2^exponent and that the significand represents a number between 0.5 and 1:
    (void) frexp(i, &exponent);
    if (exponent < 0 || (size_t)exponent < nbits) {
        i = 1.0;
        j = 2.0;
        goto Compare;
    }
This checks two things. If the exponent is less than 0 then the float is smaller than 1 (and so smaller in magnitude than any integer). Or, if the exponent is less than the number of bits in w then we have that v < |w| since significand * 2^exponent is less than 2^nbits. 

Failing these two checks, the function looks to see whether the exponent is greater than the number of bit in w. This shows that significand * 2^exponent is greater than 2^nbits and so v > |w|:
    if ((size_t)exponent > nbits) {
        i = 2.0;
        j = 1.0;
        goto Compare;
    }
If this check did not succeed we know that the exponent of the float v is the same as the number of bits in the integer w.

The only way that the two values can be compared now is to construct two new Python integers from v and w. The idea is to discard the fractional part of v, double the integer part, and then add one. w is also doubled and these two new Python objects can be compared to give the correct return value. Using an example with small values, 4.65 < 4 would be determined by the comparison (2*4)+1 == 9 < 8 == (2*4) (returning false).
    {
        double fracpart;
        double intpart;
        PyObject *result = NULL;
        PyObject *one = NULL;
        PyObject *vv = NULL;
        PyObject *ww = w;

        // snip

        fracpart = modf(i, &intpart); // split i (the double that v mapped to)
        vv = PyLong_FromDouble(intpart);

        // snip

        if (fracpart != 0.0) {
            /* Shift left, and or a 1 bit into vv
             * to represent the lost fraction.
             */
            PyObject *temp;

            one = PyLong_FromLong(1);

            temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
            ww = temp;

            temp = PyNumber_Lshift(vv, one);
            vv = temp;

            temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
            vv = temp;
        }
        // snip
    }
}
For brevity I've left out the additional error-checking and garbage-tracking Python has to do when it creates these new objects. Needless to say, this adds additional overhead and explains why the values highlighted in the question are significantly slower to compare than others.



Here is a summary of the checks that are performed by the comparison function.

Let v be a float and cast it as a C double. Now, if w is also a float:


Check whether w is nan or inf. If so, handle this special case separately depending on the type of w.
If not, compare v and w directly by their representations as C doubles.


If w is an integer:


Extract the signs of v and w. If they are different then we know v and w are different and which is the greater value.
(The signs are the same.) Check whether w has too many bits to be a float (more than size_t). If so, w has greater magnitude than v.
Check if w has 48 or fewer bits. If so, it can be safely cast to a C double without losing its precision and compared with v.
(w has more than 48 bits. We will now treat w as a positive integer having changed the compare op as appropriate.) 
Consider the exponent of the float v. If the exponent is negative, then v is less than 1 and therefore less than any positive integer. Else, if the exponent is less than the number of bits in w then it must be less than w.
If the exponent of v is greater than the number of bits in w then v is greater than w.
(The exponent is the same as the number of bits in w.)
The final check. Split v into its integer and fractional parts. Double the integer part and add 1 to compensate for the fractional part. Now double the integer w. Compare these two new integers instead to get the result.


                        这篇关于为什么有些浮子&lt;整数比较慢四倍？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

为什么有些浮子<整数比较慢四倍？ [英] Why are some float < integer comparisons four times slower than others?

问题描述

更详细的比较

The comparison in more detail

相关文章

Python最新文章

热门教程

热门工具

登录关闭

为什么有些浮子&lt;整数比较慢四倍？ [英] Why are some float &lt; integer comparisons four times slower than others?

问题描述

更详细的比较

The comparison in more detail

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

为什么有些浮子<整数比较慢四倍？ [英] Why are some float < integer comparisons four times slower than others?

登录关闭