为什么size_t和unsigned int比int慢? [英] Why are size_t and unsigned int slower than int?

查看:129
本文介绍了为什么size_t和unsigned int比int慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用下面的简单交换排序算法在Windows的Visual Studio项目中尝试使用不同的整数类型.处理器是英特尔.该代码在x64版中进行了编译.优化设置为最大速度(/O2)".与编译设置对应的命令行为

I was experimenting with different integer types in Visual Studio project in Windows using a simple exchange sort algorithm below. The processor is Intel. The code was compiled in Release x64. The optimization setting is "Maximize Speed (/O2)". The command line corresponding to the compilation settings is

/permissive- /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"x64\Release\vc141.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Fp"x64\Release\SpeedTestForIntegerTypes.pch" /diagnostics:classic 

代码本身:

#include <ctime>
#include <vector>
#include <iostream>

void sort(int N, int A[], int WorkArray[]) // exchange sort
{
    int i, j, index, val_min;
    for (j = 0; j < N; j++)
    {
        val_min = 500000;
        for (i = j; i < N; i++)
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

int main()
{
    std::vector<int> A(400000), WorkArray(400000);
    for(size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k+1);

    clock_t begin = clock();

    sort(400000, &A[0], &WorkArray[0]);

    clock_t end = clock();
    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Sort time: " << sortTime << std::endl;
    return 0;
}

WorkArray仅需要在排序之前保存向量. 关键是,这种排序花了我22.3秒才能完成.有趣的是,如果我将数组AWorkArray(在std::vector和函数sort的参数列表中)的类型从int更改为size_t,以及对于val_min ,时间增加到67.4!这慢了三倍!新代码如下:

The WorkArray is only needed to save the vector before sorting. The point is, this sorting took me 22.3 seconds to complete. The interesting part is that if I change type int to size_t for arrays A, WorkArray (both in std::vector and in the argument list of function sort), as well as for val_min, the time increases to 67.4! This is threefold slower! The new code is below:

#include <ctime>
#include <vector>
#include <iostream>

void sort(int N, size_t A[], size_t WorkArray[]) // exchange sort
{
    int i, j, index;
    size_t val_min;
    for (j = 0; j < N; j++)
    {
        val_min = 500000U;
        for (i = j; i < N; i++)
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

int main()
{
    std::vector<size_t> A(400000), WorkArray(400000);
    for(size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k+1);

    clock_t begin = clock();

    sort(400000, &A[0], &WorkArray[0]);

    clock_t end = clock();
    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Sort time: " << sortTime << std::endl;
    return 0;
}

请注意,对于函数局部变量ijindexN,我仍然保持类型int,因此,i++j++仅有的两个算术运算应采用在两种情况下执行相同的时间.因此,这种放缓与其他原因有关.它与内存对齐问题或寄存器大小有关吗?

Note that I still keep type int for function local variables i, j, index, N, and so the only two arithmetical operations that are i++ and j++ should take the same amount of time to perform in both cases. Therefore, this slowdown has to do with other reasons. Is it related to the memory alignment issue or register sizes or something else?

但是最令人发指的部分是当我将int更改为unsigned int时. unsigned intint占用的字节数相同,为4(sizeof表明).但是unsigned int的运行时间为65.8 s!虽然第一个结果可以接受,但第二个结果却使我完全困惑!为什么运行这种甚至不涉及符号检查的简单算法所需的时间差异如此显着?

But the most outrageous part was when I changed int to unsigned int. Both unsigned int and int occupy the same number of bytes which is 4 (sizeof showed that). But the runtime for unsigned int was 65.8 s! While the first outcome was somewhat ok to accept, the second one totally confuses me! Why is there such a significant difference in time it takes to run such a simple algorithm that does not even involve sign checks?

感谢所有人解决这两个问题.从哪里可以开始阅读更多有关这些硬件级优化特性的信息?我不在乎排序算法本身,它仅用于说明问题.

Thanks to all addressing both of these questions. Where can I start reading more about these hardware-level optimization peculiarities? I don't care about the sorting algorithm itself, it's here for illustration of the problem only.

更新:我再次强调以下事实:在所有三种情况下,我都使用整数作为数组索引.

UPDATE: once again, I stress the fact that I use ints for array indices in all three cases.

推荐答案

检查所有3个变体(intunsignedsize_t)的生成程序集,最大的区别是在int情况下sort函数中的循环被展开并使用SSE指令(一次处理8个int),而在其他2种情况下则不执行.有趣的是,在int情况下调用了sort函数,而在其他两个情况中将其内联到main中(可能是由于循环展开导致函数的大小增加了.)

Inspecting the generated assembly for all 3 variants (int, unsigned, size_t), the big difference is that in the int case the loop in the sort function is unrolled and uses SSE instructions (working on 8 ints at a time), while in the other 2 cases it does neither. Interestingly enough, the sort function is called in the int case, while it is inlined into main in the other two (likely due to the increased size of the function due to the loop unrolling).

我正在使用cl /nologo /W4 /MD /EHsc /Zi /Ox从命令行进行编译,并使用dumpbin通过工具集Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25830.2 for x64进行反汇编.

I'm compiling from the command line using cl /nologo /W4 /MD /EHsc /Zi /Ox, using dumpbin to get the disassembly, with toolset Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25830.2 for x64.

int的执行时间大约为30秒,其他两个时间为100秒.

I get execution times of around 30 seconds for int and 100 seconds for the other two.

这篇关于为什么size_t和unsigned int比int慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆