使用2个线程对2个数组进行排序要比对2个数组进行逐一排序花费更多的时间 [英] Sorting 2 arrays using 2 threads takes more time than sorting the 2 arrays one by one

查看:82
本文介绍了使用2个线程对2个数组进行排序要比对2个数组进行逐一排序花费更多的时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个未排序的数组和这些数组的2个副本.我使用两个不同的线程对两个数组进行排序,然后对其他两个未排序的数组一一进行排序.我以为线程处理会更快,但事实并非如此,那么线程如何花费更多时间呢?

I have 2 unsorted arrays and 2 copies of these arrays. I am using two different threads to sort two arrays, then I am sorting other two unsorted array one by one. What I thought was that the thread process would be faster but it's not, so how does threads take more time?

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>

struct thread_data
{
    int count;
    unsigned int *arr;
};

struct thread_data thread_data_array[2];

void insertionSort(unsigned int arr[], int n)
{
   int i, key, j;
   for (i = 1; i < n; i++)
   {
       key = arr[i];
       j = i-1;

       while (j >= 0 && arr[j] > key)
       {
           arr[j+1] = arr[j];
           j = j-1;
       }
       arr[j+1] = key;
   }
}

void *sortAndMergeArrays(void *threadarg)
{
    int count;
    unsigned int *arr;
    struct thread_data *my_data;

    my_data = (struct thread_data *) threadarg;
    count = my_data->count;
    arr =  my_data->arr;

    insertionSort(arr, count);

    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    int count, i, rc;
    clock_t start, end, total_t;
    pthread_t threads[2];

    //get the loop count. If loop count is not provided take 10000 as default loop count.
    if(argc == 2){
        count = atoi(argv[1]);
    }
    else{
        count = 10000;
    }

    unsigned int arr1[count], arr2[count], copyArr1[count], copyArr2[count];    

    srand(time(0));

    for(i = 0; i<count; i++){
        arr1[i] = rand();
        arr2[i] = rand();

        copyArr1[i] = arr1[i];
        copyArr2[i] = arr2[i];
    }

    start = clock();
    for(int t=0; t<2; t++) {
        thread_data_array[t].count = count;
        if(t==0)
            thread_data_array[t].arr = arr1;
        else
            thread_data_array[t].arr = arr2;    

        rc = pthread_create(&threads[t], NULL, sortAndMergeArrays, (void *) &thread_data_array[t]);
        if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
    }

    pthread_join(threads[0], NULL);
    pthread_join(threads[1], NULL);
    end = clock();

    total_t = (double)(end - start);
    printf("Total time taken by CPU to sort using threads: %d\n", total_t);


    start = clock();
    insertionSort(copyArr1, count);
    insertionSort(copyArr2, count);
    end = clock();

    total_t = (double)(end - start);
    printf("Total time taken by CPU to sort sequentially: %d\n", total_t);

    pthread_exit(NULL);
}

我正在使用Linux服务器执行代码.首先,我随机填充数组并将其复制到两个单独的数组中.对于前两个数组,我使用pthread创建两个线程,并将两个数组传递给它们,后者使用插入排序对它们进行排序.对于其他两个数组,我只是一个接一个地排序.

I am using Linux server to execute the code. First I am randomly populating the arrays and copying them to two separate arrays. For the first two arrays I am creating two threads using pthread and passing the two arrays to them, which uses insertion sort to sort them. And for the other two arrays I am just sorting one by one.

我希望通过使用线程可以减少执行时间,但实际上会花费更多时间.

I expected that by using threads I would reduce the execution time but actually takes more time.

推荐答案

诊断

您实际上获得相同时间的原因是-从线程代码获得的时间比从顺序代码获得的时间稍多-原因是

Diagnosis

The reason you get practically the same time — and slightly more time from the threaded code than from the sequential code — is that clock() measures CPU time, and the two ways of sorting take almost the same amount of CPU time because they're doing the same job (and the threading number is probably slightly bigger because of the time to setup and tear down threads).

clock()函数应将实现的最佳近似返回到自实现定义的仅与过程调用相关的时代开始以来,该过程所使用的处理器时间.

The clock() function shall return the implementation's best approximation to the processor time used by the process since the beginning of an implementation-defined era related only to the process invocation.

BSD(macOS)手册页:

BSD (macOS) man page:

clock()函数确定自调用进程调用以来使用的处理器时间量,以秒为单位CLOCKS_PER_SEC.

The clock() function determines the amount of processor time used since the invocation of the calling process, measured in CLOCKS_PER_SECs of a second.

对两个数组进行排序所花费的CPU时间基本上是相同的.区别在于线程的开销(或多或少).

The amount of CPU time it takes to sort the two arrays is basically the same; the difference is the overhead of threading (more or less).

我有一组可以使用 clock_gettime() 代替(在timer.c和timer.h中的代码" rel ="nofollow noreferrer"> GitHub ).这些指标衡量挂钟时间-经过的时间,而不是CPU时间.

I have a set of functions that can use clock_gettime() instead (code in timer.c and timer.h at GitHub). These measure wall clock time — elapsed time, not CPU time.

这是代码的一个经过微调的版本—实质性的更改是将排序函数中key的类型从int更改为unsigned int以匹配数组中的数据,并修复了转换规范%d%ld以匹配由GCC标识为clock_t的类型.我对参数处理和计时消息进行了微调,以使它们的长度一致,并添加了经过的时间测量代码:

Here's a mildly tweaked version of your code — the substantive changes were changing the type of key in the sort function from int to unsigned int to match the data in the array, and to fix the conversion specification of %d to %ld to match the type identified by GCC as clock_t. I mildly tweaked the argument handling, and the timing messages so that they're consistent in length, and added the elapsed time measurement code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>
#include "timer.h"

struct thread_data
{
    int count;
    unsigned int *arr;
};

struct thread_data thread_data_array[2];

static
void insertionSort(unsigned int arr[], int n)
{
    for (int i = 1; i < n; i++)
    {
        unsigned int key = arr[i];
        int j = i - 1;

        while (j >= 0 && arr[j] > key)
        {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}

static
void *sortAndMergeArrays(void *threadarg)
{
    int count;
    unsigned int *arr;
    struct thread_data *my_data;

    my_data = (struct thread_data *)threadarg;
    count = my_data->count;
    arr =  my_data->arr;

    insertionSort(arr, count);

    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    int count = 10000;
    int i, rc;
    clock_t start, end, total_t;
    pthread_t threads[2];

    // get the loop count. If loop count is not provided take 10000 as default loop count.
    if (argc == 2)
        count = atoi(argv[1]);

    unsigned int arr1[count], arr2[count], copyArr1[count], copyArr2[count];

    srand(time(0));

    for (i = 0; i < count; i++)
    {
        arr1[i] = rand();
        arr2[i] = rand();

        copyArr1[i] = arr1[i];
        copyArr2[i] = arr2[i];
    }

    Clock clk;
    clk_init(&clk);

    start = clock();
    clk_start(&clk);
    for (int t = 0; t < 2; t++)
    {
        thread_data_array[t].count = count;
        if (t == 0)
            thread_data_array[t].arr = arr1;
        else
            thread_data_array[t].arr = arr2;

        rc = pthread_create(&threads[t], NULL, sortAndMergeArrays, (void *)&thread_data_array[t]);
        if (rc)
        {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    pthread_join(threads[0], NULL);
    pthread_join(threads[1], NULL);
    clk_stop(&clk);
    end = clock();

    char buffer[32];
    printf("Elapsed using threads:  %s s\n", clk_elapsed_us(&clk, buffer, sizeof(buffer)));

    total_t = (double)(end - start);
    printf("CPU time using threads: %ld\n", total_t);

    start = clock();
    clk_start(&clk);
    insertionSort(copyArr1, count);
    insertionSort(copyArr2, count);
    clk_stop(&clk);
    end = clock();

    printf("Elapsed sequentially:   %s s\n", clk_elapsed_us(&clk, buffer, sizeof(buffer)));
    total_t = (double)(end - start);
    printf("CPU time sequentially:  %ld\n", total_t);

    return 0;
}

结果

示例运行(程序inssortthread23)-在具有16 GiB RAM和2.7 GHz Intel Core i7 CPU的MacBook Pro(15"2016)上运行,运行macOS High Sierra 10.13,并使用GCC 7.2.0进行编译. 我有例行的后台程序在运行-例如浏览器未被积极使用,没有音乐或视频播放,没有正在进行的下载等.(这些对于基准测试很重要.)

Results

Example runs (program inssortthread23) — run on a MacBook Pro (15" 2016) with 16 GiB RAM and 2.7 GHz Intel Core i7 CPU, running macOS High Sierra 10.13, using GCC 7.2.0 for compilation. I had routine background programs running — e.g. browser not being actively used, no music or videos playing, no downloads in progress etc. (These things matter for benchmarking.)

$ inssortthread23 100000
Elapsed using threads:  1.060299 s
CPU time using threads: 2099441
Elapsed sequentially:   2.146059 s
CPU time sequentially:  2138465
$ inssortthread23 200000
Elapsed using threads:  4.332935 s
CPU time using threads: 8616953
Elapsed sequentially:   8.496348 s
CPU time sequentially:  8469327
$ inssortthread23 300000
Elapsed using threads:  9.984021 s
CPU time using threads: 19880539
Elapsed sequentially:   20.000900 s
CPU time sequentially:  19959341
$

结论

在这里,您可以清楚地看到:

Conclusions

Here, you can clearly see that:

  1. 非线程代码的运行时间大约是非线程代码的两倍.
  2. 线程和非线程代码的CPU时间几乎相同.
  3. 总时间是排序行数的二次方.

所有这些都非常符合预期-一旦您意识到clock()衡量的是CPU时间,而不是经过的时间.

All of which are very much in line with expectations — once you realize that clock() is measuring CPU time, not elapsed time.

您还可以看到,在某些时候,我得到的线程CPU时间比顺序排序的CPU时间略短.我对此没有任何解释-尽管效果会持续存在,但我认为它消失在噪音中":

You can also see that I'm getting the threaded CPU time as slightly smaller than the CPU time for sequential sorts, some of the time. I don't have an explanation for that — I deem it 'lost in the noise', though the effect is persistent:

$ inssortthread23 100000
Elapsed using threads:  1.051229 s
CPU time using threads: 2081847
Elapsed sequentially:   2.138538 s
CPU time sequentially:  2132083
$ inssortthread23 100000
Elapsed using threads:  1.053656 s
CPU time using threads: 2089886
Elapsed sequentially:   2.128908 s
CPU time sequentially:  2122983
$ inssortthread23 100000
Elapsed using threads:  1.058283 s
CPU time using threads: 2093644
Elapsed sequentially:   2.126402 s
CPU time sequentially:  2120625
$

$ inssortthread23 200000
Elapsed using threads:  4.259660 s
CPU time using threads: 8479978
Elapsed sequentially:   8.872929 s
CPU time sequentially:  8843207
$ inssortthread23 200000
Elapsed using threads:  4.463954 s
CPU time using threads: 8883267
Elapsed sequentially:   8.603401 s
CPU time sequentially:  8580240
$ inssortthread23 200000
Elapsed using threads:  4.227154 s
CPU time using threads: 8411582
Elapsed sequentially:   8.816412 s
CPU time sequentially:  8797965
$

这篇关于使用2个线程对2个数组进行排序要比对2个数组进行逐一排序花费更多的时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆