On a 64-bit machine, can I safely operate on individual bytes of a 64-bit quadword in parallel?

Problem description

I am doing parallel operations on rows and columns in images. My images are 8 bit or 16 bit pixels and I'm on a 64 bit machine. When I do operations on columns in parallel, two adjacent columns may share the same 32 bit int or 64 bit long. Basically, I want to know whether I can safely operate on individual bytes of the same quadword in parallel.

I wrote a minimal test function that I have not been able to make fail. For each byte in a 64-bit long, I concurrently perform successive multiplications in the finite field of order p. By Fermat's little theorem, a^(p-1) = 1 mod p when p is prime and a is not a multiple of p. I use different values of a and p for each of my 8 threads, and each thread performs k*(p-1) multiplications by a. When the threads finish, each byte should therefore be 1. And in fact, my test cases pass. Each time I run, I get the following output:


8

101010101010101

101010101010101

8
101010101010101
101010101010101

My system is Linux 4.13.0-041300-generic x86_64 with a 4-core, 8-thread Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz. I compiled with g++ 7.2.0 at -O2 and examined the assembly; I added the assembly for the "INNER LOOP" below as a commented block. It seems to me that the generated code is safe, because the store (movb) writes only the low 8 bits to the destination instead of doing bitwise arithmetic and storing the entire word or quadword. g++ -O3 generated similar code.

I want to know if this code is always thread-safe, and if not, in what conditions would it not be. Maybe I am being very paranoid, but I feel that I would need to operate on quadwords at a time in order to be safe.

#include <iostream>
#include <pthread.h>

class FermatLTParams
{
public:
    FermatLTParams(unsigned char *_dst, unsigned int _p, unsigned int _a, unsigned int _k)
        : dst(_dst), p(_p), a(_a), k(_k) {}

    unsigned char *dst;
    unsigned int p, a, k;
};

void *PerformFermatLT(void *_p)
{  
    unsigned int j, i;
    FermatLTParams *p = reinterpret_cast<FermatLTParams *>(_p);
    for(j=0; j < p->k; ++j)
    {    
        //a^(p-1) == 1 mod p

        //...BEGIN INNER LOOP
        for(i=1; i < p->p; ++i)
        {
            p->dst[0] = (unsigned char)(p->dst[0]*p->a % p->p);
        }
        //...END INNER LOOP

        /* gcc 7.2.0 -O2  (INNER LOOP)

        .L4:
            movq    (%rdi), %r8             # r8 = dst
            xorl    %edx, %edx              # edx = 0
            addl    $1, %esi                # ++i
            movzbl  (%r8), %eax             # eax (lower 8 bits) = dst[0]
            imull   12(%rdi), %eax          # eax =  a * eax
            divl    %ecx                    # eax = eax / ecx;   edx = eax % ecx    
            movb    %dl, (%r8)              # dst[0] = edx (lower 8 bits)
            movl    8(%rdi), %ecx           # ecx = p
            cmpl    %esi, %ecx              # if (i < p)
            ja      .L4                     #   goto L4
        */

    }
    return NULL;
}

int main(int argc, const char **argv)
{
    int i;
    unsigned long val = 0x0101010101010101; //a^0 = 1
    unsigned int k = 10000000;
    std::cout << sizeof(val) << std::endl;
    std::cout << std::hex << val << std::endl;
    unsigned char *dst = reinterpret_cast<unsigned char *>(&val);
    pthread_t threads[8];
    FermatLTParams params[8] = 
    { 
        FermatLTParams(dst+0, 11, 5, k),
        FermatLTParams(dst+1, 17, 8, k),
        FermatLTParams(dst+2, 43, 3, k),
        FermatLTParams(dst+3, 31, 4, k),
        FermatLTParams(dst+4, 13, 3, k),
        FermatLTParams(dst+5, 7, 2, k),
        FermatLTParams(dst+6, 11, 10, k),
        FermatLTParams(dst+7, 13, 11, k)
    };

    for(i=0; i < 8; ++i)
    {
        pthread_create(threads+i, NULL, PerformFermatLT, params+i);
    }
    for(i=0; i < 8; ++i)
    {
        pthread_join(threads[i], NULL);
    }

    std::cout << std::hex << val << std::endl;
    return 0;
}


Recommended answer

The answer is YES: you can safely operate on individual bytes of a 64-bit quadword in parallel from different threads.

It is remarkable that byte-level parallel writes work, but it would be a disaster if they did not. All hardware behaves as if a core that writes a byte into its own cache marks not just the cache line as dirty, but also which bytes within it are dirty. When that cache line (64, 128, or even 256 bytes) is eventually written back to main memory, only the dirty bytes actually modify memory. (On x86, this effect comes from the cache-coherency protocol, which gives a core exclusive ownership of a line before it writes to it, rather than from literally tracking dirty bytes.) This is essential, because otherwise two threads working on independent data that happened to occupy the same cache line would trash each other's results.

This can be bad for performance, because it works partly through the magic of "cache coherency": when one thread writes a byte, every cache in the system that holds a copy of that line is affected. If a copy is dirty, it must be written back to main memory, and the cache then either drops the line or picks up the other thread's changes. Implementations vary widely, but this is generally expensive; when threads write independent data that happens to share a cache line, the effect is known as false sharing.
