C线程编程-递增共享变量 [英] C Threaded Programming - Incrementing a shared variable

查看:66
本文介绍了C线程编程-递增共享变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

伙计们...所以我想重温一下我的C线程,发现的问题是这样的:

给出一个全局变量int x = 0;实现函数void useless(int n),该函数创建n个线程,这些线程在循环中将x增加1,每个线程在x达到100时终止.

我只是没有处理线程的问题,需要一个可靠的例子来奠定我的基础.这必须尽可能使用pthread系统调用.

解决方案

首先,您需要确定要实现的目标,以及不同线程的指令可以通过哪些方式交错来防止这种情况的发生.

C ++x中的增量运算符通常实现为将i的值从内存中读取到寄存器中;增加寄存器;将值写入内存:

    r1 ← xglobal
    r1 ← r1 + 1
    xglobal ← r1

因此x global 的值增加一.

如果您有两个并行的线程,那么它们可以破坏性地交织

    initial xglobal = 99

    r1 ← xglobal     
    compare r1 100 
                    r2 ← xglobal          
                    compare r2 100 
    r1 ← r1 + 1  == 100 
                    r2 ← r2 + 1 == 100
    xglobal ← r1    == 100
                    xglobal ← r2     == 100
    r1 ← xglobal     
    compare r1 100 
    stop
                    r2 ← xglobal     
                    compare r1 100 
                    stop
    final xglobal = 100

因此,尽管两个线程都将x global 的值增加了一个.

(我正在消除缓存的影响,这意味着即使线程1的写入发生在读入之前,线程2中的变量的读取也可以表现得好像发生在线程1中的写入之前一样.挂钟时间.在pthread中获取和释放互斥锁会导致内存屏障,从而迫使所有读取的行为就像它们在获取或释放之后发生一样,而写入的行为就像在获取或释放之前发生一样.)

(以上等同于for ( int r; ( r = x ) < 100; x = r + 1 )而不是for (; x < 100; x = x + 1 ),它可能会额外读取x,因此还有另一点可能会干扰线程)

类似地,一个线程的增量可以破坏另一个线程的增量,从而使线程以i<结束. 100:

    initial xglobal = 98
                    r2 ← xglobal          
    r1 ← xglobal     
    compare r1 100 
    r1 ← r1 + 1  == 99
    xglobal ← r1     
    r1 ← xglobal     
    compare r1 100 
    r1 ← r1 + 1  == 100
    xglobal ← r1     
    r1 ← xglobal     
    compare r1 100 
    stop
                    compare r2 100 
                    r2 ← r2 + 1 == 99
                    xglobal ← r2      
                    ...
    final xglobal = 99

因此,左线程的第二个增量被第一个增量的增量覆盖,它将以全局可见值x<终止. 100.

您可能知道所有这些,并且可能想使用一种机制来防止这种情况发生.

我说可能",因为您的要求不明确-当x达到100时,上述线程终止.要求没有说不在那里.

因此,由于没有线程会在不编写x global ←的情况下终止. 100,实际上可以在没有任何锁定的情况下满足要求,但是x可以递增n * 100倍而不是100倍. (如果限制大于一个字节,那么在某些平台上x的写入可能不是原子的,如果将来自不同线程的字节混合在一起,则可能导致无限循环,但对于100的限制则不会发生)

一种技术是使用 mutex 阻止其他线程运行当一个线程持有互斥锁时.如果在读取x global 之前获得了该锁,并且直到写入x global 时才释放该锁,则该线程的读取和写入将无法交织.

    initial xglobal = 98

    lock (mutex) 
    mutex locked 
                    lock(mutex) 
                    blocked 
    r1 ← xglobal     
    compare r1 100 
    r1 ← r1 + 1  == 99
    xglobal ← r1     

    release ( mutex )
                    mutex locked

                    r2 ← xglobal          
                    compare r2 100 
                    r2 ← r2 + 1 == 100
                    xglobal ← r2      

                    release ( mutex )

    lock (mutex) 
    mutex locked 
    r1 ← xglobal     
    compare r1 100 
    release ( mutex )
    stop
                    ...
    final xglobal = 100

在pthread之外,您可能希望使用平台的比较和交换操作(__sync_val_compare_and_swap Atomic-Builtins.html#Atomic-Builtins"rel =" nofollow noreferrer> gcc ),该方法接受地址的旧值和新值,并自动将地址处的内存设置为新值.等于旧值.这使您可以将逻辑编写为:

for ( int v = 0; v < 100; ) {
    int x_prev = __sync_val_compare_and_swap ( &x, v, v + 1 );

    // if the CAS succeeds, the value of x has been set to is x_prev + 1
    // otherwise, try again from current last value
    if ( x_prev == v ) 
        v = x_prev + 1;
    else
        v = x_prev;
}

如果

    initial xglobal = 98
    initial v1  = 0
    initial v2  = 0

    cmp v1  100
    x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set fails with x == 98 )

                    cmp v2  100
                    x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set fails with x == 98 )

    v1 ← x_prev1 = 98 // x_prev != v
                    v2 ← x_prev2 = 98
                    cmp v2  100
                    x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set succeeds with x == 99 )

                    v2 ← x_prev2 + 1 = 99 // as x_prev == v

    cmp v1  100
    x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 99 ( set fails with x == 99 )
    v1 ← x_prev1 = 99 // as x_prev != v

    cmp v1  100
    x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 99 ( set succeeds with x == 100)
    v1 ← x_prev1 + 1 = 100 // as x_prev == v

                    cmp v2  100
                    x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 100 ( set fails with x == 100 )

                    v2 ← x_prev2  = 100 // as x_prev != v
    cmp v1  100
                    cmp v2  100
    stop
                    stop

在每个循环中,当且仅当x global 的先前值是r 1 + 1时,才会将其自动设置为r 1 + 1的值. r 1 ;如果不是,则将r 1 设置为CASV操作期间测试的x global 的值.这减少了大多数实现上的锁定时间(尽管它仍然需要在CAS操作期间锁定存储器总线,但是只有那些操作才会被序列化.由于在多核上执行CAS成本很高,因此很可能会获胜)这样的简单情况不会更好.)

Hey guys...so I'm trying to brush up on my C threads and a question I've found is this:

Given a global variable int x = 0; implement the function void useless(int n) which creates n threads which in a loop increase x by 1, each thread terminates when x reaches 100.

I just don't have a handle on threads and need a solid example to base my foundation. This must use pthread system calls as much as possible.

解决方案

First you need to decide what it is you're trying to achieve, and what ways possible the instructions of different threads can interlace an prevent that happening.

The incrementation operator in C ++x is usually implemented as read the value of i from memory into an register; increment the register; write the value to memory:

    r1 ← xglobal
    r1 ← r1 + 1
    xglobal ← r1

So the value of xglobal is incremented by one.

If you have two threads in parallel, then they can interlace destructively

    initial xglobal = 99

    r1 ← xglobal     
    compare r1 100 
                    r2 ← xglobal          
                    compare r2 100 
    r1 ← r1 + 1  == 100 
                    r2 ← r2 + 1 == 100
    xglobal ← r1    == 100
                    xglobal ← r2     == 100
    r1 ← xglobal     
    compare r1 100 
    stop
                    r2 ← xglobal     
                    compare r1 100 
                    stop
    final xglobal = 100

So the value of xglobal is incremented by one, despite both threads incrementing it.

( I'm eliding the effects of caching, which mean that the read of a variable in thread 2 can behave as though it took place before a write in thread 1 even if the write by thread 1 happens before the read in by wall clock time. Acquiring and releasing of mutexes in pthreads cause memory barriers which force all reads to behave as though they happened after and writes to behave as if they happened before the acquire or release. )

( the above is equivalent of for ( int r; ( r = x ) < 100; x = r + 1 ) rather than for (; x < 100; x = x + 1 ) which may have an extra read of x and so has another point where threads can interfere )

Similarly, an increment by one thread can destroy the increment of another thread allowing threads to end with i < 100:

    initial xglobal = 98
                    r2 ← xglobal          
    r1 ← xglobal     
    compare r1 100 
    r1 ← r1 + 1  == 99
    xglobal ← r1     
    r1 ← xglobal     
    compare r1 100 
    r1 ← r1 + 1  == 100
    xglobal ← r1     
    r1 ← xglobal     
    compare r1 100 
    stop
                    compare r2 100 
                    r2 ← r2 + 1 == 99
                    xglobal ← r2      
                    ...
    final xglobal = 99

So the second increment by the left thread is overwritten by the increment by the first, and it would terminate with the global visible value of x < 100.

You probably know all that, and may want to use a mechanism to protect against it.

I say 'may' as your requirements aren't clear - the thread above terminated when x reached 100; the requirements don't say it doesn't say there.

So, since no thread will terminate without writing xglobal ← 100, the requirement may in fact be met without any locking, but x may be incremented n*100 times rather than 100 times. ( if the limit was larger than a byte then the writing of x might be non-atomic on some platforms, which could result in an infinite loop if bytes from different threads are mixed together, but for a limit of 100 that won't happen )

One technique is to use a mutex which blocks other threads from running when one thread holds the lock on the mutex. If the lock is acquired before xglobal is read, and not released until xglobal is written, then the reads and writes of the thread cannot interlace.

    initial xglobal = 98

    lock (mutex) 
    mutex locked 
                    lock(mutex) 
                    blocked 
    r1 ← xglobal     
    compare r1 100 
    r1 ← r1 + 1  == 99
    xglobal ← r1     

    release ( mutex )
                    mutex locked

                    r2 ← xglobal          
                    compare r2 100 
                    r2 ← r2 + 1 == 100
                    xglobal ← r2      

                    release ( mutex )

    lock (mutex) 
    mutex locked 
    r1 ← xglobal     
    compare r1 100 
    release ( mutex )
    stop
                    ...
    final xglobal = 100

Outside of pthreads, you might want to use your platform's compare-and-swap operation ( __sync_val_compare_and_swap in gcc ) which takes an address an old value and a new value, and atomically sets the memory at the address to the new value if it was equal to the old value. This lets you write the logic as:

for ( int v = 0; v < 100; ) {
    int x_prev = __sync_val_compare_and_swap ( &x, v, v + 1 );

    // if the CAS succeeds, the value of x has been set to is x_prev + 1
    // otherwise, try again from current last value
    if ( x_prev == v ) 
        v = x_prev + 1;
    else
        v = x_prev;
}

So if

    initial xglobal = 98
    initial v1  = 0
    initial v2  = 0

    cmp v1  100
    x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set fails with x == 98 )

                    cmp v2  100
                    x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set fails with x == 98 )

    v1 ← x_prev1 = 98 // x_prev != v
                    v2 ← x_prev2 = 98
                    cmp v2  100
                    x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set succeeds with x == 99 )

                    v2 ← x_prev2 + 1 = 99 // as x_prev == v

    cmp v1  100
    x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 99 ( set fails with x == 99 )
    v1 ← x_prev1 = 99 // as x_prev != v

    cmp v1  100
    x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 99 ( set succeeds with x == 100)
    v1 ← x_prev1 + 1 = 100 // as x_prev == v

                    cmp v2  100
                    x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 100 ( set fails with x == 100 )

                    v2 ← x_prev2  = 100 // as x_prev != v
    cmp v1  100
                    cmp v2  100
    stop
                    stop

On each loop, xglobal will atomically be set to the value of r1 + 1 if and only if its previous value was r1; if not, r1 will be set to the value of xglobal tested during the CASV operation. This reduces the amount of time locks are held on most implementations ( though it still requires locking the memory bus for the duration of the CAS operation, only those operations will be serialised. As performing the CAS is expensive on multi-cores, it probably won't be much better for such a simple case as this. )

这篇关于C线程编程-递增共享变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆