"Relaxed" memory consistency in C++


Question


Hi!
I have a question about multithreading and "relaxed" memory consistency in C++, though I guess it also applies to C#, Java and many other imperative languages. A professor at my university said in a lecture, in a course called "Parallel Programming", that the following two programs,

program 1:

x=0           
x=1           
read y  



program 2:

y=0
y=1
read x

executed in parallel, could result in both reads seeing 0. This should be because the new value 1 of x and/or y had not yet been written from the cache/memory buffer to the actual shared memory, so both processes would be able to read 0.
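
A minimal sketch of the two programs as a C++11 litmus test (the pattern is often called "store buffering"). It assumes x and y are std::atomic<int> accessed with std::memory_order_relaxed, and that the parent thread performs the x=0 and y=0 steps before each round; run_relaxed is a hypothetical helper name. With relaxed ordering the standard allows both reads to return 0, but whether a given run actually shows it depends on the hardware and on timing.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

// The shared variables. Relaxed atomics keep the program free of data races
// (so it is well defined) but impose no ordering between different variables.
std::atomic<int> x{0}, y{0};

// Runs program 1 and program 2 concurrently `iterations` times.
// counts[2 * r1 + r2] tallies the outcomes (0,0), (0,1), (1,0), (1,1), where
// r1 is the value program 1 read from y and r2 the value program 2 read from x.
std::array<int, 4> run_relaxed(int iterations) {
    std::array<int, 4> counts{};
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);  // x=0 (done by the parent)
        y.store(0, std::memory_order_relaxed);  // y=0 (done by the parent)
        int r1 = -1, r2 = -1;
        std::thread program1([&] {
            x.store(1, std::memory_order_relaxed);   // x=1
            r1 = y.load(std::memory_order_relaxed);  // read y
        });
        std::thread program2([&] {
            y.store(1, std::memory_order_relaxed);   // y=1
            r2 = x.load(std::memory_order_relaxed);  // read x
        });
        program1.join();
        program2.join();
        // Guard the index: every value read must be one of the stored 0s or 1s.
        assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
        ++counts[2 * r1 + r2];
    }
    return counts;
}
```

counts[0] is the (0,0) case the professor described. On x86, for example, it can occur even though the caches are coherent, because a core's store buffer lets its load of the other variable complete before its own store becomes visible. Since thread start-up dominates each round in this harness, the case may be rare or absent in practice.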

To me this sounds like a huge problem: how can one then guarantee, even when locks are used, that the shared memory is consistent and up to date? I guess you would somehow need to do what you do with locks, i.e. make sure the value is written to and read directly from the shared memory. How can you achieve that, and what approach do you recommend?
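
One way to get that guarantee is to use a lock for every access. A minimal sketch, assuming each program's body can run as a single critical section guarded by one shared std::mutex (count_both_zero_with_mutex is a hypothetical name): an unlock of a mutex synchronizes with the next lock of the same mutex, so whichever critical section runs second is guaranteed to see the other's write, and both reads can never be 0.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Plain ints are fine here because every concurrent access holds `m`.
int x = 0, y = 0;
std::mutex m;

// Runs both programs, each as one critical section on the same mutex, and
// returns how many of `iterations` rounds ended with both reads seeing 0.
int count_both_zero_with_mutex(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x = 0;  // reset by the parent before the threads start
        y = 0;
        int r1 = -1, r2 = -1;
        std::thread program1([&] {
            std::lock_guard<std::mutex> lock(m);
            x = 1;
            r1 = y;  // read y
        });
        std::thread program2([&] {
            std::lock_guard<std::mutex> lock(m);
            y = 1;
            r2 = x;  // read x
        });
        program1.join();
        program2.join();
        assert(r1 != -1 && r2 != -1);  // both programs ran
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```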

I would be very thankful for an explanation and/or an approach to tackle this problem.
/ WaZoX

Recommended answer

What you say may be completely correct, but I believe the problem goes deeper than cache coherency.
The issue, I think, has more to do with the language specification and the current state of the art in optimising compilers.
Contrary to common assumption, the C++ language specification does not require statements to be executed in the order they are written. In fact, the compiler is free to rearrange statements at will, so long as the result is 'correct' according to the constraints that the specification does impose.
Writing compiler software is one of the most difficult challenges in computing, and writing an optimising compiler even more so.
Given this, almost all the optimisers I'm aware of work linearly, and most work within a limited scope. In other words, they operate within one function or the functions in one file, and they optimise under the assumption of a single thread of execution.

In the case you mention, this can lead to the two pieces of code being compiled to machine code equivalent to, for example:

x=0
read y
x=1

and

read x
y=0
y=1

Each code section on its own is still guaranteed to be correct and to produce the same result if it is the only piece of code executing. However, if multiple threads of execution are running amok through this code, the result will be highly dependent on the final linearised sequence of instructions that is executed. There are many possible such sequences, for example

x=0
read y
.
read x
y=0
.
x=1
.
y=1

(dots representing context switches)

While I can't immediately see one that would result in both x and y being 0, it would be hard to rule it out, given that there are sequences in which at least one of x or y is undeterminable. This gets even worse on truly parallel hardware that can actually execute read y and y=0 at the same time. That gives many more possible outcomes, and many more of them are as good as random.
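
The interleavings can be checked mechanically. A minimal sketch, assuming x and y both start at 0 (the pseudocode does not say) and modelling each thread as executing one instruction at a time against a single shared memory, i.e. sequential consistency. It therefore models the compiler reordering described above, not hardware store buffers. Op, Program and all_outcomes are hypothetical names; each outcome is the pair (value read by the first program, value read by the second).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// One instruction of the pseudocode: either "var=value" or "read var".
struct Op {
    char var;      // 'x' or 'y'
    bool is_read;  // true for "read var"
    int value;     // value written when is_read is false
};
using Program = std::vector<Op>;
using Outcome = std::pair<int, int>;  // (read by first program, read by second)

// The programs from the question, and the reordered versions from the answer.
const Program program1   = {{'x', false, 0}, {'x', false, 1}, {'y', true, 0}};
const Program program2   = {{'y', false, 0}, {'y', false, 1}, {'x', true, 0}};
const Program reordered1 = {{'x', false, 0}, {'y', true, 0}, {'x', false, 1}};
const Program reordered2 = {{'x', true, 0}, {'y', false, 0}, {'y', false, 1}};

// Executes one instruction against the shared memory (x, y) and a register.
void step(const Op& op, int& x, int& y, int& reg) {
    int& mem = (op.var == 'x') ? x : y;
    if (op.is_read) reg = mem;
    else mem = op.value;
}

// Depth-first search over every interleaving: at each point either program a
// or program b executes its next instruction. State is passed by value.
void explore(const Program& a, const Program& b, std::size_t i, std::size_t j,
             int x, int y, int ra, int rb, std::set<Outcome>& out) {
    if (i == a.size() && j == b.size()) {
        out.insert({ra, rb});
        return;
    }
    if (i < a.size()) {
        int nx = x, ny = y, nra = ra;
        step(a[i], nx, ny, nra);
        explore(a, b, i + 1, j, nx, ny, nra, rb, out);
    }
    if (j < b.size()) {
        int nx = x, ny = y, nrb = rb;
        step(b[j], nx, ny, nrb);
        explore(a, b, i, j + 1, nx, ny, ra, nrb, out);
    }
}

// All (r1, r2) pairs reachable when both variables start at `initial`.
std::set<Outcome> all_outcomes(const Program& a, const Program& b, int initial) {
    std::set<Outcome> out;
    explore(a, b, 0, 0, initial, initial, -1, -1, out);
    return out;
}
```

Under that assumption the original order never ends with both reads at 0 (the reachable outcomes are (0,1), (1,0) and (1,1)), whereas the reordered pair can: the prefix x=0, read y, read x already yields (0,0) before either thread writes 1.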

It may therefore sound as if the situation is hopeless and little or nothing can ever be guaranteed, but that is not so. The above code can very simply be made reliable and free of synchronisation trouble by not sharing x or y between threads, making it a design issue rather than a language or compiler problem.
When such sharing is necessary, it can only ever be made safe by using hardware facilities for atomic operations, which include memory fences, pipeline flushes and bus locks, to ensure consistency at the hardware level. This is where caching may enter the picture, but it does not present an insoluble problem.
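
In C++11 terms, the memory-fence route can be sketched with std::atomic_thread_fence. A minimal sketch that keeps the relaxed accesses of the professor's programs and adds a sequentially consistent fence between each store and the following load; count_both_zero_with_fences is a hypothetical name. The standard's rules for memory_order_seq_cst fences forbid the outcome in which both loads return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

// Same relaxed accesses as the two programs, plus a seq_cst fence between each
// store and the following load. Returns how many rounds ended with both reads 0.
int count_both_zero_with_fences(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        int r1 = -1, r2 = -1;
        std::thread program1([&] {
            x.store(1, std::memory_order_relaxed);  // x=1
            // Full fence: the store above may not be reordered after the load
            // below (typically MFENCE or a locked instruction on x86).
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);  // read y
        });
        std::thread program2([&] {
            y.store(1, std::memory_order_relaxed);  // y=1
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);  // read x
        });
        program1.join();
        program2.join();
        assert(r1 != -1 && r2 != -1);
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```
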
In the past, such low-level atomic operations had to be provided by operating-system-specific libraries pre-built in assembly language, because the C++ language specification was independent of the availability of atomic operations at the hardware level and neither required nor relied on them. C++11, however, mandates a new atomic operations section of the standard library, effectively requiring underlying hardware support for atomic operations in order to implement the full language (an implementation may emulate most atomic types with internal locks, but std::atomic_flag must be lock-free).
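
As an illustration of building a lock from the C++11 atomic operations, here is a minimal test-and-set spinlock on std::atomic_flag, the one atomic type the standard guarantees to be lock-free. SpinLock and run_counter are hypothetical names; this is a teaching sketch, and std::mutex is normally the better choice because it does not burn CPU time while waiting.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// A minimal test-and-set spinlock built on std::atomic_flag.
class SpinLock {
public:
    void lock() {
        // Acquire: accesses in the critical section cannot move above this.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Busy-wait until the current holder clears the flag.
        }
    }
    void unlock() {
        // Release: writes made in the critical section become visible to
        // the next thread that acquires the lock.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain (non-atomic) counter from several threads under the lock.
long run_counter(int threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```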

The professor was correct.

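The short answer to your question is that the illustrated code sequence needs its operations turned into atomic operations, which, amongst other things, will force the CPU to make its stores visible in the required order (on common hardware by draining the store buffer, not by flushing the cache) and disable or work around the instruction reordering described above.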
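
A minimal sketch of that fix in C++11: declare x and y as std::atomic<int> and use plain assignment and conversion, which default to std::memory_order_seq_cst. All seq_cst operations appear in a single total order consistent with each thread's program order, so at least one of the two reads must see the other thread's 1. count_both_zero_seq_cst is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Default std::atomic operations use std::memory_order_seq_cst.
std::atomic<int> x{0}, y{0};

// Returns how many of `iterations` rounds ended with both reads seeing 0.
int count_both_zero_seq_cst(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x = 0;  // x=0 and y=0, done by the parent before each round
        y = 0;
        int r1 = -1, r2 = -1;
        std::thread program1([&] {
            x = 1;   // seq_cst store
            r1 = y;  // seq_cst load ("read y")
        });
        std::thread program2([&] {
            y = 1;   // seq_cst store
            r2 = x;  // seq_cst load ("read x")
        });
        program1.join();
        program2.join();
        assert(r1 != -1 && r2 != -1);
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```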

After some research, it turns out this isn't a problem on modern computers. The problem is solved in the hardware implementation of the cache, using one of several memory models. It may, however, affect the performance of the program, since a core will need to fetch the data from the shared memory instead of its cache if the data has been modified. It may also decrease performance if unrelated data is placed in the same block: if some variable in that block is modified, even one the core doesn't use, the core will still need to load the data from the shared memory again. This is largely up to the compiler, though, and generally not something we can do much about. This normally applies to a single multicore processor; if you have several processors you may need to be careful, as it depends heavily on the system.
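
The slowdown described in the last part is usually called false sharing. Where the data layout is under the programmer's control, a common mitigation is to give each thread's frequently written data its own cache line. A minimal sketch, assuming 64-byte cache lines (typical on x86-64, but not universal; C++17 also offers std::hardware_destructive_interference_size where the library supports it). PaddedCounter and run_padded are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines (typical on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

// alignas pads each counter out to a full cache line, so writes to one counter
// do not invalidate the line holding the other (no false sharing).
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Two threads each increment their own counter; returns the combined total.
long run_padded(long increments) {
    PaddedCounter counters[2];
    auto work = [&counters, increments](int idx) {
        for (long i = 0; i < increments; ++i)
            counters[idx].value.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread a(work, 0);
    std::thread b(work, 1);
    a.join();
    b.join();
    return counters[0].value.load() + counters[1].value.load();
}
```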