Can different threads write to different sections of the same Vec?

Question

I have 10 threads and a Vec of length 100.

Can I have thread 0 work on elements 0-9 (sort them, for example), while thread 1 is working on elements 10-19, etc.?

Or do I have to use a Vec<Vec<T>> for this? (I would rather avoid that, because the elements would no longer be contiguous in memory.)

Recommended answer

Yes, you can. You asked about the mutable case, but I'll preface this by saying that if the Vec is read-only (e.g. for a reduction), you can safely send each thread an immutable reference to the specific slice it needs. You can do this by simply using something like &my_vec[idx1..idx2] in a loop.
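As a rough illustration of the read-only case, here is a minimal sketch of a parallel sum. It assumes Rust 1.63 or later so that std::thread::scope is available; the function name parallel_sum and the chunk_len parameter are made up for this example and are not part of the original question.

```rust
use std::thread;

/// Sums `data` by handing each scoped thread an immutable sub-slice.
/// `parallel_sum` and `chunk_len` are hypothetical names for this sketch.
fn parallel_sum(data: &[u64], chunk_len: usize) -> u64 {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    thread::scope(|s| {
        // One thread per sub-slice; every thread only reads from `data`.
        let handles: Vec<_> = (0..data.len())
            .step_by(chunk_len)
            .map(|start| {
                let end = (start + chunk_len).min(data.len());
                let part = &data[start..end];
                s.spawn(move || part.iter().sum::<u64>())
            })
            .collect();
        // Join the threads and combine their partial sums.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    let my_vec: Vec<u64> = (1..=100).collect();
    println!("sum = {}", parallel_sum(&my_vec, 10));
}
```

Scoped threads are used here because they may borrow from the caller's stack; with plain std::thread::spawn the slices would need a 'static lifetime, for example by putting the Vec behind an Arc.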

For the mutable case it's a bit trickier, since the borrow checker is not sophisticated enough to see that two index ranges of a Vec do not overlap. However, there are a number of methods you can call to get these subslices, notably split_at_mut. By far the easiest is the chunks_mut iterator, documented in the standard library's slice documentation. (Note that there is a matching chunks iterator for the immutable case, so you only need to make minor changes when switching between the two.)
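To make split_at_mut concrete, here is a small sketch that sorts the two halves of a slice on separate threads. The function name sort_halves is invented for the example, and std::thread::scope (Rust 1.63+) is assumed.

```rust
use std::thread;

/// Sorts the left and right halves of `data` independently, one thread each.
/// `sort_halves` is a hypothetical helper, not a standard library function.
fn sort_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    // split_at_mut yields two non-overlapping mutable slices,
    // which the borrow checker accepts as independent borrows.
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| left.sort());
        s.spawn(|| right.sort());
    });
}

fn main() {
    let mut v = vec![4, 3, 2, 1, 8, 7, 6, 5];
    sort_halves(&mut v);
    println!("{:?}", v);
}
```

For more than two pieces you would have to call split_at_mut repeatedly, which is why chunks_mut is usually more convenient.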

Be aware that the chunks and chunks_mut functions take the size of each chunk, not the number of chunks. However, deriving one from the other is fairly straightforward: divide the length by the number of chunks you want, rounding up.
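Putting this together for the situation in the question (a Vec of 100 elements and 10 threads), a sketch could look like the following. The name sort_in_chunks and the num_threads parameter are assumptions made for the example; the chunk size is derived from the thread count by rounding up.

```rust
use std::thread;

/// Sorts each chunk of `data` on its own thread.
/// `sort_in_chunks` and `num_threads` are hypothetical names for this sketch.
fn sort_in_chunks(data: &mut [i32], num_threads: usize) {
    assert!(num_threads > 0, "need at least one thread");
    if data.is_empty() {
        return; // chunks_mut panics on a chunk size of zero
    }
    // chunks_mut wants the chunk size, so derive it from the
    // desired number of chunks, rounding up.
    let chunk_size = (data.len() + num_threads - 1) / num_threads;
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_size) {
            // Each thread owns a mutable borrow of exactly one chunk.
            s.spawn(move || chunk.sort());
        }
    });
}

fn main() {
    let mut v: Vec<i32> = (0..100).rev().collect();
    sort_in_chunks(&mut v, 10);
    println!("first chunk: {:?}", &v[..10]);
}
```

The elements stay contiguous in one allocation, so no Vec<Vec<T>> is needed; only the order within each chunk changes.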

I would like to add a word of caution about the mutable case, however. If you simply split the data evenly, performance can suffer badly. The reason is that the CPU does not operate on individual addresses; it operates on blocks of memory called cache lines, which are typically 64 bytes long. If multiple threads write to the same cache line (for example, where one thread's chunk ends and the next one begins), the cores have to keep passing that line back and forth to stay consistent, which is much slower than working on a line that only one core touches.

Unfortunately, in safe Rust there's no easy way to determine where on a cache line a Vec's buffer starts (the buffer may well have been allocated starting in the middle of a cache line). Most of the methods I know of to detect this involve inspecting the low bits of the actual pointer address. The easiest way to handle this is to add a 64-byte pad of dummy data between each pair of chunks you want to use. For instance, if you have a Vec containing 1000 32-bit floats and 10 threads, you add 16 floats with a dummy value (32 bits = 4 bytes, and 16 * 4 = 64 bytes = 1 cache line) between each run of 100 "real" floats and ignore the dummies during computation.
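The padding idea could be sketched roughly as follows. Everything named here (PAD, with_padding, double_real_elements) is invented for the illustration, and the 64-byte cache line is an assumption about the target CPU rather than something the code checks.

```rust
use std::thread;

/// Number of dummy f32 values after each chunk: 16 * 4 bytes = 64 bytes.
/// Assumes a 64-byte cache line; adjust for other targets.
const PAD: usize = 16;

/// Copies `data` into a new Vec, appending PAD dummy floats after every
/// chunk of `chunk_len` real floats. Hypothetical helper for this sketch.
fn with_padding(data: &[f32], chunk_len: usize) -> Vec<f32> {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    let num_chunks = (data.len() + chunk_len - 1) / chunk_len;
    let mut padded = Vec::with_capacity(data.len() + num_chunks * PAD);
    for chunk in data.chunks(chunk_len) {
        padded.extend_from_slice(chunk);
        padded.resize(padded.len() + PAD, 0.0); // dummy values
    }
    padded
}

/// Doubles every real element of a buffer built by `with_padding`.
/// Each thread gets one block (real data followed by padding) and
/// skips the trailing PAD dummies.
fn double_real_elements(padded: &mut [f32], chunk_len: usize) {
    thread::scope(|s| {
        for block in padded.chunks_mut(chunk_len + PAD) {
            s.spawn(move || {
                let real_len = block.len().saturating_sub(PAD);
                for x in &mut block[..real_len] {
                    *x *= 2.0;
                }
            });
        }
    });
}

fn main() {
    let data: Vec<f32> = (0..1000).map(|i| i as f32).collect();
    let mut padded = with_padding(&data, 100);
    double_real_elements(&mut padded, 100);
    println!("padded length = {}", padded.len());
}
```

Because at least 64 bytes separate the real data of neighbouring chunks, two threads cannot write to the same 64-byte line regardless of where the buffer happens to start.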

This is known as false sharing, and I encourage you to look up other references to learn other methods of dealing with this.
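One such alternative, shown here only as a sketch and not as something from the original answer, is to give each thread's data a type whose alignment equals the assumed cache line size, so that every element of the Vec starts on its own line. PaddedCounter and count_up are hypothetical names, and 64 is again an assumption about the target.

```rust
use std::mem::{align_of, size_of};
use std::thread;

/// A counter aligned to 64 bytes, so each one occupies its own
/// (assumed 64-byte) cache line. Hypothetical type for this sketch.
#[repr(align(64))]
#[derive(Default)]
struct PaddedCounter {
    value: u64,
}

/// Spawns `num_threads` threads; each increments its own counter.
fn count_up(num_threads: usize, increments: u64) -> Vec<u64> {
    let mut counters: Vec<PaddedCounter> =
        (0..num_threads).map(|_| PaddedCounter::default()).collect();
    thread::scope(|s| {
        for counter in counters.iter_mut() {
            s.spawn(move || {
                for _ in 0..increments {
                    counter.value += 1;
                }
            });
        }
    });
    counters.iter().map(|c| c.value).collect()
}

fn main() {
    println!(
        "size = {}, align = {}",
        size_of::<PaddedCounter>(),
        align_of::<PaddedCounter>()
    );
    println!("{:?}", count_up(4, 1000));
}
```

The trade-off is wasted memory: each 8-byte counter takes up a full 64 bytes, so this suits small per-thread state better than large arrays.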

Note that 64 bytes is the line size used by essentially all current x86 processors, but it is a property of the hardware, not something you can rely on everywhere. If you're compiling for ARM, PowerPC, MIPS, or something else, this value can and will vary.
