同时可变地访问保证不相交的大向量的任意索引 [英] Simultaneous mutable access to arbitrary indices of a large vector that are guaranteed to be disjoint
问题描述
我有一种情况,其中多个线程必须更新存储在共享向量中的对象.但是,向量非常大,要更新的元素数量也相对较小.
I have a case where multiple threads must update objects stored in a shared vector. However, the vector is very large, and the number of elements to update is relatively small.
在最小的示例中,可以通过包含要更新元素索引的(哈希)集来标识要更新的元素集.因此,代码如下所示:
In a minimal example, the set of elements to update can be identified by a (hash-)set containing the indices of elements to update. The code could hence look as follows:
let mut big_vector_of_elements = generate_data_vector();
while has_things_to_do() {
let indices_to_update = compute_indices();
indices_to_update.par_iter() // Rayon parallel iteration
.map(|index| big_vector_of_elements[index].mutate())
.collect()?;
}
这显然在Rust中是不允许的:big_vector_of_elements
不能同时在多个线程中可变地借用.但是,将每个元素包装在例如Mutex
锁中似乎是不必要的:如果没有明确的同步,这种特定情况将是安全的.由于索引来自一组,因此可以保证它们是不同的. par_iter
中没有两次迭代接触向量的相同元素.
This is obviously disallowed in Rust: big_vector_of_elements
cannot be borrowed mutably in multiple threads at the same time. However, wrapping each element in, e.g., a Mutex
lock seems unnecessary: this specific case would be safe without explicit synchronization. Since the indices come from a set, they are guaranteed to be distinct. No two iterations in the par_iter
touch the same element of the vector.
编写一种并行修改向量中元素的程序的最佳方法是什么,在这种情况下,同步已经通过选择索引来解决,但编译器却不理解后者?
What would be the best way of writing a program that mutates elements in a vector in parallel, where the synchronization is already taken care of by the selection of indices, but where the compiler does not understand the latter?
一种接近最佳的解决方案是将所有元素都包装在big_vector_of_elements
中某个假设的UncontendedMutex
锁中,这将是Mutex
的变体,在没有竞争的情况下这快得可笑,并且可能花费很长的时间当发生争用(甚至发生恐慌)时.理想情况下,对于任何T
,UncontendedMutex<T>
的大小和对齐方式也应与T
相同.
A near-optimal solution would be to wrap all elements in big_vector_of_elements
in some hypothetical UncontendedMutex
lock, which would be a variant of Mutex
which is ridiculously fast in the uncontended case, and which may take arbitrarily long when contention occurs (or even panics). Ideally, an UncontendedMutex<T>
should also be of the same size and alignment as T
, for any T
.
可以使用使用人造丝并行迭代器",使用chunks_mut
"或使用split_at_mut
"回答多个问题:
Multiple questions can be answered with "use Rayon's parallel iterator", "use chunks_mut
", or "use split_at_mut
":
- 如何并行运行分区数组上的计算线程?
- 在并行:如何安全地进行操作,或不使用不稳定的功能?
- 如何通过不相交的方法从向量到不同线程的切片?
- 可以将不同的线程写入不同的部分相同的Vec?
- 如何允许每个CPU内核可变地访问Vec的一部分?
- How do I run parallel threads of computation on a partitioned array?
- Processing vec in parallel: how to do safely, or without using unstable features?
- How do I pass disjoint slices from a vector to different threads?
- Can different threads write to different sections of the same Vec?
- How to give each CPU core mutable access to a portion of a Vec?
这些答案在这里似乎无关紧要,因为这些解决方案意味着迭代整个big_vector_of_elements
,然后针对每个元素找出是否需要更改任何内容.从本质上讲,这意味着这样的解决方案如下所示:
These answers do not seem relevant here, since those solutions imply iterating over the entire big_vector_of_elements
, and then for each element figuring out whether anything needs to be changed. Essentially, this means that such a solution would look as follows:
let mut big_vector_of_elements = generate_data_vector();
while has_things_to_do() {
let indices_to_update = compute_indices();
for (index, mut element) in big_vector_of_elements.par_iter().enumerate() {
if indices_to_update.contains(index) {
element.mutate()?;
}
}
}
此解决方案花费的时间与big_vector_of_elements
的大小成比例,而第一个解决方案仅在与indices_to_update
的大小成比例的多个元素上循环.
This solution takes time proportionate to the size of big_vector_of_elements
, whereas the first solution loops only over a number of elements proportionate to the size of indices_to_update
.
推荐答案
When the compiler can't enforce that mutable references to a slice elements aren't exclusive, Cell
is pretty nice.
您可以使用 Cell::as_slice_of_cells
.所有这些都是零成本的:它只是指导类型系统的地方.
You can transform a &mut [T]
into a &Cell<[T]>
using Cell::from_mut
, and then a &Cell<[T]>
into a &[Cell<T>]
using Cell::as_slice_of_cells
. All of this is zero-cost: It's just there to guide the type-system.
&[Cell<T>]
几乎类似于&[&mut T]
,但是您可以对元素进行的操作仅限于读取或替换,您无法获得对元素本身的引用(无论是否可变).这保证了一切都是安全的,没有任何动态费用.
A &[Cell<T>]
is almost like a &[&mut T]
, but what you can do with the elements is limited to read or replace, you can't get a reference, mutable or not, to the elements themselves. This guarantees that everything is safe, at no dynamic cost.
fn main() {
use std::cell::Cell;
let slice: &mut [i32] = &mut [1, 2, 3];
let cell_slice: &Cell<[i32]> = Cell::from_mut(slice);
let slice_cell: &[Cell<i32>] = cell_slice.as_slice_of_cells();
let two = &slice_cell[1];
let another_two = &slice_cell[1];
println!("This is 2: {:?}", two);
println!("This is also 2: {:?}", another_two);
two.set(42);
println!("This is now 42!: {:?}", another_two);
}
这篇关于同时可变地访问保证不相交的大向量的任意索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!