Have many vectors sorted together


Question

I have three vectors of the same size (~1 million items):

std::vector<std::wstring> name;
std::vector<int> x;
std::vector<int> y;

These can be seen as three "columns".

How can I sort the vector name from A to Z:

std::sort(name.begin(), name.end())

while having the vectors x and y reordered accordingly?

Example:

name  x  y                 name  x  y
BCD   7  9                 ABC   4  3
ZYX   1  4        =>       BCD   7  9
ABC   4  3                 ZYX   1  4


The good thing about using a std::vector is that I can easily select or filter a few items in the big vector by keeping just a list of indices (example: let's keep items 12, 1872, 2834, 1831). I thought about using a std::map, but I fear it won't be as efficient for this: how would I keep a list of selected elements in a map?

Recommended answer

There are a couple of possible ways to do this. The easiest is to wrap name, x, and y in a struct:

struct Person {
    std::wstring name;
    int x;
    int y;
};

Then you can have a std::vector<Person> people, and sorting it would be (assuming C++14, for the generic lambda):

std::sort(people.begin(), people.end(),
    [](auto const& lhs, auto const& rhs) { return lhs.name < rhs.name; });
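As a minimal sketch of the struct approach, the following puts the pieces together and also shows that the index-list filtering mentioned in the question still works on a vector of structs. The helper names sortByName and pick are hypothetical, introduced only for illustration; they are not part of the original answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One row of the table: the three "columns" from the question.
struct Person {
    std::wstring name;
    int x;
    int y;
};

// Sort rows by name, A to Z; x and y travel with their name.
// (Hypothetical helper name, for illustration only.)
void sortByName(std::vector<Person>& people) {
    std::sort(people.begin(), people.end(),
        [](Person const& lhs, Person const& rhs) { return lhs.name < rhs.name; });
}

// Keep only the rows whose indices are listed in `keep`, in that order.
// Assumes every index in `keep` is smaller than people.size().
std::vector<Person> pick(std::vector<Person> const& people,
                         std::vector<std::size_t> const& keep) {
    std::vector<Person> result;
    result.reserve(keep.size());
    for (std::size_t i : keep) {
        result.push_back(people[i]);
    }
    return result;
}
```

Selecting rows by index is as cheap here as with three separate vectors, which addresses the concern that led to considering std::map.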


However, if you know this will cause performance problems because fewer elements fit in the cache (that is, you frequently iterate over only x or only y, and you are in a very constrained environment such as high-performance gaming), I'd suggest sorting only one vector. Unless you know what you're doing, benchmark both options.

Basically, have a vector that keeps track of the ordering (std::iota is declared in <numeric>):

std::vector<std::wstring> name;
std::vector<int> x;
std::vector<int> y;

std::vector<std::size_t> ordering(name.size());
std::iota(ordering.begin(), ordering.end(), 0);

std::sort(ordering.begin(), ordering.end(),
    [&](auto const& lhs, auto const& rhs) {
        return name[lhs] < name[rhs];
    });

Then you can simply iterate over ordering to go through each parallel vector in the new order.
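Iterating through the parallel vectors via ordering could look like the sketch below. The function names orderByName and visitInNameOrder are hypothetical, and the code assumes all three vectors have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Build the index permutation that lists the names from A to Z.
// The vector `name` itself is left untouched.
std::vector<std::size_t> orderByName(std::vector<std::wstring> const& name) {
    std::vector<std::size_t> ordering(name.size());
    std::iota(ordering.begin(), ordering.end(), std::size_t{0});
    std::sort(ordering.begin(), ordering.end(),
        [&](std::size_t lhs, std::size_t rhs) { return name[lhs] < name[rhs]; });
    return ordering;
}

// Call visit(name[i], x[i], y[i]) for every row, in name order.
// Assumes name, x, and y all have the same size.
template <typename Visit>
void visitInNameOrder(std::vector<std::wstring> const& name,
                      std::vector<int> const& x,
                      std::vector<int> const& y,
                      Visit visit) {
    for (std::size_t i : orderByName(name)) {
        visit(name[i], x[i], y[i]);
    }
}
```

For the example table in the question, orderByName yields the indices 2, 0, 1, so the rows are visited as ABC, BCD, ZYX.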

The extra level of indirection may make this less efficient. For example, the CPU might think there is a data dependency where there is none. Furthermore, the extra data we keep in ordering could easily take enough room in the cache to counteract the benefit of separating name, x, and y; you'd need to know the specifications of your target architecture and profile to be sure.

If you want to keep iterating over the vectors in this new order, use the ordering vector to actually reorder the other vectors, because access through ordering is effectively random. That random access would counteract the benefit of keeping the vectors separate (unless the vectors are small enough to fit in the cache anyway).

The easiest way to do that is to create a new vector:

std::vector<std::wstring> newNames;
newNames.reserve(name.size());

for (auto i : ordering) {
    newNames.push_back(name[i]);
}

Reconstructing the vectors like this is probably what you want to do if the sorting happens during initialization.
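The same reconstruction applies to x and y as well as name. One way to avoid repeating the loop is a small template, sketched here under the assumption that ordering is a valid permutation of the indices; the name applyOrdering is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return a copy of `values` rearranged so that element k of the result
// is values[ordering[k]]. Assumes every index in `ordering` is valid.
template <typename T>
std::vector<T> applyOrdering(std::vector<T> const& values,
                             std::vector<std::size_t> const& ordering) {
    std::vector<T> result;
    result.reserve(ordering.size());
    for (std::size_t i : ordering) {
        result.push_back(values[i]);
    }
    return result;
}
```

With it, the three vectors can be rebuilt in turn, for example name = applyOrdering(name, ordering); followed by the same call for x and y. This costs one temporary copy per vector rather than permuting in place.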
