std :: vector差异 [英] std::vector differences

查看:147
本文介绍了std :: vector差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何确定2个向量的差异是什么?



我有向量< int> v1 向量< int> v2 ;



我正在寻找的是一个向量< int&仅包含在 v1 v2 中的元素的vDifferences


解决方案

这是完整的和正确答案。在可以使用 set_symmetric_difference 算法之前,必须对源区域必须进行排序:

  using namespace std; //为了简洁,不要在你自己的代码中这样做... 

vector< int> v1;
vector< int> v2;

// ... Populate v1和v2

//对于set_symmetric_difference算法,
//源范围必须排序!
vector< int> sortedV1(v1);
vector< int> sortedV2(v2);

sort(sortedV1.begin(),sortedV.end());
sort(sortedV2.begin(),sortedV2.end());

//现在我们已经排序了范围(即,容器),找到差异
vector< int> vDifferences;

set_symmetric_difference(
sortedV1.begin(),
sortedV1.end(),
sortedV2.begin(),
sortedV2.end
back_inserter(vDifferences));

// ...做差别的事情

注意,排序是一种昂贵的操作(即,对于常见的STL实现, O(n log n))。特别是对于一个或两个容器非常大(即,数百万个整数或更多)的情况,基于算法复杂度,使用散列表的不同算法可能是优选的。下面是此算法的高级描述:



  1. 将每个容器加载到哈希表中。

  2. 如果两个容器大小不同,则对应于较小的一个的散列表将用于步骤3中的遍历。否则,将使用两个散列表中的第一个。

  3. 遍历步骤2中选择的散列表,检查每个项是否存在于两个散列表中。如果是,从它们中删除它。较小的散列表是
    优选用于遍历的原因是因为散列表查找在平均为O(1),而不考虑容器大小。
    因此,遍历的时间是n的线性函数(即, O(n)),其中n是正在遍历的哈希表的大小。

  4. 取得哈希表中剩余项目的并集,并将结果存储在差异
    容器中。


C ++ 11通过标准化 unordered_multiset 容器,为我们提供了一些解决方案。我还使用了 auto 关键字的新用法进行显式初始化,以使基于散列表的以下解决方案更简洁:

  using namespace std; //为简洁起见,不要在自己的代码中这样做... 

// remove_common_items函数模板删除出现在两者中的一些和/或所有
//项的传递给它的多重集。它使用第一个multiset中的
//项作为多重存在测试的标准。
template< typename tVal>
void remove_common_items(unordered_multiset< tVal>& ms1,
unordered_multiset< tVal>& ms2)
{
//通过第一个哈希表
auto cims1 = ms1.cbegin(); cims1!= ms1.cend();)
{
//查找第二个哈希表中的当前项
auto cims2 = ms2.find * cims1);

//是否存在?
if(cims2!= ms2.end())
{
//如果是这样,从两个哈希表中删除它
cims1 = ms1.erase(cims1);
ms2.erase(cims2);
}
else //如果不是
++ cims1; //移动到下一个项目
}
}

int main()
{
vector< int> v1;
vector< int> v2;

// ... Populate v1和v2

//从它们各自的初始容器创建两个哈希表,其中包含值
//
unordered_multiset< int> ms1(v1.begin(),v1.end());
unordered_multiset< int> ms2(v2.begin(),v2.end());

//基于最小的
从两个容器中删除公共项目if(v1.size()< = v2.size)
remove_common_items(ms1,ms2);
else
remove_common_items(ms2,ms1);

//创建剩余项目的并集的向量
向量< int> vDifferences(ms1.begin(),ms1.end());

vDifferences.insert(vDifferences.end(),ms2.begin(),ms2.end());

// ...执行差异操作
}

为了确定哪种解决方案对于特定情况更好,对这两种算法进行分析将是一个聪明的行动方案。尽管基于哈希表的解决方案在O(n)中,但是它需要更多的代码并且在每个重复发现(即,哈希表移除)方面执行更多的工作。它也(可悲地)使用自定义差分函数,而不是标准的STL算法。



应该注意,两个解决方案表示的顺序,不同于原始容器中出现的元素的顺序。有一种方法通过使用散列表解决方案的变体。以下是高级描述(仅在步骤4中与上述解决方案不同):



  1. 加载每个容器

  2. 如果两个容器大小不同,则较小的散列表将用于步骤3中的遍历。否则,将使用两个容器中的第一个。 li>
  3. 遍历步骤2中选择的散列表,检查每个项是否存在于两个散列表中。如果是,从它们中删除它。

  4. 要形成差异容器,请按顺序遍历原始容器(即第二个容器之前的第一个容器)。在每个容器的相应哈希表中查找每个项目。如果找到,则将该项目添加到差异容器中并从其哈希表中删除。不存在于各个哈希表中的项目将被跳过。因此,只有存在于哈希表中的项目将在差异容器中结束,并且它们的出现顺序将保持与原始容器中的相同,因为那些容器指示最终遍历的顺序。


为了保持原始订单,步骤4比上一个解决方案更昂贵,删除的项目数量高。这是因为:


  1. 所有项目都将进行第二次测试,以便资格显示在差异容器中,哈希表。

  2. 哈希表将差异容器形成时其余项目删除一个,作为差异中存在的的一部分测试第1项。


How does one determine what the differences of 2 vectors are?

I have vector<int> v1 and vector<int> v2;

What I am looking for is a vector<int> vDifferences that contains only elements that are only in v1 or v2.

Is there a standard way to do this?

解决方案

Here is the complete and correct answer. Before the set_symmetric_difference algorithm can be used, the source ranges must be ordered:

  using namespace std; // For brevity, don't do this in your own code...

  vector<int> v1;
  vector<int> v2;

  // ... Populate v1 and v2

  // For the set_symmetric_difference algorithm to work, 
  // the source ranges must be ordered!    
  vector<int> sortedV1(v1);
  vector<int> sortedV2(v2);

  sort(sortedV1.begin(),sortedV1.end());
  sort(sortedV2.begin(),sortedV2.end());

  // Now that we have sorted ranges (i.e., containers), find the differences    
  vector<int> vDifferences;

  set_symmetric_difference(
    sortedV1.begin(),
    sortedV1.end(),
    sortedV2.begin(),
    sortedV2.end(),
    back_inserter(vDifferences));

  // ... do something with the differences

It should be noted that sorting is an expensive operation (i.e., O(n log n) for common STL implementations). Especially for the case of one or both of the containers being very large (i.e., millions of integers or more), a different algorithm using hash tables may be preferable based on algorithmic complexity. Here is a high level description of this algorithm:

  1. Load each container into a hash table.
  2. If the two containers differ in size, the hash table corresponding to the smaller one will be used for traversal in Step 3. Otherwise, the first of the two hash tables will be used.
  3. Traverse the hash table chosen in Step 2, checking to see if each item is present in both hash tables. If it is, remove it from both of them. The reason that the smaller hash table is preferred for traversal is because hash table lookups are on the average O(1) regardless of container size. Therefore, the time to traverse is a linear function of n (i.e., O(n)), where n is the size of the hash table being traversed.
  4. Take the union of the remaining items in the hash tables and store the result in a difference container.

C++11 offers us some capability for such a solution by standardizing the unordered_multiset container. I also employed the new usage of the auto keyword for explicit initializations to make the following hash table based solution more concise:

using namespace std; // For brevity, don't do this in your own code...

// The remove_common_items function template removes some and / or all of the
// items that appear in both of the multisets that are passed to it. It uses the
// items in the first multiset as the criteria for the multi-presence test.
template <typename tVal>
void remove_common_items(unordered_multiset<tVal> &ms1, 
                         unordered_multiset<tVal> &ms2)
{
  // Go through the first hash table
  for (auto cims1=ms1.cbegin();cims1!=ms1.cend();)
  {
    // Find the current item in the second hash table
    auto cims2=ms2.find(*cims1);

    // Is it present?
    if (cims2!=ms2.end())
    {
      // If so, remove it from both hash tables
      cims1=ms1.erase(cims1);
      ms2.erase(cims2);
    }
    else // If not
      ++cims1; // Move on to the next item
  }
}

int main()
{
  vector<int> v1;
  vector<int> v2;

  // ... Populate v1 and v2

  // Create two hash tables that contain the values
  // from their respective initial containers    
  unordered_multiset<int> ms1(v1.begin(),v1.end());
  unordered_multiset<int> ms2(v2.begin(),v2.end());

  // Remove common items from both containers based on the smallest
  if (v1.size()<=v2.size)
    remove_common_items(ms1,ms2);
  else
    remove_common_items(ms2,ms1);

  // Create a vector of the union of the remaining items
  vector<int> vDifferences(ms1.begin(),ms1.end());

  vDifferences.insert(vDifferences.end(),ms2.begin(),ms2.end());

  // ... do something with the differences
}

In order to determine which solution is better for a particular situation, profiling both algorithms would be a smart course of action. Although the hash table based solution is in O(n), it requires more code and does more work per duplicate found (i.e., hash table removals). It also (sadly) uses a custom differencing function rather than a standard STL algorithm.

It should be noted that both solutions present the differences in an order that is most likely quite different from the order that the elements appeared in the original containers. There is a way around this by using a variant of the hash table solution. A high level description follows (which only differs in Step 4 from the preceding solution):

  1. Load each container into a hash table.
  2. If the two containers differ in size, the smaller hash table will be used for traversal in Step 3. Otherwise, the first of the two will be used.
  3. Traverse the hash table chosen in Step 2, checking to see if each item is present in both hash tables. If it is, remove it from both of them.
  4. To form the difference container, traverse the original containers in order (i.e., the first container before the second). Look up each item from each container in its respective hash table. If it is found, the item is to be added to the difference container and removed from its hash table. Items not present in the respective hash tables will be skipped. Thus, only the items that are present in the hash tables will wind up in the difference container and their order of appearance will remain the same as it was in the original containers, because those containers dictate the order of the final traversal.

In order to maintain original order, Step 4 has become more expensive than in the previous solution, especially if the number of items removed is high. This is because:

  1. All items will be tested a second time for eligibility to appear in the difference container, via a presence test in their respective hash tables.
  2. The hash tables will have the remainder of their items removed one at a time when the difference container is formed, as part of the present in the difference testing of Item 1.

这篇关于std :: vector差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆