确定2D阵列的独特行(向量<向量>>) [英] Determining the unique rows of a 2D array (vector<vector<T> >)

查看:103
本文介绍了确定2D阵列的独特行(向量<向量>>)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用的数据类型 std :: vector< std :: vector< T& > 以存储2D矩阵/阵列。我想确定这个矩阵的唯一行。我正在寻找任何建议或指针如何去做这个操作。

I am using a datatype of std::vector<std::vector<T> > to store a 2D matrix/array. I would like to determine the unique rows of this matrix. I am looking for any advice or pointers on how to go about doing this operation.

我尝试过两种方法。

方法1:稍微复杂。我为每一行保留一个具有0/1的索引,指示该行是否为重复值,并且通过矩阵工作,将每个唯一行的索引存储在 deque 中。我想将结果存储在< vector< vector< T& > ,所以从索引的deque中,我预先分配,然后将矩阵中的行分配到返回值。

Method 1: slightly convoluted. I keep an index for each row with 0/1 indicating whether the row is a duplicate value, and work through the matrix, storing the index of each unique row in a deque. I want to store the results in a <vector<vector<T> >, and so from this deque of indices, I pre-allocate and then assign the rows from the matrix into the return value.

方法2:更容易阅读,并且在许多情况下比方法1更快。我保留已经找到的唯一行的deque,并将每一行与 deque 中的所有条目进行比较。

Method 2: Is easier to read, and in many cases faster than method 1. I keep a deque of the unique rows that have been found, and just loop through the rows and compare each row to all the entries in this deque.

我将这两种方法与matlab ,并且这些C ++例程是更慢的数量级。有谁有任何聪明的想法如何我可以加快这项工作?我希望对可能有数百万行的矩阵执行此操作。

I am comparing both of these methods to matlab, and these C++ routines are orders of magnitude slower. Does anyone have any clever ideas on how I might speed this operation up? I am looking to do this operation on matrices that potentially have millions of rows.

我在循环期间将唯一的行存储在deque中,以避免调整大小的成本向量,然后将 deque 复制到向量<向量< T> > 。我对这个操作进行了严格的基准测试,并且没有任何靠近减速操作的地方,它占用了例如具有100,000行的矩阵的运行时的少于0.5%。

I am storing the unique rows in a deque during the loop to avoid the cost of resizing a vector, and then copying the deque to the vector<vector<T> > for the results. I've benchmarked this operation closely, and it is not anywhere near slowing operation down, it accounts for less than .5% of the runtime on a matrix with 100,000 rows for example.

感谢,

鲍勃

这是代码。如果任何人有兴趣在一个更完整的例子显示用法,给我一个评论,我可以把东西在一起。

Here is the code. If anyone is interested in a more complete example showing the usage, drop me a comment and I can put something together.

方法1:

  template <typename T>
      void uniqueRows( const std::vector<std::vector<T> > &A,
                       std::vector<std::vector<T> > &ret) {
    // Go through a vector<vector<T> > and find the unique rows
    // have a value ind for each row that is 1/0 indicating if a value
    // has been previously searched.

    // cur : current item being compared to every item
    // num : number of values searched for.  Once all the values in the
    //  matrix have been searched, terminate.

    size_t N = A.size();
    size_t num=1,cur=0,it=1;
    std::vector<unsigned char> ind(N,0);
    std::deque<size_t> ulist;  // create a deque to store the unique inds

    ind[cur] = 1;
    ulist.push_back(0); // ret.push_back(A[0]);

    while(num < N ) {

      if(it >= N ) {
        ++cur;   // find next non-duplicate value, push back
        while(ind[cur])
          ++cur;

        ulist.push_back(cur); //ret.push_back(A[cur]);
        ++num;
        it = cur+1; // start search for duplicates at the next row

        if(it >= N && num == N)
          break;
      }

      if(!ind[it] && A[cur]==A[it]) {        
        ind[it] = 1; // mark as duplicate
        ++num;
      }
      ++it;
    } // ~while num

    // loop over the deque and .push_back the unique vectors    
    std::deque<size_t>::iterator iter;
    const std::deque<size_t>::iterator end = ulist.end();
    ret.reserve(ulist.size());

    for(iter= ulist.begin(); iter != end; ++iter) {
      ret.push_back(A[*iter]);
    }
  }

这是方法2的代码:

  template <typename T>
      inline bool isInList(const std::deque< std::vector<T> > &A,
                    const std::vector<T> &b) {
    typename std::deque<std::vector<T> >::const_iterator it;
    const typename std::deque<std::vector<T> >::const_iterator end = A.end();

    for(it = A.begin(); it != end; ++it) {
      if(*it == b)
        return true;
    }
    return false;
  }

  template <typename T>
  void uniqueRows1(const::std::vector<std::vector<T> > &A,
                   std::vector<std::vector<T> > &ret) {    
    typename std::deque<std::vector<T> > ulist;
    typename std::vector<std::vector<T> >::const_iterator it = A.begin();
    const typename std::vector<std::vector<T> >::const_iterator end = A.end();

    ulist.push_back(*it);

    for(++it; it != end; ++it) {
      if(!isInList(ulist,*it)) {
        ulist.push_back(*it);
      }
    }
    ret.reserve(ulist.size());

    for(size_t i = 0; i != ulist.size(); ++i) {
      ret.push_back(ulist[i]);
    }
  }


推荐答案

也应考虑使用散列,它保持行排序 可以更快(摊销 n) 如果允许更改原件,则 O(2 * m * n) code> sort / unique - 对于大矩阵尤其明显(在小矩阵上,你可能更喜欢Billy的解决方案,因为他需要没有额外的内存分配来跟踪哈希值。)

You should also consider using hashing, it preserves row ordering and could be faster (amortized O(m*n) if alteration of the original is permitted, O(2*m*n) if a copy is required) than sort/unique -- especially noticeable for large matrices (on small matrices you are probably better off with Billy's solution since his requires no additional memory allocation to keep track of the hashes.)

无论如何,利用 Boost.Unordered ,您可以执行以下操作:

Anyway, taking advantage of Boost.Unordered, here's what you can do:

#include <vector>
#include <boost/foreach.hpp>
#include <boost/ref.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/unordered_set.hpp>

namespace boost {
  template< typename T >
  size_t hash_value(const boost::reference_wrapper< T >& v) {
    return boost::hash_value(v.get());
  }
  template< typename T >
  bool operator==(const boost::reference_wrapper< T >& lhs, const boost::reference_wrapper< T >& rhs) {
    return lhs.get() == rhs.get();
  }
}

// destructive, but fast if the original copy is no longer required
template <typename T>
void uniqueRows_inplace(std::vector<std::vector<T> >& A)
{
  boost::unordered_set< boost::reference_wrapper< std::vector< T > const > > unique(A.size());
  for (BOOST_AUTO(it, A.begin()); it != A.end(); ) {
    if (unique.insert(boost::cref(*it)).second) {
      ++it;
    } else {
      A.erase(it);
    }
  }
}

// returning a copy (extra copying cost)
template <typename T>
void uniqueRows_copy(const std::vector<std::vector<T> > &A,
                 std::vector< std::vector< T > > &ret)
{
  ret.reserve(A.size());
  boost::unordered_set< boost::reference_wrapper< std::vector< T > const > > unique;
  BOOST_FOREACH(const std::vector< T >& row, A) {
    if (unique.insert(boost::cref(row)).second) {
      ret.push_back(row);
    }
  }
}

这篇关于确定2D阵列的独特行(向量&lt;向量&gt;&gt;)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆