Partitioning/batch/chunk a container into equal sized pieces using std algorithms


Problem description

I came across a situation where I had to batch-process a set of records and send them off to a database. I am wondering how I could accomplish this with std algorithms.

Given 10002 records, I want to partition them into bins of 100 records for processing, with the remainder going into a final bin of 2.

Hopefully the following code will better illustrate what I'm trying to accomplish. I'm completely open to solutions involving iterators, lambdas, or any other sort of modern C++ fun.

#include <cassert>
#include <vector>
#include <algorithm>

template< typename T >
std::vector< std::vector< T > > chunk( std::vector<T> const& container, size_t chunk_size )
{
  return std::vector< std::vector< T > >();
}

int main()
{
  int i = 0;
  const size_t test_size = 11;
  std::vector<int> container(test_size);
  std::generate_n( std::begin(container), test_size, [&i](){ return ++i; } );

  auto chunks = chunk( container, 3 );

  assert( chunks.size() == 4 && "should be four chunks" );
  assert( chunks[0].size() == 3 && "first several chunks should have the ideal chunk size" );
  assert( chunks.back().size() == 2 && "last chunk should have the remaining 2 elements" );

  return 0;
}

Solution

The problem seems to be a variation on std::for_each, where the "each" you want to operate on is an interval of your collection. Thus you would prefer to write a lambda (or function) that takes two iterators defining the start and end of each interval and pass that lambda/function to your algorithm.

Here's what I came up with...

#include <cstddef>     // size_t
#include <functional>  // std::function

template < typename Iterator >
void for_each_interval(
    Iterator begin
  , Iterator end
  , size_t interval_size
  , std::function<void( Iterator, Iterator )> operation )
{
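  // Guard: with an interval size of 0 the iterator would never advance
  // and the outer loop would never terminate.
  if ( interval_size == 0 )
    return;

  auto to = begin;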

  while ( to != end )
  {
    auto from = to;

    auto counter = interval_size;
    while ( counter > 0 && to != end )
    {
      ++to;
      --counter;
    }

    operation( from, to );
  }
}

(I wish std::advance would take care of the inner loop that uses counter to increment to, but unfortunately it blindly steps beyond the end, and advancing an iterator past end is undefined behavior. I'm tempted to write my own smart_advance template to encapsulate this. If that worked, it would reduce the amount of code by about half!)
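For illustration, here is a minimal sketch of what such a smart_advance helper might look like, and how for_each_interval shrinks once it exists. The helper's signature and return value are assumptions made for this sketch, not an existing standard facility. The callable is taken as a template parameter instead of std::function, which is a deliberate departure from the version above, and interval_lengths is a hypothetical function added only to show usage.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: advance `it` by up to `n` steps, but never past `end`.
// Returns the number of steps actually taken.
template <typename Iterator>
std::size_t smart_advance(Iterator& it, Iterator end, std::size_t n)
{
    std::size_t steps = 0;
    while (steps < n && it != end) {
        ++it;
        ++steps;
    }
    return steps;
}

// for_each_interval rewritten on top of smart_advance.
template <typename Iterator, typename Operation>
void for_each_interval(Iterator begin, Iterator end,
                       std::size_t interval_size, Operation operation)
{
    // Assumption: a zero interval size means "do nothing" (it would
    // otherwise never advance and loop forever).
    if (interval_size == 0)
        return;

    Iterator to = begin;
    while (to != end) {
        Iterator from = to;
        smart_advance(to, end, interval_size);
        operation(from, to);
    }
}

// Usage sketch (hypothetical): collect the length of each interval of `data`.
inline std::vector<std::size_t> interval_lengths(const std::vector<int>& data,
                                                 std::size_t interval_size)
{
    typedef std::vector<int>::const_iterator iter_t;
    std::vector<std::size_t> lengths;
    for_each_interval(data.begin(), data.end(), interval_size,
        [&lengths](iter_t from, iter_t to) {
            lengths.push_back(static_cast<std::size_t>(std::distance(from, to)));
        });
    return lengths;
}
```

Because the callable type is deduced, callers no longer need to spell out the iterator type as an explicit template argument. If C++20 is available, std::ranges::advance with a bound argument provides a similar bounded step.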

Now for some code to test it...

#include <algorithm>  // std::for_each
#include <cstddef>    // size_t
#include <iostream>   // std::cout
#include <vector>

int main( int argc, char* argv[] )
{
  // Some test data
  int foo[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
  std::vector<int> my_data( foo, foo + 10 );
  size_t const interval = 3;

  typedef decltype( my_data.begin() ) iter_t;
  for_each_interval<iter_t>( my_data.begin(), my_data.end(), interval,
    []( iter_t from, iter_t to )
    {
      std::cout << "Interval:";
      std::for_each( from, to,
        [&]( int val )
        {
          std::cout << " " << val;
        } );
      std::cout << std::endl;
    } );
}

This produces the following output, which I think represents what you want:

Interval: 0 1 2
Interval: 3 4 5
Interval: 6 7 8
Interval: 9
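To tie this back to the chunk function from the question, the same "two iterators per interval" idea can be used to copy each interval into its own vector. The following is only a sketch under a few assumptions: it keeps the question's signature, treats a chunk_size of 0 as "return no chunks", and copies the elements, which may not be desirable for large records.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

template <typename T>
std::vector<std::vector<T>> chunk(std::vector<T> const& container,
                                  std::size_t chunk_size)
{
    std::vector<std::vector<T>> chunks;

    // Assumption: a chunk size of 0 yields no chunks rather than an error.
    if (chunk_size == 0)
        return chunks;

    // Number of chunks, rounding up for a partial final chunk.
    chunks.reserve(container.size() / chunk_size +
                   (container.size() % chunk_size != 0 ? 1 : 0));

    auto from = container.begin();
    while (from != container.end()) {
        // Never step further than the elements that are left.
        std::size_t const remaining =
            static_cast<std::size_t>(std::distance(from, container.end()));
        std::size_t const step = std::min(chunk_size, remaining);
        auto to = std::next(from, static_cast<std::ptrdiff_t>(step));

        chunks.emplace_back(from, to);  // copy the interval [from, to)
        from = to;
    }
    return chunks;
}
```

With the 11-element test vector from the question and a chunk size of 3, this sketch should satisfy the three assertions there: four chunks, the first of size 3 and the last of size 2. If copying is a concern, passing the iterator pairs to a callback as in for_each_interval avoids it.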
