Algorithm to generate all multiset size-n partitions

Problem description

I've been trying to figure out a way to generate all distinct size-n partitions of a multiset, but so far have come up empty-handed. First let me show what I'm trying to achieve.

Let's say we have an input vector of uint32_t:

std::vector<uint32_t> input = {1, 1, 2, 2};

And let's say we want to create all distinct 2-size partitions. There are only two of these, namely:

[[1, 1], [2, 2]], [[1, 2], [1, 2]]

Note that order does not matter, i.e. all of the following are duplicate, incorrect solutions.

  • Duplicate because order within a permutation group does not matter:

    [[2, 1], [1, 2]]
    

  • Duplicate because order of groups does not matter:

    [[2, 2], [1, 1]]
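
Both kinds of duplicate can be made explicit by normalizing a partition before comparing it. The following is only a minimal sketch of that idea; the names Group, Partition and canonicalize are hypothetical and do not appear in the question. It sorts the elements inside each group and then sorts the groups, so that two partitions are "the same" exactly when their normalized forms are equal.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using Group = std::vector<std::uint32_t>;
using Partition = std::vector<Group>;

// Returns a normalized copy of p: every group is sorted ascending, and the
// groups themselves are sorted lexicographically. Partitions that differ only
// in element order or group order map to the same result.
Partition canonicalize(Partition p) {
  for (Group& g : p) {
    std::sort(g.begin(), g.end());
  }
  std::sort(p.begin(), p.end());
  return p;
}
```

With this, [[2, 1], [1, 2]] normalizes to [[1, 2], [1, 2]] and [[2, 2], [1, 1]] normalizes to [[1, 1], [2, 2]], which are the two distinct solutions listed above.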
    

Not homework of some kind BTW. I encountered this while coding something at work, but by now it is out of personal interest that I'd like to know how to deal with this. The parameters for the work-related problem were small enough that generating a couple thousand duplicate solutions didn't really matter.

Current solution (generates duplicates)

In order to illustrate that I'm not just asking without having tried to come up with a solution, let me try to explain my current algorithm (which generates duplicate solutions when used with multisets).

It works as follows: the state has a bitset with n bits set to 1 for each partition block. The length of the bitsets is size(input) - n * index_block(), e.g. if the input vector has 8 elements and n = 2, then the first partition block uses an 8-bit bitset with 2 bits set to 1, the next partition block uses a 6-bit bitset with 2 bits set to 1, etc.

A partition is created from these bitsets by iterating over each bitset in order and extracting the elements of the input vector with indices equal to the position of 1-bits in the current bitset.
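
As an illustration of the decoding step just described, here is a rough sketch. It assumes that each bitset is stored in a 32-bit integer, that bit i refers to the i-th element not yet taken by an earlier block, and that the input has at most 32 elements. The name decodePartition and the bit ordering are assumptions made for this example, not details from the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::uint32_t>;

// masks[k] selects the members of block k from the elements left over by
// blocks 0..k-1, so each successive mask addresses a shorter list.
std::vector<Block> decodePartition(const std::vector<std::uint32_t>& input,
                                   const std::vector<std::uint32_t>& masks) {
  assert(input.size() <= 32);  // one bit per element in a 32-bit mask
  std::vector<std::uint32_t> pool = input;
  std::vector<Block> blocks;
  for (std::uint32_t mask : masks) {
    Block block;
    std::vector<std::uint32_t> rest;
    for (std::size_t i = 0; i < pool.size(); ++i) {
      if ((mask >> i) & 1u) {
        block.push_back(pool[i]);  // bit set: element joins this block
      } else {
        rest.push_back(pool[i]);   // bit clear: element stays available
      }
    }
    blocks.push_back(block);
    pool = rest;
  }
  return blocks;
}
```

For the 8-element example with n = 2, the masks would address 8, 6, 4 and 2 remaining elements respectively, matching the shrinking bitset lengths described above.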

In order to generate the next partition, I iterate over the bitsets in reverse order. The next bitset permutation is calculated (using a reverse of Gosper's hack). If the first bit in the current bitset is not set (i.e. vector index 0 not selected), then that bitset is reset to its starting state. Enforcing that the first bit is always set prevents generating duplicates when creating size-n set partitions (duplicates of the 2nd kind shown above). If the current bitset is equal to its starting value, this step is then repeated for the previous (longer) bitset.
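
For readers unfamiliar with Gosper's hack, the standard (non-reversed) form is sketched below. This is not the code from my implementation, which uses a reversed variant; the function name nextBitPermutation is made up for this example. Given a bit pattern, it returns the next larger integer with the same number of 1-bits.

```cpp
#include <cassert>
#include <cstdint>

// Gosper's hack: next larger value with the same population count as v.
// Preconditions: v != 0, and the result must still fit in 32 bits
// (the addition wraps around otherwise).
inline std::uint32_t nextBitPermutation(std::uint32_t v) {
  assert(v != 0);
  std::uint32_t const lowest = v & (0u - v);  // isolate the lowest set bit
  std::uint32_t const ripple = v + lowest;    // carry through the lowest run of 1s
  // The bits that changed mark that run; shift the leftover 1s of the run
  // down to the least significant end and merge them back in.
  return (((v ^ ripple) >> 2) / lowest) | ripple;
}
```

Starting from 0b0011, repeated calls yield 0b0101, 0b0110, 0b1001, 0b1010 and 0b1100, i.e. all ways of choosing 2 positions out of 4.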

This works great (and is very fast) for sets. However, when used with multisets it generates duplicate solutions, since it is unaware that some elements appear more than once in the input vector. Here's some example output:

std::vector<uint32_t> input = {1, 2, 3, 4};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[2, 1], [4, 3]], [[3, 1], [4, 2]], [[4, 1], [3, 2]]

std::vector<uint32_t> input = {1, 1, 2, 2};
printAllSolutions(myCurrentAlgo(input, 2));
=> [[1, 1], [2, 2]], [[2, 1], [2, 1]], [[2, 1], [2, 1]]

That last (duplicate) solution is generated simply because the algorithm is unaware of duplicates in the input; it generates the exact same internal states (i.e. which indices to select) in both examples.

Wanted solution

I guess it's pretty clear by now what I'm trying to end up with. Just for the sake of completeness, it would look somewhat as follows:

std::vector<uint32_t> multiset = {1, 1, 2, 2};
MagicClass myGenerator(multiset, 2);
do {
  std::vector<std::vector<uint32_t> > nextSolution = myGenerator.getCurrent();
  std::cout << nextSolution << std::endl;
} while (myGenerator.calcNext());
=> [[1, 1], [2, 2]]
   [[1, 2], [1, 2]]

I.e. the code would work somewhat like std::next_permutation, reporting when it has generated all solutions and has ended back at the "first" solution (for whatever definition of "first" you want to use; probably lexicographic, but it doesn't need to be).

The closest related algorithm I found is Algorithm M from Knuth's The Art of Computer Programming, Volume 4 Part 1, section 7.2.1.5 (p. 430). However, that generates all possible multiset partitions. There is also an exercise in the book (7.2.1.5.69, solution on p. 778) about how to modify Alg. M in order to generate only solutions with at most r partitions. However, that still allows partitions of different sizes (e.g. [[1, 2, 2], [1]] would be a valid output for r = 2).

Any ideas/tricks/existing algorithms on how to go about this? Note that the solution should be efficient, i.e. keeping track of all previously generated solutions, figuring out if the currently generated one is a permutation and if so skipping it, is infeasible because of the rate by which the solution space explodes for longer inputs with more duplicates.

Solution

Here's a working solution that makes use of the next_combination function presented by Hervé Brönnimann in N2639. The comments should make it pretty self-explanatory. The "herve/combinatorics.hpp" file contains the code listed in N2639 inside the herve namespace. It's written in C++11/14; converting it to an older standard should be pretty trivial.

Note that I only quickly tested the solution. Also, I extracted it from a class-based implementation just a couple of minutes ago, so some extra bugs might have crept in. A quick initial test seems to confirm it works, but there might be corner cases for which it won't.

#include <algorithm>
#include <cstdint>
#include <iterator>

#include "herve/combinatorics.hpp"

template <typename BidirIter>
bool next_combination_partition (BidirIter const & startIt,
  BidirIter const & endIt, uint32_t const groupSize) {
  // Typedefs
  using tDiff = typename std::iterator_traits<BidirIter>::difference_type;

  // Skip the last partition, because it consists of the remaining elements.
  // Thus if there's 2 groups or less, the start should be at position 0.
  tDiff const totalLength = std::distance(startIt, endIt);
  uint32_t const numTotalGroups = std::max(static_cast<uint32_t>((totalLength - 1) / groupSize + 1), 2u);
  uint32_t curBegin = (numTotalGroups - 2) * groupSize;
  uint32_t const lastGroupBegin = curBegin - 1;
  uint32_t curMid = curBegin + groupSize;
  bool atStart = (totalLength != 0);

  // Iterate over combinations from back of list to front. If a combination ends
  // up at its starting value, update the previous one as well.
  for (; (curMid != 0) && (atStart);
    curMid = curBegin, curBegin -= groupSize) {
    // To prevent duplicates, first element of each combination partition needs
    // to be fixed. So move start iterator to the next element. This is not true
    // for the starting (2nd to last) group though.
    uint32_t const startIndex = std::min(curBegin + 1, lastGroupBegin + 1);
    auto const iterStart = std::next(startIt, startIndex);
    auto const iterMid = std::next(startIt, curMid);
    atStart = !herve::next_combination(iterStart, iterMid, endIt);
  }

  return !atStart;
}

Edit: Below is my quickly thrown-together test code ("combopart.hpp" obviously being the file containing the above function).

#include "combopart.hpp"

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <vector>

int main (int argc, char* argv[]) {
  uint32_t const groupSize = 2;

  std::vector<uint32_t> v;
  v = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  v = {0, 0, 0, 1, 1, 1, 2, 2, 2, 3};
  v = {1, 1, 2, 2};

  // Make sure contents are sorted
  std::sort(v.begin(), v.end());

  uint64_t count = 0;
  do {
    ++count;

    std::cout << "[ ";
    uint32_t elemCount = 0;
    for (auto it = v.begin(); it != v.end(); ++it) {
      std::cout << *it << " ";
      elemCount++;
      if ((elemCount % groupSize == 0) && (it != std::prev(v.end()))) {
        std::cout << "| ";
      }
    }
    std::cout << "]" << std::endl;
  } while (next_combination_partition(v.begin(), v.end(), groupSize));

  std::cout << std::endl << "# elements: " << v.size() << " - group size: " <<
    groupSize << " - # combination partitions: " << count << std::endl;

  return 0;
}
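
Since the function above has only been lightly tested, an independent enumerator can help cross-check the count the test program prints. The following is a sketch of a different technique (plain backtracking over value counts), not the next_combination-based approach from this answer; the names enumeratePartitions, detail::State and detail::extend are hypothetical. It builds each group in ascending order and only accepts a group that is lexicographically not smaller than the previous one, so every distinct partition appears exactly once. It assumes the input length is a multiple of the group size and returns nothing otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Group = std::vector<std::uint32_t>;
using Partition = std::vector<Group>;

namespace detail {
struct State {
  std::vector<std::uint32_t> values;  // distinct input values, ascending
  std::vector<std::size_t> counts;    // remaining multiplicity of each value
  std::size_t groupSize = 0;
  std::size_t remaining = 0;          // elements not yet placed in a group
  Partition current;
  std::vector<Partition> results;
};

// Extends `group` with values at index >= firstIdx, which keeps it sorted.
inline void extend(State& s, Group& group, std::size_t firstIdx) {
  if (group.size() == s.groupSize) {
    // Groups must be non-decreasing, otherwise this is a reordered duplicate.
    if (!s.current.empty() && group < s.current.back()) return;
    s.current.push_back(group);
    if (s.remaining == 0) {
      s.results.push_back(s.current);
    } else {
      Group next;
      extend(s, next, 0);
    }
    s.current.pop_back();
    return;
  }
  for (std::size_t i = firstIdx; i < s.values.size(); ++i) {
    if (s.counts[i] == 0) continue;
    --s.counts[i];
    --s.remaining;
    group.push_back(s.values[i]);
    extend(s, group, i);
    group.pop_back();
    ++s.counts[i];
    ++s.remaining;
  }
}
}  // namespace detail

// Returns all distinct partitions of `input` into groups of `groupSize`.
inline std::vector<Partition> enumeratePartitions(
    const std::vector<std::uint32_t>& input, std::size_t groupSize) {
  if (groupSize == 0 || input.empty() || input.size() % groupSize != 0) {
    return {};
  }
  detail::State s;
  s.groupSize = groupSize;
  s.remaining = input.size();
  std::map<std::uint32_t, std::size_t> histogram;
  for (std::uint32_t x : input) ++histogram[x];
  for (const auto& kv : histogram) {
    s.values.push_back(kv.first);
    s.counts.push_back(kv.second);
  }
  Group first;
  detail::extend(s, first, 0);
  return s.results;
}
```

For {1, 1, 2, 2} with group size 2 this yields [[1, 1], [2, 2]] and [[1, 2], [1, 2]], and for ten distinct elements it yields 945 partitions (9 * 7 * 5 * 3 * 1), which are the numbers the test program should report for the corresponding inputs. Unlike this sketch, which stores every result, the function above works in place and needs no extra memory.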

Edit 2: Improved algorithm. Replaced the early-exit branch with a combination of a conditional move (using std::max) and setting the atStart boolean to false. Untested though, be warned.

Edit 3: Needed an extra modification so as not to "fix" the first element in the 2nd to last partition. The additional code should compile as a conditional move, so there should be no branching cost associated with it.

P.S.: I am aware that the code to generate combinations by @Howard Hinnant (available at https://howardhinnant.github.io/combinations.html) is much faster than the one by Hervé Brönnimann. However, that code cannot handle duplicates in the input (because as far as I can see, it never even dereferences an iterator), which my problem explicitly requires. On the other hand, if you know for sure your input won't contain duplicates, it is definitely the code you want to use with my function above.
