Reverse Cartesian Product
Problem description
Given the following data set:
a | b | c | d
1 | 3 | 7 | 11
1 | 5 | 7 | 11
1 | 3 | 8 | 11
1 | 5 | 8 | 11
1 | 6 | 8 | 11
Performing a reverse Cartesian product yields:
a | b | c | d
1 | 3,5 | 7,8 | 11
1 | 6 | 8 | 11
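To make the goal concrete, here is a minimal sketch (not part of the original question) that expands the two compressed rows back into plain rows. The class name ExpandRows and the method expand are made up for illustration. If the compression is valid, the expansion should reproduce exactly the five input rows.

```java
import java.util.ArrayList;
import java.util.List;

class ExpandRows {
    // Expands one compressed row (one list of values per column)
    // into every plain row it stands for, i.e. its Cartesian product.
    static List<List<Integer>> expand(List<List<Integer>> compressedRow) {
        List<List<Integer>> rows = new ArrayList<>();
        rows.add(new ArrayList<>()); // start with a single empty prefix
        for (List<Integer> columnValues : compressedRow) {
            List<List<Integer>> next = new ArrayList<>();
            for (List<Integer> prefix : rows) {
                for (Integer value : columnValues) {
                    List<Integer> row = new ArrayList<>(prefix);
                    row.add(value);
                    next.add(row);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Integer>> first = List.of(List.of(1), List.of(3, 5), List.of(7, 8), List.of(11));
        List<List<Integer>> second = List.of(List.of(1), List.of(6), List.of(8), List.of(11));
        expand(first).forEach(System.out::println);  // four rows
        expand(second).forEach(System.out::println); // one row: [1, 6, 8, 11]
    }
}
```

The first compressed row covers four of the input rows and the second covers the remaining one, so nothing is lost or invented by the compression.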
I am currently working with Scala, and my input/output data type is currently:
ListBuffer[Array[Array[Int]]]
I have come up with a solution (seen below), but feel it could be optimized. I am open to optimizations of my approach, and to completely new approaches. Solutions in Scala and C# are preferred.
I am also curious whether this could be done in MS SQL.
My current solution:
import scala.collection.mutable.ListBuffer
import scala.util.Sorting.quickSort

object Main {
  def main(args: Array[String]): Unit = {
    // Input
    val data = ListBuffer(Array(Array(1), Array(3), Array(7), Array(11)),
      Array(Array(1), Array(5), Array(7), Array(11)),
      Array(Array(1), Array(3), Array(8), Array(11)),
      Array(Array(1), Array(5), Array(8), Array(11)),
      Array(Array(1), Array(6), Array(8), Array(11)))
    reverseCartesianProduct(data)
  }

  // Runs one merge pass per column, from the rightmost column to the leftmost.
  def reverseCartesianProduct(input: ListBuffer[Array[Array[Int]]]): ListBuffer[Array[Array[Int]]] = {
    val startIndex = input(0).size - 1
    var results: ListBuffer[Array[Array[Int]]] = input
    for (i <- startIndex to 0 by -1) {
      results = groupForward(results, i, startIndex)
    }
    results
  }

  // Recursively groups on every column except groupingIndex, then merges each group into one row.
  def groupForward(input: ListBuffer[Array[Array[Int]]], groupingIndex: Int, startIndex: Int): ListBuffer[Array[Array[Int]]] = {
    if (startIndex < 0) {
      // All columns have been handled: collapse the group into a single row.
      val reduced = input.reduce((a, b) => {
        mergeRows(a, b)
      })
      return ListBuffer(reduced)
    }
    val grouped = if (startIndex == groupingIndex) {
      // Skip the grouping column: keep everything in one group.
      Map(0 -> input)
    }
    else {
      groupOnIndex(input, startIndex)
    }
    val results = grouped.flatMap {
      case (index, values: ListBuffer[Array[Array[Int]]]) =>
        groupForward(values, groupingIndex, startIndex - 1)
    }
    results.to[ListBuffer] // Scala 2.12 syntax; on 2.13+ this is results.to(ListBuffer)
  }

  // Groups rows by the values in the given column.
  // Note: the key is a hash code, so two different value lists that collide would end up
  // in the same group; using a(index).toList itself as the key would avoid that.
  def groupOnIndex(list: ListBuffer[Array[Array[Int]]], index: Int): Map[Int, ListBuffer[Array[Array[Int]]]] = {
    var results = Map[Int, ListBuffer[Array[Array[Int]]]]()
    list.foreach(a => {
      val key = a(index).toList.hashCode()
      if (!results.contains(key)) {
        results += (key -> ListBuffer[Array[Array[Int]]]())
      }
      results(key) += a
    })
    results
  }

  // Merges two rows column by column, keeping the sorted distinct values of each column.
  def mergeRows(a: Array[Array[Int]], b: Array[Array[Int]]): Array[Array[Int]] = {
    val zipped = a.zip(b)
    val merged = zipped.map { case (array1: Array[Int], array2: Array[Int]) =>
      val m = array1 ++ array2
      quickSort(m) // sorts m in place
      m.distinct
    }
    merged
  }
}
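The way this works is:
- Loop over the columns from right to left (groupingIndex specifies which column to run on; this is the only column whose values do not have to be equal for rows to be merged).
- Recursively group the data on all other columns (every column except groupingIndex).
- After grouping on all columns, it is assumed that the rows in each group have equal values in every column except the grouping column.
- Merge the rows with matching columns: take the distinct values of each column and sort them.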
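The steps above can also be sketched without recursion. The following is a minimal Java sketch, not the asker's code: it assumes each row is a list of sorted value sets, and the names ColumnPasses, row, and mergeOn are made up for illustration. Each pass groups rows on every column except one and unions the values of that one column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class ColumnPasses {
    // Builds a row in which every column holds a single value.
    static List<TreeSet<Integer>> row(int... values) {
        List<TreeSet<Integer>> row = new ArrayList<>();
        for (int v : values) {
            TreeSet<Integer> cell = new TreeSet<>();
            cell.add(v);
            row.add(cell);
        }
        return row;
    }

    // One pass: rows that agree on every column except `column` are merged,
    // and their values in `column` are unioned. Input rows are not modified.
    static List<List<TreeSet<Integer>>> mergeOn(List<List<TreeSet<Integer>>> rows, int column) {
        Map<List<TreeSet<Integer>>, List<TreeSet<Integer>>> groups = new LinkedHashMap<>();
        for (List<TreeSet<Integer>> row : rows) {
            List<TreeSet<Integer>> key = new ArrayList<>(row);
            key.remove(column); // int argument, so this removes by index
            List<TreeSet<Integer>> merged = groups.get(key);
            if (merged == null) {
                merged = new ArrayList<>(row);
                merged.set(column, new TreeSet<>(row.get(column))); // fresh set, safe to grow
                groups.put(key, merged);
            } else {
                merged.get(column).addAll(row.get(column));
            }
        }
        return new ArrayList<>(groups.values());
    }

    // Runs one pass per column, from the rightmost column to the leftmost.
    static List<List<TreeSet<Integer>>> reverseCartesianProduct(List<List<TreeSet<Integer>>> rows) {
        if (rows.isEmpty()) {
            return rows;
        }
        List<List<TreeSet<Integer>>> result = rows;
        for (int column = rows.get(0).size() - 1; column >= 0; column--) {
            result = mergeOn(result, column);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<TreeSet<Integer>>> data = List.of(
                row(1, 3, 7, 11), row(1, 5, 7, 11), row(1, 3, 8, 11),
                row(1, 5, 8, 11), row(1, 6, 8, 11));
        // prints [[1], [3, 5], [7, 8], [11]] then [[1], [6], [8], [11]]
        reverseCartesianProduct(data).forEach(System.out::println);
    }
}
```

Using the value sets themselves as the grouping key (rather than a hash code) means two different sets can never be grouped together by accident.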
I apologize if some of this does not make sense; my brain is not functioning today.
Recommended answer
Here is my take on this. The code is in Java but could easily be converted to Scala or C#.
I run groupingBy on all combinations of n-1 columns and go with the one that has the lowest count, meaning the largest merge depth, so this is a kind of greedy approach. However, it is not guaranteed to find the optimal solution, meaning minimizing the number k, which is NP-hard (see the link here for an explanation), but you will find a valid solution, and rather quickly.
Full example: https://github.com/jbilander/ReverseCartesianProduct/tree/master/src
Main.java
import java.util.*;
import java.util.stream.Collectors;

public class Main {

    public static void main(String[] args) {

        List<List<Integer>> data = List.of(List.of(1, 3, 7, 11), List.of(1, 5, 7, 11), List.of(1, 3, 8, 11), List.of(1, 5, 8, 11), List.of(1, 6, 8, 11));
        boolean done = false;
        int rowLength = data.get(0).size(); //4
        List<Table> tables = new ArrayList<>();

        // load data into table
        for (List<Integer> integerList : data) {
            Table table = new Table(rowLength);
            tables.add(table);
            for (int i = 0; i < integerList.size(); i++) {
                table.getMap().get(i + 1).add(integerList.get(i));
            }
        }

        // keep track of count, needed so we know when to stop iterating
        int numberOfRecords = tables.size();

        // start algorithm
        while (!done) {
            Collection<List<Table>> result = getMinimumGroupByResult(tables, rowLength);
            if (result.size() < numberOfRecords) {
                tables.clear();
                for (List<Table> tableList : result) {
                    Table t = new Table(rowLength);
                    tables.add(t);
                    for (Table table : tableList) {
                        for (int i = 1; i <= rowLength; i++) {
                            t.getMap().get(i).addAll(table.getMap().get(i));
                        }
                    }
                }
                numberOfRecords = tables.size();
            } else {
                done = true;
            }
        }

        tables.forEach(System.out::println);
    }

    private static Collection<List<Table>> getMinimumGroupByResult(List<Table> tables, int rowLength) {

        Collection<List<Table>> result = null;
        int min = Integer.MAX_VALUE;

        for (List<Integer> keyCombination : getKeyCombinations(rowLength)) {
            switch (rowLength) {
                case 4: {
                    Map<Tuple3<TreeSet<Integer>, TreeSet<Integer>, TreeSet<Integer>>, List<Table>> map =
                            tables.stream().collect(Collectors.groupingBy(t -> new Tuple3<>(
                                    t.getMap().get(keyCombination.get(0)),
                                    t.getMap().get(keyCombination.get(1)),
                                    t.getMap().get(keyCombination.get(2))
                            )));
                    if (map.size() < min) {
                        min = map.size();
                        result = map.values();
                    }
                }
                break;
                case 5: {
                    //TODO: Handle n = 5
                }
                break;
                case 6: {
                    //TODO: Handle n = 6
                }
                break;
            }
        }

        return result;
    }

    private static List<List<Integer>> getKeyCombinations(int rowLength) {
        switch (rowLength) {
            case 4:
                return List.of(List.of(1, 2, 3), List.of(1, 2, 4), List.of(2, 3, 4), List.of(1, 3, 4));
            //TODO: handle n = 5, n = 6, etc...
        }
        return List.of(List.of());
    }
}
Output from tables.forEach(System.out::println):
Table{1=[1], 2=[3, 5, 6], 3=[8], 4=[11]}
Table{1=[1], 2=[3, 5], 3=[7], 4=[11]}
or rewritten for readability:
a | b | c | d
--|-------|---|---
1 | 3,5,6 | 8 | 11
1 | 3,5 | 7 | 11
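Main.java only handles n = 4 and leaves the other row lengths as TODOs. One way to avoid the switch is to build the grouping key as a list with one column left out, so the same code covers any row length. This is a rough sketch under that assumption and not the code from the linked repository: the class GreedyReverseProduct and its methods are made up for illustration, rows are plain lists of TreeSet values instead of the Table class, and a non-empty input with at least one column is assumed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.stream.Collectors;

class GreedyReverseProduct {
    // Builds a row in which every column holds a single value.
    static List<TreeSet<Integer>> row(int... values) {
        List<TreeSet<Integer>> row = new ArrayList<>();
        for (int v : values) {
            TreeSet<Integer> cell = new TreeSet<>();
            cell.add(v);
            row.add(cell);
        }
        return row;
    }

    // Grouping key: the row's value sets with one column left out.
    static List<TreeSet<Integer>> keyWithout(List<TreeSet<Integer>> row, int skipped) {
        List<TreeSet<Integer>> key = new ArrayList<>(row);
        key.remove(skipped); // int argument, so this removes by index
        return key;
    }

    // Tries every choice of left-out column and keeps the grouping with the fewest groups.
    static Collection<List<List<TreeSet<Integer>>>> bestGrouping(List<List<TreeSet<Integer>>> rows, int width) {
        Collection<List<List<TreeSet<Integer>>>> best = null;
        for (int skipped = 0; skipped < width; skipped++) {
            final int column = skipped;
            Map<List<TreeSet<Integer>>, List<List<TreeSet<Integer>>>> groups = rows.stream()
                    .collect(Collectors.groupingBy(r -> keyWithout(r, column),
                            LinkedHashMap::new, Collectors.toList()));
            if (best == null || groups.size() < best.size()) {
                best = groups.values();
            }
        }
        return best;
    }

    // Greedy loop: merge along the best grouping until the row count stops shrinking.
    static List<List<TreeSet<Integer>>> reduce(List<List<TreeSet<Integer>>> input) {
        int width = input.get(0).size();
        List<List<TreeSet<Integer>>> rows = input;
        while (true) {
            Collection<List<List<TreeSet<Integer>>>> groups = bestGrouping(rows, width);
            if (groups.size() >= rows.size()) {
                return rows;
            }
            List<List<TreeSet<Integer>>> merged = new ArrayList<>();
            for (List<List<TreeSet<Integer>>> group : groups) {
                List<TreeSet<Integer>> target = new ArrayList<>();
                for (int c = 0; c < width; c++) {
                    TreeSet<Integer> union = new TreeSet<>();
                    for (List<TreeSet<Integer>> r : group) {
                        union.addAll(r.get(c));
                    }
                    target.add(union);
                }
                merged.add(target);
            }
            rows = merged;
        }
    }

    public static void main(String[] args) {
        List<List<TreeSet<Integer>>> data = List.of(
                row(1, 3, 7, 11), row(1, 5, 7, 11), row(1, 3, 8, 11),
                row(1, 5, 8, 11), row(1, 6, 8, 11));
        // prints [[1], [3, 5], [7], [11]] then [[1], [3, 5, 6], [8], [11]]
        reduce(data).forEach(System.out::println);
    }
}
```

On the sample data this produces the same two rows as the output above, in a different order because a LinkedHashMap keeps insertion order.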
If you were to do all this in SQL (MySQL), you could possibly use group_concat(). I think MS SQL has something similar: see simulating-group-concat, or STRING_AGG if you are on SQL Server 2017. However, I think you would have to work with text columns, which is a bit nasty in this case:
E.g.
create table my_table (A varchar(50) not null, B varchar(50) not null,
C varchar(50) not null, D varchar(50) not null);
insert into my_table values ('1','3,5','4,15','11'), ('1','3,5','3,10','11');
select A, B, group_concat(C order by C) as C, D from my_table group by A, B, D;
This would give the result below, so you would have to parse, sort, and update the comma-separated result for the next merge iteration (group by) to be correct.
['1', '3,5', '3,10,4,15', '11']
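The parse-and-sort step could be done in application code between iterations. A minimal Java sketch of that step, assuming the cells only ever contain comma-separated integers (the class CommaList and the method normalize are made up for illustration):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class CommaList {
    // Parses a comma-separated list of integers, removes duplicates,
    // sorts numerically, and joins the values back into one string.
    static String normalize(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .map(Integer::valueOf)
                .distinct()
                .sorted()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(normalize("3,10,4,15")); // prints 3,4,10,15
    }
}
```

Sorting numerically matters here: as text, '10' sorts before '3', so two cells holding the same set of values could otherwise end up as different strings and fail to group together.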