将字符串列表与公共元素组合 [英] combining list of strings with a common element

查看:43
本文介绍了将字符串列表与公共元素组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个字符串哈希集的向量:

Let's say I have a vector of a hashset of strings:

Vector<HashSet<String>> strSetVector = new Vector<HashSet<String>>();

我有 4 个包含以下字符串的哈希集:

I have 4 hashsets containing the following strings:

"A", "B"
"C", "D"
"B", "C"
"E", "F"

我想组合至少具有一个共同值的集合,以便我最终得到:

I want to combine the sets that have at least one common value so that I end up with:

"A", "B","C", "D"
"E", "F"

显而易见的解决方案是通过向量和每个哈希集迭代多次以找到共同值,但这需要一段时间来处理大小为 1000+ 的向量和大小为 100 的哈希集.我也必须去如果我合并一个哈希集以查看现在是否还有其他可以合并的哈希集,则再次通过该过程.例如,第一次向量迭代会将 B,C 组合到 A,B,这样我就会得到:

The obvious solution is to iterate multiple times thru the vector and each hashset to find common values but this will take a while to process with a vector size of 1000+ and HashSets of size up to 100. I would also have to go thru the process again if i merge a hashset to see if there are now other hashsets that can be merged. For example, first vector iteration would combine B,C to A,B so that I would end up with:

"A", "B", "C"
"C", "D"
"E", "F"

向量/哈希集的下一次迭代:

Next iteration of the vector/hashset:

"A", "B", "C", "D"
"E", "F"

向量/散列集的下一次迭代将找不到任何公共字符串,因此没有任何可合并的内容,我会完成.

Next iteration of the vector/hashset would not find any common strings so there would be nothing to merge and I would be done.

对于看似简单的问题,我想要一个更优雅的解决方案.有什么想法吗?

I would like a more elegant solution to what seems like a simple problem. Any ideas?

推荐答案

我不确定我是否理解正确.而且我认为最佳"解决方案也可能取决于集合和列表的大小.(也就是说,列表是否包含 10 个集合,每个集合包含 100000 个元素,或者它是否包含 100000 个集合,每个集合包含 10 个元素).

I'm not sure if I understood everything correctly. And I think that the "best" solution might also depend on the sizes of the sets and the list. (That is, whether the list contains 10 sets where each set contains 100000 elements, or whether it contains 100000 sets where each set contains 10 elements).

但对于目前提到的数字(1000 个集合,100 个元素),我认为可以使用一个相对简单的解决方案:

But for the numbers mentioned so far (1000 sets with 100 elements), I think that one could use a comparatively simple solution:

  • 浏览集合列表
  • 对于每个集合,遍历它的元素
  • 对于每个元素,存储从元素到它所包含的集合的映射
  • 当遇到一个已经有关联集合的元素时,将当前集合和这个现有集合合并,并存储一个从这个合并集合的所有元素到合并集合的映射

此代码片段基于给定的示例,并打印了一些调试信息,这可能会使过程更清晰.它还存储了一个 compactMap,它将(可能合并的)集合的第一个元素映射到集合本身,以具有每个集合仅出现一次的集合的表示.

This code snippet is based on the given example, and prints some debugging information, which might make the process clearer. It additionally stores a compactMap that maps the first element of a (potentially merged) set to the set itself, to have a representation of the sets where each set occurs only once.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeListsTest
{
    public static void main(String[] args)
    {
        List<Set<String>> sets = new ArrayList<Set<String>>();
        sets.add(new LinkedHashSet<String>(Arrays.asList("A", "B")));
        sets.add(new LinkedHashSet<String>(Arrays.asList("C", "D")));
        sets.add(new LinkedHashSet<String>(Arrays.asList("B", "C")));
        sets.add(new LinkedHashSet<String>(Arrays.asList("E", "F")));
        //sets.add(new LinkedHashSet<String>(Arrays.asList("D")));
        //sets.add(new LinkedHashSet<String>(Arrays.asList("D", "X")));
        //sets.add(new LinkedHashSet<String>());

        Collection<Set<String>> merged = computeMerged(sets);

        System.out.println("Resulting sets:");
        for (Set<String> s : merged)
        {
            System.out.println(s);
        }
    }

    private static <T> Collection<Set<T>> computeMerged(List<Set<T>> sets)
    {
        Map<T, Set<T>> compactMap = new LinkedHashMap<T, Set<T>>();
        Map<T, Set<T>> map = new LinkedHashMap<T, Set<T>>();
        for (Set<T> set : sets)
        {
            System.out.println("Handle set "+set);

            Set<T> combinedSet = new LinkedHashSet<T>(set);
            for (T t : set)
            {
                Set<T> innerSet = map.get(t);
                if (innerSet != null && !innerSet.isEmpty())
                {
                    System.out.println("Element "+t+" was previously mapped to "+innerSet);

                    T first = innerSet.iterator().next();
                    compactMap.remove(first);
                    combinedSet.addAll(innerSet);

                    System.out.println("Combined set is now "+combinedSet);
                }
            }
            if (!combinedSet.isEmpty())
            {
                System.out.println("Store a mapping from each element in "+combinedSet+" to this set");
                T first = combinedSet.iterator().next();
                compactMap.put(first, combinedSet);
                for (T t : combinedSet)
                {
                    map.put(t, combinedSet);
                }
            }
        }
        return compactMap.values();
    }

}

这篇关于将字符串列表与公共元素组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆