解决依赖群体 [英] Working out dependent groups

查看:90
本文介绍了解决依赖群体的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用以下函数我可以生成一些测试数据.

导入随机,字符串a = 列表(string.ascii_lowercase)def gen_test_data():s = []对于 xrange(15) 中的 i:p = random.randint(1,3)xs = [random.choice(a) for i in xrange(p)]s.append(xs)返回

这是我的测试数据.

<预><代码>[['作为'],['f', 'c'],['w', 'z'],['z', 'p'],['z', 'u', 'g'],['v', 'q', 'w'],['y', 'w'],['d', 'x', 'i'],['l', 'f', 's'],['z', 'g'],['h', 'x', 'k'],['b'],['t'],['s', 'd', 'c'],['s', 'w', 'd']]

如果一个字母与另一个字母共享列表,则它依赖于该字母,以及任何该字母的其他依赖关系.例如

x 依赖于 ['a','c','d','g','f','i','h','k','l','q','p','s','u','w','v','y','x','z']

这些依赖也是两种方式.k 依赖于一切,包括 x

但是 x 不依赖于 bt.这些可以放在不同的组中.

我需要将列表分成尽可能多的非依赖组.

每个组将是该组所依赖的所有字母的集合.非依赖字母将是一组.

上面的示例输出是

<预><代码>[['t'],['b'],['a','c','d','g','f','i','h','k','l','q','p','s','u','w','v','y','x','z']]

我正在尝试编写一个函数来执行此操作,但无法找出正确组合所有内容的正确方法.

解决方案

这是一个经典的connected组件问题.有许多有效的线性时间或接近线性时间的算法可以解决它,例如使用深度优先搜索等图搜索算法或使用联合查找数据结构.

<小时>

对于基于搜索的算法,您将根据您的输入设置一个图,将输入子列表中的节点连接起来,然后运行该图的搜索以找出哪些节点可以相互访问.像 NetworkX 这样的图形库可以为您处理大部分实现,或者您可以自己编写.例如,

导入集合def build_graph(数据):图 = collections.defaultdict(list)对于数据中的子列表:# 将每个子列表元素连接到第一个元素就足够了.# 无需将每个子列表元素连接到其他子列表元素.对于子列表 [1:] 中的项目:图[sublist[0]].append(item)图[item].append(sublist[0])如果 len(sublist) == 1:# 确保我们添加节点,即使它们没有边.graph.setdefault(sublist[0], [])返回图def dfs_connected_component(graph, node):边界 = [节点]访问 = 设置()而边境:frontier_node = frontier.pop()如果访问了frontier_node:继续访问.添加(frontier_node)前沿.扩展(图[前沿_节点])回访定义依赖组(数据):图 = build_graph(数据)组件 = []nodes_with_component_known = set()对于图中的节点:如果nodes_with_component_known 中的节点:继续组件 = dfs_connected_component(图形,节点)components.append(组件)node_with_component_known.update(组件)返回组件

这将使运行时间与输入的大小成线性关系.

<小时>

您还可以使用 union-find 数据结构.union-find 数据结构将项目与集合相关联,每个集合由一个代表元素表示.它们支持两种操作:find,找到一个元素的代表,以及union,它接受两个元素并将它们的集合连接成一个集合.

您可以设置一个 union-find 数据结构,并且对于您输入中的每个子列表,将子列表的每个元素与子列表的第一个元素联合.这将有效地将所有依赖元素分组,而无需将任何独立元素连接在一起.

通过将联合查找数据结构标准实现为具有按秩和路径压缩联合的不相交集森林,运行时间基本上与输入的大小呈线性关系.它会因输入的 逆阿克曼函数而变慢,即对于所有实际输入大小,基本不变.

Using the following function I can generate some test data.

import random, string
a = list(string.ascii_lowercase)
def gen_test_data():
    s = []
    for i in xrange(15):
        p = random.randint(1,3)
        xs = [random.choice(a) for i in xrange(p)]
        s.append(xs)
    return s

This is my test data.

[
    ['a', 's'],
    ['f', 'c'],
    ['w', 'z'],
    ['z', 'p'],
    ['z', 'u', 'g'],
    ['v', 'q', 'w'],
    ['y', 'w'],
    ['d', 'x', 'i'],
    ['l', 'f', 's'],
    ['z', 'g'],
    ['h', 'x', 'k'],
    ['b'],
    ['t'],
    ['s', 'd', 'c'],
    ['s', 'w', 'd']
]

If a letter shares the list with another letter it is dependent on that letter, and any of that letters other dependencies. For example

x is dependant on ['a','c','d','g','f','i','h','k','l','q','p','s','u','w','v','y', 'x','z']

These dependencies are also two way. k is dependent on everything including x

but x is not dependant on b or t. These can be placed in separate groups.

I need to divide the list into as MANY non dependant groups as possible.

Each group will be a set of all letters that the group depends on. Non dependent letters will be a set of one.

An example output to the one above is

[
    ['t'],
    ['b'],
    ['a','c','d','g','f','i','h','k','l','q','p','s','u','w','v','y', 'x','z']
]

I am trying to write a function to do this but can't figure out the right way to group everything correctly.

解决方案

This is a classic connected components problem. There are a number of efficient linear-time or nearly-linear-time algorithms to solve it, such as with a graph search algorithm like depth-first search or with a union-find data structure.


For a search-based algorithm, you would set up a graph based on your input, with nodes in the input sublists connected, then run a search of the graph to find which nodes are reachable from each other. Graph libraries like NetworkX can handle most of the implementation for you, or you could write it yourself. For example,

import collections

def build_graph(data):
    graph = collections.defaultdict(list)
    for sublist in data:
        # It's enough to connect each sublist element to the first element.
        # No need to connect each sublist element to each other sublist element.
        for item in sublist[1:]:
            graph[sublist[0]].append(item)
            graph[item].append(sublist[0])
        if len(sublist) == 1:
            # Make sure we add nodes even if they don't have edges.
            graph.setdefault(sublist[0], [])
    return graph

def dfs_connected_component(graph, node):
    frontier = [node]
    visited = set()
    while frontier:
        frontier_node = frontier.pop()
        if frontier_node in visited:
            continue
        visited.add(frontier_node)
        frontier.extend(graph[frontier_node])
    return visited

def dependent_groups(data):
    graph = build_graph(data)
    components = []
    nodes_with_component_known = set()
    for node in graph:
        if node in nodes_with_component_known:
            continue
        component = dfs_connected_component(graph, node)
        components.append(component)
        nodes_with_component_known.update(component)
    return components

This would have runtime linear in the size of the input.


You could also use a union-find data structure. A union-find data structure associates items with sets, each set represented by a representative element. They support two operations: find, which finds the representative for an element, and union, which takes two elements and joins their sets into one set.

You can set up a union-find data structure, and for each sublist in your input, union each element of the sublist with the first element of the sublist. This will efficiently group all dependent elements without joining any independent elements together.

With the standard implementation of a union-find data structure as a disjoint-set forest with union by rank and path compression, the runtime would be essentially linear in the size of the input. It'd be slower by a factor of the inverse Ackermann function of the input, which is essentially constant for all practical input sizes.

这篇关于解决依赖群体的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆