Finding all the overlapping groups of dictionary keys


Problem description


Say I have a dictionary of lists in Python. I would like to find all the groups of keys that have items in common, and for each such group, the corresponding items.

For example, assuming that the items are simple integers:

dct      = dict()
dct['a'] = [0, 5, 7]
dct['b'] = [1, 2, 5]
dct['c'] = [3, 2]
dct['d'] = [3]
dct['e'] = [0, 5]

The groups would be:

groups    = dict()
groups[0] = ['a', 'e']
groups[1] = ['b', 'c']
groups[2] = ['c', 'd']
groups[3] = ['a', 'b', 'e']

And the elements in common for those groups would be:

common    = dict()
common[0] = [0, 5]
common[1] = [2]
common[2] = [3]
common[3] = [5]

To solve this problem, I believe that there is value in building a matrix like the one below, but I am not sure how to proceed from this point. Are there any Python libraries that facilitate solving this type of problem?

   | a  b  c  d  e |
|a|  x  x        x |
|b|  x  x  x     x |
|c|     x  x  x    |
|d|        x  x    |
|e|  x  x        x |
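
A matrix like this can be sketched in plain Python, without any extra library, by intersecting the lists pairwise. This is only an illustration under the assumption that the items are hashable; the name `overlap_matrix` is hypothetical and does not come from the original post.

```python
def overlap_matrix(dct):
    """Map each pair of keys to True when their lists share an item.

    Hypothetical helper, for illustration only.
    """
    keys = sorted(dct)
    sets = {key: set(values) for key, values in dct.items()}
    return {
        row: {col: bool(sets[row] & sets[col]) for col in keys}
        for row in keys
    }


dct = {'a': [0, 5, 7], 'b': [1, 2, 5], 'c': [3, 2], 'd': [3], 'e': [0, 5]}
matrix = overlap_matrix(dct)

# Print the matrix with an 'x' wherever two keys overlap.
print('    ' + '  '.join(sorted(dct)))
for row, cols in matrix.items():
    print('|%s| ' % row + '  '.join('x' if hit else ' ' for hit in cols.values()))
```

The matrix only says which pairs of keys overlap; it does not say which items they share, so it is a starting point rather than the full answer.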

Update

I tried to wrap up the solution that @NickBurns provided within a function, but I am having problems reproducing the solution:

dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}

groups, common_items = get_groups(dct)
print 'Groups', groups
print 'Common items',  common_items

I get:

Groups: defaultdict(<type 'list'>, {0: ['a', 'e'], 2: ['c', 'b'], 3: ['c', 'd'], 5: ['a', 'b', 'e']})                                                        

Common items: {0: None, 2: None, 3: None, 5: None}

And here are the functions

from collections import defaultdict
def common(query_group, dct):
    """ Recursively find the common elements within groups """
    if len(query_group) <= 1:
        return
    # Extract the elements from groups,
    # Pull their original values from dct
    # Get the intersection of these
    first, second = set(dct[query_group[0]]), set(dct[query_group[1]])  
    # print(first.intersection(second))
    return common(query_group[2:], dct)


def get_groups (dct):
  groups = defaultdict(list)

  for key, values in dct.items():
    for value in values:
      groups[value].append(key)

  # Clean up the groups:      
  for key in groups.keys():
    # i.e. the value is common to more than 1 group
    if len(groups[key]) <= 1:    
      del groups[key]

  # Identify common elements:
  common_items = dict()
  for k,v in groups.iteritems():
    if len(v) > 1:
      common_items[k] = common(v, dct)

  return groups, common_items

Solution

I would try to create a second dictionary (groups) that maps each value to the keys of the original dct whose lists contain it. For example, you could do this using a defaultdict, something like:

from collections import defaultdict
groups = defaultdict(list)
dct = { 'a' : [0, 5, 7], 'b' : [1, 2, 5], 'c' : [3, 2], 'd' : [3], 'e' : [0, 5]}
for key, values in dct.items():
    for value in values:
        groups[value].append(key)

for key in groups.keys():
    if len(groups[key]) > 1:    # i.e. the value is common to more than 1 group
        print(key, groups[key])

(0, ['a', 'e'])
(2, ['c', 'b'])
(3, ['c', 'd'])
(5, ['a', 'b', 'e'])
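
If the filtered mapping is needed as data rather than printed, one option is to build a new dictionary instead of deleting entries in place. On Python 3, deleting from a dict while looping over `groups.keys()` raises a `RuntimeError`, so a comprehension is the safer route. A minimal sketch, where `shared_values` is a hypothetical name and the keys are assumed to be sortable:

```python
from collections import defaultdict


def shared_values(dct):
    """Map each value to the keys whose lists contain it, keeping only
    values that appear under more than one key. Hypothetical helper."""
    groups = defaultdict(list)
    for key, values in dct.items():
        for value in set(values):  # set() so a repeated item adds its key once
            groups[value].append(key)
    # Build a filtered copy rather than deleting while iterating.
    return {value: sorted(keys) for value, keys in groups.items() if len(keys) > 1}


dct = {'a': [0, 5, 7], 'b': [1, 2, 5], 'c': [3, 2], 'd': [3], 'e': [0, 5]}
shared = shared_values(dct)
print(shared)
```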

Finding the common elements is a little messier: you need to run through each group and intersect the original lists from dct. Perhaps a recursive routine like this would work:

def common(query_group, dct, have_common=None):
    """ Recursively find the elements common to every key in query_group """

    if not query_group:
        return sorted(have_common or [])

    # pull the first key's original values from dct, then intersect
    # them with whatever has been common to the keys seen so far
    values = set(dct[query_group[0]])
    have_common = values if have_common is None else have_common & values

    return common(query_group[1:], dct, have_common)

for query_group in groups.values():
    if len(query_group) > 1:
        print(query_group, '=>', common(query_group, dct))

['e', 'a'] => [0, 5]
['b', 'c'] => [2]
['d', 'c'] => [3]
['e', 'b', 'a'] => [5]

Clearly it needs some prettier formatting, but I think it gets the job done. Hopefully that helps.
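
For the wrapped version in the update, the `None` values appear because that `common` computes `first` and `second` but never returns an intersection. One possible way to get both the groups and their common items from a single function is sketched below. It assumes the items are hashable and sortable, and the name `groups_with_common` is hypothetical:

```python
from collections import defaultdict


def groups_with_common(dct):
    """Return {tuple_of_keys: sorted_common_items} for every group of two
    or more keys that share a value. Hypothetical helper."""
    by_value = defaultdict(set)
    for key, values in dct.items():
        for value in values:
            by_value[value].add(key)

    result = {}
    for keys in by_value.values():
        if len(keys) > 1:
            group = tuple(sorted(keys))
            # Intersect the original lists of every key in the group.
            common_items = set.intersection(*(set(dct[k]) for k in group))
            result[group] = sorted(common_items)
    return result


dct = {'a': [0, 5, 7], 'b': [1, 2, 5], 'c': [3, 2], 'd': [3], 'e': [0, 5]}
result = groups_with_common(dct)
for group, items in result.items():
    print(list(group), '=>', items)
```

Using the sorted tuple of keys as the dictionary key means two values that lead to the same group collapse into one entry.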
