Remove a dictionary key if it is a substring in any other key


Question

I'm learning Python and I've run into a performance issue. For a single dictionary, I want to remove a key if

  • the key is a substring of another key

I don't want to remove a key if

  • the only key it is a substring of is itself

My keys are unique strings, mostly between 3 and 50 characters long. The dictionary I'm working with has 100,000 or more items, which means billions of comparisons. Since this is an O(n^2) problem, should I stop trying to optimize this code? Or is there room to make headway here?

A dictionary is preferable, but I'm open to other types.

For example: 'hello' contains 'he' and 'ell'. I want to remove the keys 'he' and 'ell' while keeping 'hello'. I'd like to remove prefixes, suffixes, and key-substrings in the middle of other keys.

The keys are generated one by one and added to a dictionary. Then reduce_dict(dictionary) runs. My assumption is that testing keys as they are added to the dictionary would be just as slow as running a function afterwards, as in the code below.

def reduce_dict(dictionary):
    reduced = dictionary.copy()
    for key in dictionary:
        for key2 in dictionary:
            if key != key2:
                if key2 in key:
                    reduced.pop(key2, 0)
    return reduced
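
As a small sanity check of the intended behavior, the function above can be run on the 'hello' example. This is only an illustrative sketch: the extra key 'world' and the dictionary values are placeholders that do not appear in the question.

```python
def reduce_dict(dictionary):
    """Quadratic version from the question: drop keys that are substrings of other keys."""
    reduced = dictionary.copy()
    for key in dictionary:
        for key2 in dictionary:
            # A key is never removed just because it matches itself.
            if key != key2 and key2 in key:
                reduced.pop(key2, None)
    return reduced


# Placeholder values; only the keys matter for this check.
sample = {'hello': 1, 'he': 2, 'ell': 3, 'world': 4}
result = reduce_dict(sample)
print(sorted(result))  # ['hello', 'world']
```

The input dictionary is left unchanged because the function works on a copy.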

Solution

I think you can create a list of "good" keys (i.e. those that are not substrings of others) in a slightly optimized way:

# keys = yourDict.keys(), e.g.
keys = ['low', 'el', 'helloworld', 'something', 'ellow', 'thing', 'blah', 'thingy']

# flt is [[key, is_substring],...] sorted by key length reversed
flt = [[x, 0] for x in sorted(keys, key=len, reverse=True)]

for i in range(len(flt)):
    p = flt[i]
    if p[1]:  # already removed
        continue
    for j in range(i + 1, len(flt)): # iterate over shorter strings
        q = flt[j]
        if not q[1] and q[0] in p[0]: # if not already removed and is substring
            q[1] = 1  # remove

goodkeys = set(x[0] for x in flt if not x[1])
print(goodkeys)  # e.g. {'helloworld', 'something', 'thingy', 'blah'} (set order may vary)

Now the removal is trivial:

newdict = {k: olddict[k] for k in goodkeys}
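
For convenience, the steps above could be wrapped into a single function that takes a dictionary and returns the reduced one. This is a minimal sketch under the same assumptions (keys are unique strings). The name reduce_dict_sorted and the sample values are made up for illustration, and a set of removed keys stands in for the [key, is_substring] pairs.

```python
def reduce_dict_sorted(dictionary):
    """Return a new dict without keys that are substrings of other keys.

    Hypothetical helper name; it follows the length-sorted approach above.
    """
    # Longest keys first: a key can only be a substring of a longer key,
    # since keys are unique.
    keys = sorted(dictionary, key=len, reverse=True)
    removed = set()
    for i, longer in enumerate(keys):
        if longer in removed:
            # Its substrings are also substrings of the key that removed it.
            continue
        for j in range(i + 1, len(keys)):  # only keys of equal or shorter length
            shorter = keys[j]
            if shorter not in removed and shorter in longer:
                removed.add(shorter)
    return {k: v for k, v in dictionary.items() if k not in removed}


# Placeholder values (key lengths); only the keys come from the example above.
olddict = {k: len(k) for k in
           ['low', 'el', 'helloworld', 'something', 'ellow', 'thing', 'blah', 'thingy']}
newdict = reduce_dict_sorted(olddict)
print(sorted(newdict))  # ['blah', 'helloworld', 'something', 'thingy']
```

The worst case is still quadratic in the number of keys. The savings come from comparing each key only against keys that are no longer than it and from skipping keys that have already been removed.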
