Remove a dictionary key if it is a substring in any other key

Posted: 2017/5/21 23:35:42 Tags: python performance dictionary

Problem description

I'm learning Python and I've got a performance issue. For a single dictionary, I want to remove a key if it is a substring of another key. I don't want to remove a key if the only key it is a substring of is itself.

My keys are unique strings, mostly between 3 and 50 characters in length. The dictionary I'm working with has 100,000 or more items, which means billions of comparisons. Since this is an O(n^2) problem, should I stop trying to optimize this code? Or is there room to make headway here?

A dictionary is preferable, but I'm open to other types. 

For example: 'hello' contains 'he' and 'ell'. I want to remove the keys 'he' and 'ell' while keeping 'hello'. I'd like to remove prefixes, suffixes, and key-substrings in the middle of other keys. 

The keys are generated one-by-one and added to a dictionary. Then reduce_dict(dictionary) runs. My assumption is: a test while they're added to a dictionary would be as slow as a function testing after, as in the code below.
def reduce_dict(dictionary):
    reduced = dictionary.copy()
    for key in dictionary:
        for key2 in dictionary:
            if key != key2:
                if key2 in key:
                    reduced.pop(key2, 0)
    return reduced

Solution
I think you can create a list of "good" keys (=those that are not substrings of others) in a slightly optimized way:
# keys = list(yourDict.keys()), e.g.
keys = ['low', 'el', 'helloworld', 'something', 'ellow', 'thing', 'blah', 'thingy']

# flt is [[key, is_substring], ...] sorted by key length, longest first
flt = [[x, 0] for x in sorted(keys, key=len, reverse=True)]

for i in range(len(flt)):
    p = flt[i]
    if p[1]:  # already removed, so its substrings were marked by a longer key
        continue
    for j in range(i + 1, len(flt)):  # iterate over shorter (or equal-length) strings
        q = flt[j]
        if not q[1] and q[0] in p[0]:  # not yet removed and is a substring of p
            q[1] = 1  # mark as removed

goodkeys = set(x[0] for x in flt if not x[1])
print(goodkeys)  # e.g. {'helloworld', 'something', 'thingy', 'blah'}
Now the removal is trivial:
newdict = {k:olddict[k] for k in goodkeys}
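
Since the question says the keys are short (roughly 3 to 50 characters), another option may be worth trying. This is a different technique from the sorted-list approach above, not part of the original answer. Instead of comparing every key against every other key, it enumerates the proper substrings of each key and looks them up in a set. The work then grows with the number of keys times the square of the key length, not with the square of the number of keys. The sketch below is only an illustration: the function name `reduce_dict_by_substrings` is made up, and it assumes the keys are non-empty strings.

```python
def reduce_dict_by_substrings(dictionary):
    """Return a new dict without keys that are proper substrings of other keys.

    Hypothetical helper for illustration; assumes non-empty string keys.
    """
    keys = set(dictionary)
    to_remove = set()
    for key in keys:
        n = len(key)
        # Enumerate every non-empty substring key[start:end] of this key.
        for start in range(n):
            for end in range(start + 1, n + 1):
                if end - start == n:
                    continue  # the whole key: never remove a key because of itself
                sub = key[start:end]
                if sub in keys:  # average O(1) set lookup
                    to_remove.add(sub)
    return {k: v for k, v in dictionary.items() if k not in to_remove}


# Same sample keys as above, with arbitrary values attached.
sample = ['low', 'el', 'helloworld', 'something', 'ellow', 'thing', 'blah', 'thingy']
olddict = {k: i for i, k in enumerate(sample)}
newdict = reduce_dict_by_substrings(olddict)
print(sorted(newdict))  # ['blah', 'helloworld', 'something', 'thingy']
```

With 50-character keys this is at most about 1,275 substring lookups per key, so it is worth timing both approaches on real data before choosing one.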


                        