Remove a dictionary key if it is a substring in any other key
Problem description
I'm learning Python and I've run into a performance issue. For a single dictionary, I want to remove a key if
- the key is a substring of another key
I don't want to remove a key if
- the only key it is a substring of is itself
My keys are unique strings mostly between 3-50 characters in length. The dictionary I'm working with has 100,000 or more items, making billions of comparisons. Since this is an O(n^2) problem, should I stop trying to optimize this code? Or is there room to make headway here?
A dictionary is preferable, but I'm open to other types.
For example: 'hello' contains 'he' and 'ell'. I want to remove the keys 'he' and 'ell' while keeping 'hello'. I'd like to remove prefixes, suffixes, and key-substrings in the middle of other keys.
The keys are generated one by one and added to a dictionary. Then reduce_dict(dictionary) runs. My assumption is that testing each key as it is added to the dictionary would be just as slow as testing all keys afterwards in a function, as in the code below.
def reduce_dict(dictionary):
    reduced = dictionary.copy()
    for key in dictionary:
        for key2 in dictionary:
            if key != key2:
                if key2 in key:
                    reduced.pop(key2, 0)
    return reduced
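As a quick sanity check, the function can be run on the 'hello' example from the question. This is only a minimal sketch: the `sample` dictionary and its values are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
def reduce_dict(dictionary):
    # Same quadratic approach as in the question, repeated for a self-contained example.
    reduced = dictionary.copy()
    for key in dictionary:
        for key2 in dictionary:
            # key2 is dropped when it occurs inside a different key
            if key != key2 and key2 in key:
                reduced.pop(key2, None)
    return reduced


# Hypothetical data: 'he' and 'ell' both occur inside 'hello'.
sample = {'hello': 1, 'he': 2, 'ell': 3, 'world': 4}
result = reduce_dict(sample)
print(sorted(result))  # ['hello', 'world']
```

Every key is compared against every other key, which is where the billions of comparisons for 100,000 keys come from.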
I think you can create a list of "good" keys (=those that are not substrings of others) in a slightly optimized way:
# keys = yourDict.keys(), e.g.
keys = ['low', 'el', 'helloworld', 'something', 'ellow', 'thing', 'blah', 'thingy']

# flt is [[key, is_substring], ...] sorted by key length, longest first
flt = [[x, 0] for x in sorted(keys, key=len, reverse=True)]

for i in range(len(flt)):
    p = flt[i]
    if p[1]:  # already removed
        continue
    for j in range(i + 1, len(flt)):  # iterate over strings that are not longer
        q = flt[j]
        if not q[1] and q[0] in p[0]:  # not already removed and is a substring
            q[1] = 1  # mark as removed

goodkeys = set(x[0] for x in flt if not x[1])
print(goodkeys)  # e.g. {'helloworld', 'something', 'thingy', 'blah'} (set order may vary)
Now the removal is trivial:
newdict = {k: olddict[k] for k in goodkeys}
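A different technique that may help when keys are short (the question mentions roughly 3-50 characters): instead of comparing pairs of keys, enumerate the proper substrings of each key and look them up in a set of all keys. This is not the method from the answer above, only a sketch under that assumption, and the function name `reduce_dict_by_lookup` is hypothetical. The work then depends on the number of keys and on the key length rather than on the number of key pairs.

```python
def reduce_dict_by_lookup(dictionary):
    """Return a copy of `dictionary` without keys that occur inside another key.

    Hypothetical helper: assumes keys are fairly short strings, so listing
    the substrings of every key is cheaper than comparing all pairs of keys.
    """
    keys = set(dictionary)
    # Only substring lengths that match the length of some key can hit the set.
    lengths = sorted({len(k) for k in keys})
    to_remove = set()
    for key in keys:
        n = len(key)
        for size in lengths:
            if size >= n:
                break  # only strictly shorter substrings can be other keys
            for start in range(n - size + 1):
                candidate = key[start:start + size]
                if candidate in keys:
                    to_remove.add(candidate)
    return {k: v for k, v in dictionary.items() if k not in to_remove}


# Same example keys as above, with made-up values.
olddict = {k: i for i, k in enumerate(
    ['low', 'el', 'helloworld', 'something', 'ellow', 'thing', 'blah', 'thingy'])}
newdict = reduce_dict_by_lookup(olddict)
print(sorted(newdict))  # ['blah', 'helloworld', 'something', 'thingy']
```

Whether this is actually faster depends on the real key lengths, so it is worth timing both versions on a sample of the data before committing to either.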