How to merge similar items in a list
Question
I haven't found anything relevant on Google, so I'm hoping to find some help here :)
I've got a Python list as follows:
[['hoose', 200], ["Bananphone", 10], ['House', 200], ["Bonerphone", 10], ['UniqueValue', 777] ...]
I have a function that returns the Levenshtein distance between 2 strings, for House and hoose it would return 2, etc.
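The question does not show the distance function itself, so here is a minimal sketch of what it might look like, assuming a plain case-sensitive edit distance in which insertions, deletions and substitutions each cost 1. The name `levenshtein` matches the call used in the answer; the implementation is an assumption, not the asker's code.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b (case-sensitive)."""
    # previous[j] is the distance between the prefix of a handled so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(levenshtein("House", "hoose"))  # 2, as stated in the question
```

Because the comparison is case-sensitive, 'H' versus 'h' counts as one substitution; lowercasing both strings first would be a different design choice.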
Now I want to merge list elements whose Levenshtein distance is, for example, less than 5, while (!) adding their scores, so for the resulting list I want the following:
[['hoose', 400], ["Bananphone", 20], ['UniqueValue', 777], ...]
or
[['House', 400], ["Bonerphone", 20], ['UniqueValue', 777], ...]
etc.
It doesn't matter as long as their values get added.
At most 2 items in the list will ever be very similar to each other, so I don't expect a chain effect where one item is similar to many others and swallows them all.
Recommended answer
In common with the other comments, I'm not sure that doing this makes much sense, but here's a solution that does what you want, I think. It's very inefficient - O(n^2) where n is the number of words in your list - but I'm not sure there's a better way of doing it:
# levenshtein(a, b) is the asker's own edit-distance function (not defined here)
data = [['hoose', 200],
        ["Bananphone", 10],
        ['House', 200],
        ["Bonerphone", 10],
        ['UniqueValue', 777]]

# Each entry is [set of similar words, summed score]
already_merged = []

for word, score in data:
    added_to_existing = False
    for merged in already_merged:
        for potentially_similar in merged[0]:
            if levenshtein(word, potentially_similar) < 5:
                # Close enough: join this group and add the score
                merged[0].add(word)
                merged[1] += score
                added_to_existing = True
                break
        if added_to_existing:
            break
    if not added_to_existing:
        # No similar word seen so far: start a new group
        already_merged.append([set([word]), score])

print(already_merged)
The output (shown here as Python 2 prints it) is:
[[set(['House', 'hoose']), 400], [set(['Bonerphone', 'Bananphone']), 20], [set(['UniqueValue']), 777]]
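One obvious problem with this approach is that the word you're considering might be close enough to many different sets of words that you've already considered, but this code will just lump it into the first one it finds. I've voted +1 for Space_C0wb0y's answer ;)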
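One way to soften the "first group found" behaviour is to compare the word against every existing group and join the one containing the nearest word. The sketch below is only an illustration of that idea, not part of the original answer: `merge_closest` and its `threshold` parameter are hypothetical names, and the compact `levenshtein` helper is an assumed stand-in for the asker's function. It is still O(n^2) in the number of words.

```python
def levenshtein(a, b):
    # Assumed stand-in for the asker's function: plain edit distance.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def merge_closest(data, threshold=5):
    """Merge each [word, score] into the group holding its nearest word."""
    groups = []  # each entry is [set_of_words, total_score]
    for word, score in data:
        best_group, best_distance = None, threshold
        for group in groups:
            # Distance to a group = distance to its closest member
            distance = min(levenshtein(word, other) for other in group[0])
            if distance < best_distance:
                best_group, best_distance = group, distance
        if best_group is None:
            groups.append([{word}, score])  # nothing within the threshold
        else:
            best_group[0].add(word)
            best_group[1] += score
    return groups


data = [['hoose', 200], ["Bananphone", 10], ['House', 200],
        ["Bonerphone", 10], ['UniqueValue', 777]]
merged = merge_closest(data)
print(merged)
```

On the sample data this produces the same three groups as the answer's code; the two approaches only differ when a word is within the threshold of more than one group.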