Remove duplicates in a list of lists based on the third item in each sublist
I have a list of lists that looks like:
c = [['470', '4189.0', 'asdfgw', 'fds'],
['470', '4189.0', 'qwer', 'fds'],
['470', '4189.0', 'qwer', 'dsfs fdv']
...]
c has about 30,000 interior lists. What I'd like to do is eliminate duplicates based on the 4th item on each interior list. So the list of lists above would look like:
c = [['470', '4189.0', 'asdfgw', 'fds'],['470', '4189.0', 'qwer', 'dsfs fdv'] ...]
Here is what I have so far:
d = [] #list that will contain condensed c
d.append(c[0]) #append first element, so I can compare lists
for bact in c: #c is my list of lists with 30,000 interior list
    for items in d:
        if bact[3] != items[3]:
            d.append(bact)
I think this should work, but it just runs and runs. I let it run for 30 minutes, then killed it. I don't think the program should take so long, so I'm guessing there is something wrong with my logic.
I have a feeling that creating a whole new list of lists is pretty stupid. Any help would be much appreciated, and please feel free to nitpick as I am learning. Also please correct my vocabulary if it is incorrect.
I'd do it like this:
seen = set()
cond = [x for x in c if x[3] not in seen and not seen.add(x[3])]
Explanation:
seen is a set which keeps track of already encountered fourth elements of each sublist. cond is the condensed list. In case x[3] (where x is a sublist in c) is not in seen, x will be added to cond and x[3] will be added to seen.
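For readers who find the one-liner dense, the same logic can be spelled out as an explicit loop. This is only a sketch: the three sample rows are taken from the question, and the name key is introduced here purely for illustration.

```python
# Sample rows copied from the question (the real list has ~30,000 rows).
c = [
    ['470', '4189.0', 'asdfgw', 'fds'],
    ['470', '4189.0', 'qwer', 'fds'],
    ['470', '4189.0', 'qwer', 'dsfs fdv'],
]

seen = set()  # fourth elements encountered so far
cond = []     # condensed list; the first row for each fourth element is kept

for x in c:
    key = x[3]  # illustrative name for the value duplicates are judged on
    if key not in seen:
        seen.add(key)   # remember this value
        cond.append(x)  # keep the row that introduced it

print(cond)
```

Each row is visited once and the set membership test takes roughly constant time on average, so the work grows about linearly with the number of rows.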
seen.add(x[3]) will return None, so not seen.add(x[3]) will always be True, but that part will only be evaluated if x[3] not in seen is True since Python uses short circuit evaluation. If the second condition gets evaluated, it will always return True and have the side effect of adding x[3] to seen. Here's another example of what's happening (print returns None and has the "side-effect" of printing something):
>>> False and not print('hi')
False
>>> True and not print('hi')
hi
True
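If relying on a side effect inside a condition feels too clever, one alternative (not what the answer above uses) is to key a dict on the fourth element. This sketch assumes Python 3.7 or later, where dicts preserve insertion order, and assumes the fourth elements are hashable (strings are). The function name dedupe_by_index and its index parameter are hypothetical.

```python
def dedupe_by_index(rows, index=3):
    """Return the first row seen for each distinct value of row[index]."""
    first_seen = {}
    for row in rows:
        # setdefault stores the row only when the key is not present yet,
        # so later rows with the same key are ignored.
        first_seen.setdefault(row[index], row)
    # Dicts keep insertion order (Python 3.7+), so the original order survives.
    return list(first_seen.values())


# Sample rows copied from the question.
rows = [
    ['470', '4189.0', 'asdfgw', 'fds'],
    ['470', '4189.0', 'qwer', 'fds'],
    ['470', '4189.0', 'qwer', 'dsfs fdv'],
]
result = dedupe_by_index(rows)
print(result)
```

Like the set-based version, this makes a single pass over the rows and keeps the first occurrence of each fourth element.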