python最快捷的方法来合并基于键匹配的字典 [英] python quickest way to merge dictionaries based on key match
问题描述
在列表A中:
a ['things'] = []
for b在listB中:
如果['ID'] == b ['ID']:
a ['things']。append(b)
这是一种可能有帮助的方法。我会留给你填写细节。
你的代码很慢,因为它是一个O(n ^ 2)算法,比较每个A与每个B
如果您首先使用id(这些是O(nlogn))操作对listA和listB进行排序,那么可以通过A和B的排序版本轻松地进行迭代(这将是线性时间)。
当您在非常大的数据集上进行外部合并时,这种方法很常见。 Mihai的答案是更好的内部合并,您只需通过id(内存)中的所有内容进行索引。如果你有内存来容纳这些额外的结构,并且字典查找是恒定的时间,这种方法可能会更快,更不用说更简单了。 :)
作为例子,我们假设A在排序后有以下id = em
acfgjp
,B有这些ID,
aaaabbbbcccddeeeefffggiikknnnnppppqqrrr
这个想法奇怪的是,将索引保持为A和B(我知道这听起来不是很好的Pythonic)。起初你正在查看A中的 a
和B.在B中的 a
。所以你通过B将所有的a添加到您的东西数组为 a
。一旦你在B中用完了a,你可以将A中的一个移动到 c
。但是B中的下一个项目是 b
,它小于 c
,所以你必须跳过b。然后,您到达B中的 c
,所以您可以开始向c添加事物。以这种方式继续,直到两个列表都用尽。只有一个传球。 :)
I have 2 lists of dictionaries. List A is 34,000 long, list B is 650,000 long. I am essentially inserting all the List B dicts into the List A dicts based on a key match. Currently, I am doing the obvious, but its taking forever (seriously, like a day). There must be a quicker way!
for a in listA:
a['things'] = []
for b in listB:
if a['ID'] == b['ID']:
a['things'].append(b)
Here's an approach that may help. I'll leave it to you to fill in the details.
Your code is slow because it is a O(n^2) algorithm, comparing every A against every B.
If you sort each of listA and listB by id first (these are O(nlogn)) operations, then you can iterate easily through the sorted versions of A and B (this will be in linear time).
This approach is common when you have to do external merges on very large data sets. Mihai's answer is better for internal merging, where you simply index everything by id (in memory). If you have the memory to hold these additional structures, and dictionary lookup is constant time, that approach will likely be faster, not to mention simpler. :)
By way of example let's say A had the following ids after sorting
acfgjp
and B had these ids, again after sorting
aaaabbbbcccddeeeefffggiikknnnnppppqqqrrr
The idea is, strangely enough, to keep indexes into A and B (I know that does not sound very Pythonic). At first you are looking at a
in A and a
in B. So you walk through B adding all the a's to your "things" array for a
. Once you exhaust the a's in B, you move up one in A, to c
. But the next item in B is b
, which is less than c
, so you have to skip the b's. Then you arrive at a c
in B, so you can start adding into "things" for c. Continue in this fashion until both lists are exhausted. Just one pass. :)
这篇关于python最快捷的方法来合并基于键匹配的字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!