python最快捷的方法来合并基于键匹配的字典 [英] python quickest way to merge dictionaries based on key match

查看:296
本文介绍了python最快捷的方法来合并基于键匹配的字典的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个字典列表。名单A是34000长,名单B是65万长。我基本上将所有列表B的列表插入到列表A中,基于一个关键的匹配。目前,我正在做的很明显,但是它永远是(一个严重的,像一天)。必须有一个更快的方式!

 在列表A中:
a ['things'] = []
for b在listB中:
如果['ID'] == b ['ID']:
a ['things']。append(b)


解决方案

这是一种可能有帮助的方法。我会留给你填写细节。



你的代码很慢,因为它是一个O(n ^ 2)算法,比较每个A与每个B



如果您首先使用id(这些是O(nlogn))操作对listA和listB进行排序,那么可以通过A和B的排序版本轻松地进行迭代(这将是线性时间)。



当您在非常大的数据集上进行外部合并时,这种方法很常见。 Mihai的答案是更好的内部合并,您只需通过id(内存)中的所有内容进行索引。如果你有内存来容纳这些额外的结构,并且字典查找是恒定的时间,这种方法可能会更快,更不用说更简单了。 :)



作为例子,我们假设A在排序后有以下id = em

  acfgjp 

,B有这些ID,

  aaaabbbbcccddeeeefffggiikknnnnppppqqrrr 

这个想法奇怪的是,将索引保持为A和B(我知道这听起来不是很好的Pythonic)。起初你正在查看A中的 a 和B.在B中的 a 。所以你通过B将所有的a添加到您的东西数组为 a 。一旦你在B中用完了a,你可以将A中的一个移动到 c 。但是B中的下一个项目是 b ,它小于 c ,所以你必须跳过b。然后,您到达B中的 c ,所以您可以开始向c添加事物。以这种方式继续,直到两个列表都用尽。只有一个传球。 :)


I have 2 lists of dictionaries. List A is 34,000 long, list B is 650,000 long. I am essentially inserting all the List B dicts into the List A dicts based on a key match. Currently, I am doing the obvious, but its taking forever (seriously, like a day). There must be a quicker way!

for a in listA:
    a['things'] = []
    for b in listB:
        if a['ID'] == b['ID']:
            a['things'].append(b)

解决方案

Here's an approach that may help. I'll leave it to you to fill in the details.

Your code is slow because it is a O(n^2) algorithm, comparing every A against every B.

If you sort each of listA and listB by id first (these are O(nlogn)) operations, then you can iterate easily through the sorted versions of A and B (this will be in linear time).

This approach is common when you have to do external merges on very large data sets. Mihai's answer is better for internal merging, where you simply index everything by id (in memory). If you have the memory to hold these additional structures, and dictionary lookup is constant time, that approach will likely be faster, not to mention simpler. :)

By way of example let's say A had the following ids after sorting

acfgjp

and B had these ids, again after sorting

aaaabbbbcccddeeeefffggiikknnnnppppqqqrrr

The idea is, strangely enough, to keep indexes into A and B (I know that does not sound very Pythonic). At first you are looking at a in A and a in B. So you walk through B adding all the a's to your "things" array for a. Once you exhaust the a's in B, you move up one in A, to c. But the next item in B is b, which is less than c, so you have to skip the b's. Then you arrive at a c in B, so you can start adding into "things" for c. Continue in this fashion until both lists are exhausted. Just one pass. :)

这篇关于python最快捷的方法来合并基于键匹配的字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆