Python: Concatenate many dicts of numpy arrays with same keys and size


Question

I have a function, called within a loop, that returns a dict (dsst_mean) with roughly 50 variables. All variables are numpy arrays of length 10.

The loop iterates roughly 3000 times. I'm currently concatenating towards the end of each loop, so that I have a 'dsst_mean_all' dict that grows larger on each iteration.

source = [dsst_mean_all, dsst_mean]                
for key in source[0]:                    
    dsst_mean_all[key] = np.concatenate([d[key] for d in source])
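A minimal runnable sketch of this pattern (the keys and the per-iteration dict here are hypothetical stand-ins for the real ~50 variables; 'dsst_mean_all' is seeded with empty arrays so the first concatenate has something to join):

```python
import numpy as np

keys = ['A', 'B', 'C']  # stand-ins for the ~50 real variable names

# seed with empty arrays so the first concatenate works
dsst_mean_all = {k: np.empty(0, dtype=int) for k in keys}

for _ in range(3):  # the real loop runs ~3000 times
    dsst_mean = {k: np.arange(10) for k in keys}  # stand-in for the real function
    source = [dsst_mean_all, dsst_mean]
    for key in source[0]:
        dsst_mean_all[key] = np.concatenate([d[key] for d in source])
```

Each pass copies everything accumulated so far, which is why the cost grows with every iteration.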

It works, but I know this isn't efficient. I also have problems with initializing the 'dsst_mean_all' dict. (I'm currently using dict.fromkeys() to do this.)

My question is: what are some options to do this more efficiently? I'm thinking I could store the dsst_mean dicts in a list and then do one concatenate at the end. But I'm not sure if holding 3000+ dicts of numpy arrays in memory is a good idea. I know this depends on the size, but unfortunately right now I don't have an estimate of the size of each 'dsst_mean' dict.

Thanks.

Answer

Normally we recommend collecting values in a list and making an array once, at the end. The new thing here is that we need to iterate over the keys of a dictionary to do this collection.

For example:

A function that makes a single dictionary:

In [804]: def foo(i):
     ...:     return {k:np.arange(5) for k in ['A','B','C']}
     ...: 
In [805]: foo(0)
Out[805]: 
{'A': array([0, 1, 2, 3, 4]),
 'B': array([0, 1, 2, 3, 4]),
 'C': array([0, 1, 2, 3, 4])}

The collector dictionary:

In [806]: dd = {k:[] for k in ['A','B','C']}

Iteration, collecting arrays in the lists:

In [807]: for _ in range(3):
     ...:     x = foo(None)
     ...:     for k,v in dd.items():
     ...:         v.append(x[k])
     ...:         
In [808]: dd
Out[808]: 
{'A': [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])],
 'B': [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])],
 'C': [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])]}

Another iteration on the dictionary to turn the lists into some sort of array (stack, concatenate, your choice):

In [810]: for k,v in dd.items():
     ...:     dd[k]=np.stack(v,axis=0)
     ...:     
In [811]: dd
Out[811]: 
{'A': array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]), 'B': array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]), 'C': array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])}
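The choice between stack and concatenate only changes the final shape; a quick sketch with the same length-5 arrays:

```python
import numpy as np

v = [np.arange(5) for _ in range(3)]

stacked = np.stack(v, axis=0)  # adds a new leading axis: shape (3, 5)
joined = np.concatenate(v)     # joins along the existing axis: shape (15,)
```

For length-10 arrays per key, stack would give (3000, 10) per variable and concatenate would give (30000,).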

A list of 3000 arrays of length 10 will take up somewhat more memory than one array of 30,000 numbers, but not drastically more.
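The data payload is identical either way; the list only adds per-array object headers and list slots on top. A rough check:

```python
import numpy as np

arrays = [np.arange(10.0) for _ in range(3000)]  # 3000 length-10 arrays
one = np.concatenate(arrays)                     # one array of 30,000 numbers

data_in_list = sum(a.nbytes for a in arrays)  # raw data bytes across the list
data_in_one = one.nbytes                      # raw data bytes in the one array
# both are 30,000 * 8 = 240,000 bytes of float64 data; only the
# per-array overhead differs
```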

You could collect the whole dictionaries in one list the first time around, but you would still need to combine them into one dictionary like this.
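A sketch of that variant (foo is the same stand-in per-iteration function as above): collect the whole dicts in a list, then build the combined dictionary in one pass at the end:

```python
import numpy as np

def foo(i):
    # stand-in for the real per-iteration function
    return {k: np.arange(5) for k in ['A', 'B', 'C']}

# first pass: just collect the whole dicts in a list
results = [foo(i) for i in range(3)]

# second pass: one combine per key at the end
combined = {k: np.stack([r[k] for r in results], axis=0)
            for k in results[0]}
```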

