Concatenating dictionaries of numpy arrays of different lengths (avoiding manual loops if possible)
Question
I have a question similar to the one discussed here: "Concatenating dictionaries of numpy arrays (avoiding manual loops if possible)".
I am looking for a way to concatenate the values in two python dictionaries that contain numpy arrays of arbitrary size whilst avoiding having to manually loop over the dictionary keys. For example:
import numpy as np
# Create first dictionary
n1 = 3
s = np.random.randint(1, 101, n1)
n2 = 2
r = np.random.rand(n2)
d = {"r": r, "s": s}
print("d = ", d)
# Create second dictionary
n3 = 1
s = np.random.randint(1, 101, n3)
n4 = 3
r = np.random.rand(n4)
d2 = {"r": r, "s": s}
print("d2 = ", d2)
# Some operation to combine the two dictionaries
# (SomeOperation is a placeholder for what is being asked about)
d = SomeOperation(d, d2)
# Updated dictionary
print("d3 = ", d)
which gives the output:
>> d = {'s': array([75, 25, 88]), 'r': array([ 0.1021227 , 0.99454874])}
>> d2 = {'s': array([78]), 'r': array([ 0.27610587, 0.57037473, 0.59876391])}
>> d3 = {'s': array([75, 25, 88, 78]), 'r': array([ 0.1021227 , 0.99454874, 0.27610587, 0.57037473, 0.59876391])}
That is, if a key already exists, the new values are appended to the numpy array stored under that key.
The solution proposed in the previous discussion using the package pandas does not work as it requires arrays having the same length (n1=n2 and n3=n4).
Does anybody know the best way to do this, whilst minimising the use of slow, manual for loops? (I would like to avoid loops because the dictionaries I would like to combine could have hundreds of keys.)
Thanks (also to "Aim" for formulating a very clear question)!
Recommended answer
One way to go is to use a dictionary of Series (i.e. the values are Series rather than arrays):
In [11]: d2
Out[11]: {'r': array([ 0.3536318 , 0.29363604, 0.91307454]), 's': array([46])}
In [12]: d2 = {name: pd.Series(arr) for name, arr in d2.items()}  # requires: import pandas as pd
In [13]: d2
Out[13]:
{'r': 0    0.353632
 1    0.293636
 2    0.913075
 dtype: float64,
 's': 0    46
 dtype: int64}
That way you can pass it into the DataFrame constructor:
In [14]: pd.DataFrame(d2)
Out[14]:
          r    s
0  0.353632   46
1  0.293636  NaN
2  0.913075  NaN
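The answer stops at building a DataFrame from one dictionary, so the following is only a sketch of how the two dictionaries from the question might then be combined. The helper names combine_via_frames and combine_direct are hypothetical and do not come from the original answer. The first follows the Series/DataFrame route and strips the NaN padding afterwards. The second swaps in a plain np.concatenate per key, which still iterates over the keys in Python but leaves the actual array work to NumPy.

```python
import numpy as np
import pandas as pd


def combine_via_frames(d1, d2):
    """Hypothetical helper: concatenate per-key arrays through DataFrames."""
    frames = [
        pd.DataFrame({name: pd.Series(arr) for name, arr in d.items()})
        for d in (d1, d2)
    ]
    # Row-wise concatenation; shorter columns are padded with NaN.
    stacked = pd.concat(frames, ignore_index=True)
    # Drop the padding per column. Caveats: this also drops genuine NaN
    # values, and integer columns that were padded come back as floats.
    return {name: stacked[name].dropna().to_numpy() for name in stacked.columns}


def combine_direct(d1, d2):
    """Hypothetical helper: per-key np.concatenate, preserving dtypes."""
    # Keys of d1 first, then any keys that only appear in d2.
    keys = list(d1) + [k for k in d2 if k not in d1]
    return {k: np.concatenate([d[k] for d in (d1, d2) if k in d]) for k in keys}


# Fixed example data with the same shapes as in the question.
d = {"r": np.array([0.1, 0.9]), "s": np.array([75, 25, 88])}
d2 = {"r": np.array([0.2, 0.5, 0.6]), "s": np.array([78])}

d3_frames = combine_via_frames(d, d2)
d3_direct = combine_direct(d, d2)
```

If the integer dtype of "s" matters, or if the data may legitimately contain NaN, the direct variant is probably the safer of the two. A comprehension over a few hundred keys is usually cheap compared with the concatenation itself.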