Python pandas: flatten with arrays in column
Question
I have a pandas DataFrame in which one column contains arrays. I'd like to "flatten" it by repeating the values of the other columns for each element of the arrays.
I managed to do this by iterating over every row and building a temporary list of values, but that is "pure Python" and slow.
Is there a way to do this in pandas/NumPy? In other words, I am trying to improve the flatten function in the example below.
Thanks a lot.
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []  # one dict per element of each 'z' sequence

    def backend(r):
        x = r['x']
        y = r['y']
        zz = r['z']
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})

    df.apply(backend, axis=1)
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))
Which gives:
x y z
1 10 101
1 10 102
1 10 103
2 20 201
2 20 202
Recommended answer
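Here's a NumPy-based solution (on Python 3, map returns an iterator, so the lengths are wrapped in list(...) before being passed to repeat):
import numpy as np
np.column_stack((toConvert[['x','y']].values.
    repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))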
Sample run:
In [78]: toConvert
Out[78]:
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)
In [79]: np.column_stack((toConvert[['x','y']].values.
    ...: repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Out[79]:
array([[ 1, 10, 101],
[ 1, 10, 102],
[ 1, 10, 103],
[ 2, 20, 201],
[ 2, 20, 202]])
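The answer above returns a plain NumPy array, while the question asks for a DataFrame. As a minimal sketch (not part of the original answer), the same repeat-and-concatenate idea can be wrapped so that it returns a DataFrame with the original column names. The function names flatten_numpy and flatten_explode are hypothetical, and the sketch assumes every cell of the array column holds a non-empty sequence. For comparison it also shows DataFrame.explode, which exists in pandas 0.25 and later and leaves the exploded column with object dtype.

```python
import numpy as np
import pandas as pd


def flatten_numpy(df, col):
    """Repeat the other columns once per element of the sequences in `col`.

    Hypothetical helper; assumes each cell of `col` is a non-empty sequence.
    """
    # Number of elements in each row's sequence.
    lengths = df[col].map(len).to_numpy()
    # Row position i is repeated lengths[i] times.
    row_positions = np.repeat(np.arange(len(df)), lengths)
    # Take the repeated rows of all other columns, keeping their dtypes.
    out = df.drop(columns=col).iloc[row_positions].reset_index(drop=True)
    # Join all sequences end to end into one flat array.
    out[col] = np.concatenate(df[col].tolist())
    return out


def flatten_explode(df, col):
    """Same reshaping using pandas' built-in explode (pandas >= 0.25)."""
    return df.explode(col).reset_index(drop=True)


toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

print(flatten_numpy(toConvert, 'z').to_string(index=False))
print(flatten_explode(toConvert, 'z').to_string(index=False))
```

Selecting rows with iloc instead of going through .values keeps each of the other columns in its own dtype, which matters when they are not all integers.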