How to split the values of a cell into multiple rows in a pandas data frame?
Problem description
I have the following data frame, which was obtained using this code:
df1=df.groupby('id')['x,y'].apply(lambda x: rdp(x.tolist(), 5.0)).reset_index()
The resulting data frame:
id x,y
0 1 [(0, 0), (1, 2)]
1 2 [(1, 3), (1, 2)]
2 3 [(2, 5), (4, 6)]
Is it possible to get something like this:
id x,y
0 1 (0, 0)
1 1 (1, 2)
2 2 (1, 3)
3 2 (1, 2)
4 3 (2, 5)
5 3 (4, 6)
Here, each list of coordinates in the previous data frame is split into new rows, and every row keeps its respective id.
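The original df and the rdp function are not shown in the question, so the intermediate frame cannot be regenerated from the code above. As a minimal sketch for trying out the answers, df1 can be built directly from the values printed in the question; the literal construction is an assumption, not the asker's code.

```python
import pandas as pd

# Assumed reconstruction of df1 from the printed output in the question;
# the real frame came from a groupby/rdp call that is not shown.
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'x,y': [
        [(0, 0), (1, 2)],
        [(1, 3), (1, 2)],
        [(2, 5), (4, 6)],
    ],
})
print(df1)
```

Each cell of the 'x,y' column holds a Python list of tuples, which is what the answers rely on.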
Recommended answer
You can use the DataFrame constructor with stack:
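import pandas as pd

# Build one row per id and one column per list position, then stack to long format
df2 = (pd.DataFrame(df1['x,y'].values.tolist(), index=df1['id'])
         .stack()                          # index becomes (id, position)
         .reset_index(level=1, drop=True)  # drop the position level
         .reset_index(name='x,y'))         # turn id back into a column
print(df2)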
id x,y
0 1 (0, 0)
1 1 (1, 2)
2 2 (1, 3)
3 2 (1, 2)
4 3 (2, 5)
5 3 (4, 6)
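To make the chain easier to follow, the sketch below runs the same steps one at a time. The sample df1 is an assumed reconstruction of the frame in the question, and the intermediate names wide and stacked are introduced here only for illustration.

```python
import pandas as pd

# Assumed sample data mirroring the frame printed in the question
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'x,y': [[(0, 0), (1, 2)], [(1, 3), (1, 2)], [(2, 5), (4, 6)]],
})

# Step 1: one row per id, one column per position in the list
wide = pd.DataFrame(df1['x,y'].values.tolist(), index=df1['id'])

# Step 2: stack the columns into rows; the index is now (id, position)
stacked = wide.stack()

# Step 3: drop the position level, then turn id back into a column
df2 = stacked.reset_index(level=1, drop=True).reset_index(name='x,y')
print(df2)
```

If the lists had different lengths, the constructor would pad the shorter rows with missing values. Whether stack drops those padded entries appears to depend on the pandas version, so that case is worth checking before relying on it.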
A numpy solution: use numpy.repeat to repeat each id by the length of its list (obtained with str.len), and flatten the x,y column with numpy.ndarray.sum:
import numpy as np

# Repeat each id once per tuple in its list; summing the lists concatenates them
df2 = pd.DataFrame({'id': np.repeat(df1['id'].values, df1['x,y'].str.len()),
                    'x,y': df1['x,y'].values.sum()})
print(df2)
   id     x,y
0   1  (0, 0)
1   1  (1, 2)
2   2  (1, 3)
3   2  (1, 2)
4   3  (2, 5)
5   3  (4, 6)
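The numpy approach can also be written out step by step. The sketch below is a variation, not the answer's exact code: it flattens the lists with itertools.chain.from_iterable instead of sum, because summing lists concatenates them repeatedly and may slow down as the frame grows. The sample df1 and the names lengths, ids and flat are assumptions made for illustration.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Assumed sample data mirroring the frame printed in the question
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'x,y': [[(0, 0), (1, 2)], [(1, 3), (1, 2)], [(2, 5), (4, 6)]],
})

# Number of tuples stored in each cell
lengths = df1['x,y'].str.len().to_numpy()

# Repeat every id once per tuple in its list
ids = np.repeat(df1['id'].to_numpy(), lengths)

# Flatten the lists one level, keeping each tuple intact
flat = list(chain.from_iterable(df1['x,y']))

df2 = pd.DataFrame({'id': ids, 'x,y': flat})
print(df2)
```

Because the frame is built from plain arrays, the result gets a fresh 0..n-1 index rather than the repeated index of df1.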
Timings:
In [54]: %timeit pd.DataFrame(df1['x,y'].values.tolist(), index=df1['id']).stack().reset_index(level=1, drop=True).reset_index(name='x,y')
1000 loops, best of 3: 1.49 ms per loop
In [55]: %timeit pd.DataFrame({'id': np.repeat(df1['id'].values, df1['x,y'].str.len()), 'x,y': df1['x,y'].values.sum()})
1000 loops, best of 3: 562 µs per loop
#piRSquared solution
In [56]: %timeit pd.DataFrame({'id': df1['id'].repeat(df1['x,y'].str.len()), 'x,y': df1['x,y'].sum() })
1000 loops, best of 3: 712 µs per loop
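For comparison, newer pandas releases (0.25 and later) provide DataFrame.explode, which covers the same task in a single call. It is not part of the answer above and was not included in these timings; the sketch assumes the same sample df1 as in the question.

```python
import pandas as pd

# Assumed sample data mirroring the frame printed in the question
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'x,y': [[(0, 0), (1, 2)], [(1, 3), (1, 2)], [(2, 5), (4, 6)]],
})

# explode expands each list one level, so every tuple gets its own row;
# the original index is repeated, hence the reset to a fresh 0..n-1 index
df2 = df1.explode('x,y').reset_index(drop=True)
print(df2)
```

Only the outer list is expanded; the tuples themselves stay intact as cell values.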