pandas 在数据框中组合稀疏列 [英] Pandas combining sparse columns in dataframe
本文介绍了 pandas 在数据框中组合稀疏列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在使用Python和Pandas进行数据分析。我在不同的列中稀疏分布了数据,例如以下
I am using Python, Pandas for data analysis. I have sparsely distributed data in different columns like following
| id | col1a | col1b | col2a | col2b | col3a | col3b |
|----|-------|-------|-------|-------|-------|-------|
| 1 | 11 | 12 | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | 21 | 86 | NaN | NaN |
| 3 | 22 | 87 | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | 545 | 32 |
我想将此稀疏分布的数据在不同的列中合并为紧紧填充的列,如下所示。
I want to combine this sparsely distributed data in different columns to tightly packed column like following.
| id | group | cola | colb |
|----|-------|-------|-------|
| 1 | g1 | 11 | 12 |
| 2 | g2 | 21 | 86 |
| 3 | g1 | 22 | 87 |
| 4 | g3 | 545 | 32 |
我尝试了以下操作,但无法正确完成
What I have tried is doing following, but not able to do it properly
df['cola']=np.nan
df['colb']=np.nan
df['cola'].fillna(df.col1a,inplace=True)
df['colb'].fillna(df.col1b,inplace=True)
df['cola'].fillna(df.col2a,inplace=True)
df['colb'].fillna(df.col2b,inplace=True)
df['cola'].fillna(df.col3a,inplace=True)
df['colb'].fillna(df.col3b,inplace=True)
但是我认为必须有更简洁有效的方法。
But I think there must be more concise and efficient way way of doing this. How to do this in better way?
推荐答案
您可以使用 df.stack()
假定'id'
是您的索引,否则将'id'
设置为索引。然后使用 pd.pivot_table
。
You can use df.stack()
assuming 'id'
is your index else set 'id'
as index. Then use pd.pivot_table
.
df = df.stack().reset_index(name='val',level=1)
df['group'] = 'g'+ df['level_1'].str.extract('col(\d+)')
df['level_1'] = df['level_1'].str.replace('col(\d+)','')
df.pivot_table(index=['id','group'],columns='level_1',values='val')
level_1 cola colb
id group
1 g1 11.0 12.0
2 g2 21.0 86.0
3 g1 22.0 87.0
4 g3 545.0 32.0
这篇关于 pandas 在数据框中组合稀疏列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文