Create columns from rows based on value of one column in pandas


Question

I have a dataframe that looks like this:

PERIOD_START_TIME       ID    temp_ID  value1  value2
06.28.2017 22:00:00     88      1        4       2
06.28.2017 22:00:00     88      2        0       7
06.28.2017 22:00:00     89      2        0       9
06.28.2017 22:00:00     89      1        5       4
06.28.2017 22:00:00     90      1        12      13
06.28.2017 22:00:00     90      2        18      4

Now I need to halve the number of rows and double the number of value columns: each value column should be split into one column per temp_ID, with the temp_ID appended to the column name. In short, temp_ID should be pivoted from rows into columns.

Desired output:

PERIOD_START_TIME    ID  value1_tpID1 value1_tpID2  value2_tpID1 value2_tpID2
06.28.2017 22:00:00  88          4       0            2            7
06.28.2017 22:00:00  89          5       0            4            9
06.28.2017 22:00:00  90          12      18           13           4

<class 'pandas.core.frame.DataFrame'>
Int64Index: 189604 entries, 0 to 10595
Data columns (total 12 columns):
PERIOD_START_TIME         189604 non-null object
ID                       189604 non-null int64
temp_ID                  189604 non-null int64
dtypes: float64(4), int64(6), object(2)
memory usage: 18.8+ MB

Recommended answer

You can use set_index with unstack:

# convert temp_ID to str so that '_'.join works on the (value, temp_ID) column tuples
df['temp_ID'] = df['temp_ID'].astype(str)
df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88         4         0         2         7
1  06.28.2017 22:00:00  89         5         0         4         9
2  06.28.2017 22:00:00  90        12        18        13         4

Or, without converting temp_ID to str:

df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88         4         0         2         7
1  06.28.2017 22:00:00  89         5         0         4         9
2  06.28.2017 22:00:00  90        12        18        13         4
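
The answer above produces column names such as value1_1, while the question asks for value1_tpID1. A minimal, self-contained sketch of the same set_index/unstack approach with that naming follows. The sample frame is rebuilt by hand from the question's data, and the names `wide` and the `_tpID` infix in the f-string are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (no duplicate key triples).
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 6,
    "ID": [88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 2, 2, 1, 1, 2],
    "value1": [4, 0, 0, 5, 12, 18],
    "value2": [2, 7, 9, 4, 13, 4],
})

# Move temp_ID from the row index into a second column level.
wide = df.set_index(["PERIOD_START_TIME", "ID", "temp_ID"]).unstack()

# Flatten the (value, temp_ID) column tuples into the requested names.
wide.columns = [f"{value}_tpID{temp_id}" for value, temp_id in wide.columns]
wide = wide.reset_index()
print(wide)
```

Note that if some ID lacked one of the temp_ID values, unstack would fill the missing cell with NaN and the affected column would become float.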

If the triples (PERIOD_START_TIME, ID, temp_ID) contain duplicates, unstack will fail, and you need pivot_table with an aggregate function such as mean or sum:

print (df)
     PERIOD_START_TIME  ID  temp_ID  value1  value2
0  06.28.2017 22:00:00  88        1       4       2 < same PERIOD_START_TIME  ID  temp_ID
1  06.28.2017 22:00:00  88        1       5       3 < same PERIOD_START_TIME  ID  temp_ID
2  06.28.2017 22:00:00  88        2       0       7
3  06.28.2017 22:00:00  89        2       0       9
4  06.28.2017 22:00:00  89        1       5       4
5  06.28.2017 22:00:00  90        1      12      13
6  06.28.2017 22:00:00  90        2      18       4

df = df.pivot_table(index=['PERIOD_START_TIME','ID'], 
                    columns='temp_ID', 
                    values=['value1','value2'],
                    aggfunc='mean')
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88       4.5       0.0       2.5       7.0
1  06.28.2017 22:00:00  89       5.0       0.0       4.0       9.0
2  06.28.2017 22:00:00  90      12.0      18.0      13.0       4.0

Alternative solution with groupby and mean:

df = df.groupby(['PERIOD_START_TIME','ID','temp_ID']).mean().unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
     PERIOD_START_TIME  ID  value1_1  value1_2  value2_1  value2_2
0  06.28.2017 22:00:00  88       4.5       0.0       2.5       7.0
1  06.28.2017 22:00:00  89       5.0       0.0       4.0       9.0
2  06.28.2017 22:00:00  90      12.0      18.0      13.0       4.0
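
To decide which of the approaches applies, it may help to check for duplicate key triples first. The sketch below assumes the duplicated sample frame shown earlier, rebuilt by hand; the names `keys`, `has_duplicates` and `wide` are introduced here for illustration only, and mean is just one possible aggregate.

```python
import pandas as pd

# Sample data with one duplicated (PERIOD_START_TIME, ID, temp_ID) triple.
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.28.2017 22:00:00"] * 7,
    "ID": [88, 88, 88, 89, 89, 90, 90],
    "temp_ID": [1, 1, 2, 2, 1, 1, 2],
    "value1": [4, 5, 0, 0, 5, 12, 18],
    "value2": [2, 3, 7, 9, 4, 13, 4],
})

keys = ["PERIOD_START_TIME", "ID", "temp_ID"]

# True if any key triple occurs more than once.
has_duplicates = bool(df.duplicated(subset=keys).any())

if has_duplicates:
    # Aggregate duplicates first, then pivot temp_ID into columns.
    wide = df.groupby(keys).mean().unstack()
else:
    # Keys are unique, so a plain reshape is enough.
    wide = df.set_index(keys).unstack()

wide.columns = [f"{value}_{temp_id}" for value, temp_id in wide.columns]
wide = wide.reset_index()
print(wide)
```

Rows flagged by `df.duplicated(subset=keys, keep=False)` can be inspected beforehand to judge whether mean, sum, or another aggregate is the appropriate choice.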
