Create columns from rows based on value of one column in pandas
Question
I have a dataframe that looks like this:
PERIOD_START_TIME ID temp_ID value1 value2
06.28.2017 22:00:00 88 1 4 2
06.28.2017 22:00:00 88 2 0 7
06.28.2017 22:00:00 89 2 0 9
06.28.2017 22:00:00 89 1 5 4
06.28.2017 22:00:00 90 1 12 13
06.28.2017 22:00:00 90 2 18 4
Now I need to get rid of half of the rows but end up with twice as many value columns: each value column should be duplicated, with temp_ID appended to the column name. In short, temp_ID should be transformed from rows into columns.
Desired output:
PERIOD_START_TIME ID value1_tpID1 value1_tpID2 value2_tpID1 value2_tpID2
06.28.2017 22:00:00 88 4 0 2 7
06.28.2017 22:00:00 89 5 0 4 9
06.28.2017 22:00:00 90 12 18 13 4
<class 'pandas.core.frame.DataFrame'>
Int64Index: 189604 entries, 0 to 10595
Data columns (total 12 columns):
PERIOD_START_TIME 189604 non-null object
ID 189604 non-null int64
temp_ID 189604 non-null int64
dtypes: float64(4), int64(6), object(2)
memory usage: 18.8+ MB
Recommended answer
You can use set_index with unstack:
# convert temp_ID to str if necessary, so that '_'.join accepts both column levels
df['temp_ID'] = df['temp_ID'].astype(str)
df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4 0 2 7
1 06.28.2017 22:00:00 89 5 0 4 9
2 06.28.2017 22:00:00 90 12 18 13 4
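For readers who want to run this end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the question's table, and the variable name `wide` is only an illustrative choice, not something from the original answer.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'PERIOD_START_TIME': ['06.28.2017 22:00:00'] * 6,
    'ID': [88, 88, 89, 89, 90, 90],
    'temp_ID': [1, 2, 2, 1, 1, 2],
    'value1': [4, 0, 0, 5, 12, 18],
    'value2': [2, 7, 9, 4, 13, 4],
})

# temp_ID must be str so that '_'.join accepts every element of a column tuple
df['temp_ID'] = df['temp_ID'].astype(str)

# Move temp_ID (the last index level) from the rows into the columns
wide = df.set_index(['PERIOD_START_TIME', 'ID', 'temp_ID']).unstack()

# Flatten the two-level column index: ('value1', '1') becomes 'value1_1'
wide.columns = wide.columns.map('_'.join)
wide = wide.reset_index()
print(wide)
```

Keeping the result in a separate variable leaves the original long-format frame available for comparison.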
Or, converting temp_ID with str inside a list comprehension:
df = df.set_index(['PERIOD_START_TIME','ID','temp_ID']).unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4 0 2 7
1 06.28.2017 22:00:00 89 5 0 4 9
2 06.28.2017 22:00:00 90 12 18 13 4
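The question asks for column names of the form value1_tpID1 rather than value1_1. One possible way to get them is to change the string built in the list comprehension; the sketch below assumes the same sample data as the question, and the f-string pattern and the name `wide` are illustrative choices rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'PERIOD_START_TIME': ['06.28.2017 22:00:00'] * 6,
    'ID': [88, 88, 89, 89, 90, 90],
    'temp_ID': [1, 2, 2, 1, 1, 2],
    'value1': [4, 0, 0, 5, 12, 18],
    'value2': [2, 7, 9, 4, 13, 4],
})

wide = df.set_index(['PERIOD_START_TIME', 'ID', 'temp_ID']).unstack()

# Each column is a (value name, temp_ID) tuple; the f-string formats the
# integer temp_ID directly, so no astype(str) is needed beforehand
wide.columns = [f'{value}_tpID{temp_id}' for value, temp_id in wide.columns]
wide = wide.reset_index()
print(wide)
```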
If there are duplicates in the triples PERIOD_START_TIME, ID, temp_ID, then you need pivot_table with some aggregate function like mean, sum...:
print (df)
PERIOD_START_TIME ID temp_ID value1 value2
0 06.28.2017 22:00:00 88 1 4 2 < same PERIOD_START_TIME ID temp_ID
1 06.28.2017 22:00:00 88 1 5 3 < same PERIOD_START_TIME ID temp_ID
2 06.28.2017 22:00:00 88 2 0 7
3 06.28.2017 22:00:00 89 2 0 9
4 06.28.2017 22:00:00 89 1 5 4
5 06.28.2017 22:00:00 90 1 12 13
6 06.28.2017 22:00:00 90 2 18 4
df = df.pivot_table(index=['PERIOD_START_TIME','ID'],
                    columns='temp_ID',
                    values=['value1','value2'],
                    aggfunc='mean')
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4.5 0.0 2.5 7.0
1 06.28.2017 22:00:00 89 5.0 0.0 4.0 9.0
2 06.28.2017 22:00:00 90 12.0 18.0 13.0 4.0
Alternative solution:
df = df.groupby(['PERIOD_START_TIME','ID','temp_ID']).mean().unstack()
df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
df = df.reset_index()
print (df)
PERIOD_START_TIME ID value1_1 value1_2 value2_1 value2_2
0 06.28.2017 22:00:00 88 4.5 0.0 2.5 7.0
1 06.28.2017 22:00:00 89 5.0 0.0 4.0 9.0
2 06.28.2017 22:00:00 90 12.0 18.0 13.0 4.0
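To decide which approach applies, it can help to check for duplicate key triples first. The sketch below uses the duplicated sample data from above, confirms that plain unstack raises a ValueError on it, and then aggregates with sum instead of mean. The names `keys`, `has_duplicates`, `unstack_failed` and `summed` are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Sample data with one duplicated (PERIOD_START_TIME, ID, temp_ID) triple
df = pd.DataFrame({
    'PERIOD_START_TIME': ['06.28.2017 22:00:00'] * 7,
    'ID': [88, 88, 88, 89, 89, 90, 90],
    'temp_ID': [1, 1, 2, 2, 1, 1, 2],
    'value1': [4, 5, 0, 0, 5, 12, 18],
    'value2': [2, 3, 7, 9, 4, 13, 4],
})
keys = ['PERIOD_START_TIME', 'ID', 'temp_ID']

# True if any key triple occurs more than once
has_duplicates = df.duplicated(subset=keys).any()

# With duplicate index entries, unstack cannot reshape and raises ValueError
try:
    df.set_index(keys).unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table resolves the duplicates by aggregating, here with sum
summed = df.pivot_table(index=['PERIOD_START_TIME', 'ID'],
                        columns='temp_ID',
                        values=['value1', 'value2'],
                        aggfunc='sum')
summed.columns = [f'{value}_{temp_id}' for value, temp_id in summed.columns]
summed = summed.reset_index()
print(summed)
```

Whether mean, sum or another function is appropriate depends on what the duplicated rows represent, so the choice of sum here is only an example.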