Grouping by unique values while transposing a column
Problem description
I asked a similar question the other day with data from two columns:
Grouping columns by unique values in Python
Now I have three columns. They need to be grouped by column A, with the unique values of column B as the column headers and each value of column C placed under its matching B header.
My data frame looks like:
A B C
25115 20 45
25115 30 154
25115 40 87
25115 70 21
25115 90 74
26200 10 48
26200 20 414
26200 40 21
26200 50 288
26200 80 174
26200 90 54
But I need to end up with this:
        10   20   30   40   50   70   80   90
25115        45  154   87        21        74
26200   48  414        21  288       174   54
The code below collects the values of column C for each value of A, but it does not use column B as the column headers.
import pandas as pd
df = pd.DataFrame({'A': [25115, 25115, 25115, 25115, 25115, 26200, 26200, 26200, 26200, 26200, 26200],
                   'B': [20, 30, 40, 70, 90, 10, 20, 40, 50, 80, 90],
                   'C': [45, 154, 87, 21, 74, 48, 414, 21, 288, 174, 54]})
# Joins the C values of each A group into one string; column B is never consulted
a = df.groupby('A')['C'].apply(lambda x: ' '.join(x.astype(str)))
Any ideas would be most appreciated.
Solution
- Option 1:
Use pivot_table:
df.pivot_table(values='C',index='A',columns='B')
Output:
B 10 20 30 40 50 70 80 90
A
25115 NaN 45.0 154.0 87.0 NaN 21.0 NaN 74.0
26200 48.0 414.0 NaN 21.0 288.0 NaN 174.0 54.0
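The output above shows floats and NaN where the question asked for whole numbers and gaps. The sketch below is one possible follow-up, not part of the original answer: it converts the result to pandas' nullable `Int64` dtype, and it also illustrates that `pivot_table` aggregates with the mean by default, so a repeated (A, B) pair is averaged silently. The extra row with C = 55 and the names `wide_int`, `dup`, `averaged` and `summed` are assumptions added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [25115] * 5 + [26200] * 6,
    'B': [20, 30, 40, 70, 90, 10, 20, 40, 50, 80, 90],
    'C': [45, 154, 87, 21, 74, 48, 414, 21, 288, 174, 54],
})

wide = df.pivot_table(values='C', index='A', columns='B')

# Nullable integer dtype: 45.0 becomes 45 and missing cells become <NA>.
wide_int = wide.astype('Int64')
print(wide_int)

# Hypothetical duplicate: a second row for the pair (A=25115, B=20).
extra = pd.DataFrame({'A': [25115], 'B': [20], 'C': [55]})
dup = pd.concat([df, extra], ignore_index=True)

# The default aggfunc is the mean, so the cell holds the average of 45 and 55.
averaged = dup.pivot_table(values='C', index='A', columns='B')

# Passing aggfunc explicitly makes the choice visible; here the cell holds 45 + 55.
summed = dup.pivot_table(values='C', index='A', columns='B', aggfunc='sum')
```

If the (A, B) pairs are known to be unique, as in the question's data, the aggregation has nothing to combine and each cell simply carries the single C value.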
- Option 2:
Use set_index / unstack:
df.set_index(['A','B'])['C'].unstack()
Output:
B 10 20 30 40 50 70 80 90
A
25115 NaN 45.0 154.0 87.0 NaN 21.0 NaN 74.0
26200 48.0 414.0 NaN 21.0 288.0 NaN 174.0 54.0
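Unlike `pivot_table`, the `set_index` / `unstack` route does no aggregation, so it needs every (A, B) pair to be unique. The following sketch, which goes beyond the original answer, checks that `DataFrame.pivot` gives the same table and shows the `ValueError` that `unstack` raises once a pair repeats. The duplicated row and the names `wide_unstack`, `wide_pivot`, `same`, `dup` and `reshape_error` are assumptions introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [25115] * 5 + [26200] * 6,
    'B': [20, 30, 40, 70, 90, 10, 20, 40, 50, 80, 90],
    'C': [45, 154, 87, 21, 74, 48, 414, 21, 288, 174, 54],
})

# Option 2 from the answer.
wide_unstack = df.set_index(['A', 'B'])['C'].unstack()

# DataFrame.pivot expresses the same reshape in a single call.
wide_pivot = df.pivot(index='A', columns='B', values='C')
same = wide_unstack.equals(wide_pivot)

# Hypothetical duplicate: repeat the first row so the pair (25115, 20) appears twice.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

try:
    dup.set_index(['A', 'B'])['C'].unstack()
    reshape_error = None
except ValueError as exc:
    # unstack cannot place two values into one cell, so it refuses to reshape.
    reshape_error = exc

print(same)
print(reshape_error)
```

In short, Option 2 is a reasonable choice when duplicates should be treated as an error, while Option 1 fits data where repeated pairs are expected and should be combined.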