Create multiple columns from a list of values in another column

Problem description

I have a dataframe that looks like this:

Groupe       Id   MotherName   FatherName    Field
Advanced    56    Laure         James        English-107,Economics, Management, History, Philosophy
Middle      11    Ann           Nicolas      Web-development, Java-2
Advanced    6     Helen         Franc        Literature, English-2
Beginner    43    Laure         James        Mathematics, History, Philosophy, Literature
Middle      14    Naomi         Franc        Java-2, Management, English-107

For further work with the data, I need to split the Field column and replace it with multiple columns that look like this:

Id  English-107  Economics  Management  History  Web-development  Java-2  Literature  English-2  Mathematics  Philosophy
56            1          1           1        1                0       0           0          0            0           1
11            0          0           0        0                1       1           0          0            0           0

That way, these columns could be appended to the initial dataframe. I don't know how to do this, because basic splitting like

pd.DataFrame(df.Field.str.split(',', n=1).tolist())

doesn't solve my problem: I need columns based not on the position in the list, but on every unique value in the list. Do you have any idea how I can approach this?
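To make the distinction concrete, here is a small sketch that rebuilds the `Id` and `Field` columns from the sample table and compares a positional split with the set of distinct subjects. The frame name `df` and the separator pattern `\s*,\s*` (which tolerates the inconsistent spaces after commas in the sample) are assumptions for illustration, not part of the original question:

```python
import pandas as pd

# Sample data copied from the question (only Id and Field matter here)
df = pd.DataFrame({
    "Id": [56, 11, 6, 43, 14],
    "Field": [
        "English-107,Economics, Management, History, Philosophy",
        "Web-development, Java-2",
        "Literature, English-2",
        "Mathematics, History, Philosophy, Literature",
        "Java-2, Management, English-107",
    ],
})

# A pattern longer than one character is treated as a regex, so this splits
# on a comma plus any surrounding whitespace
parts = df["Field"].str.split(r"\s*,\s*")

# Positional split: column 0 just holds whichever subject was listed first,
# so a single column mixes several different subjects
positional = pd.DataFrame(parts.tolist(), index=df.index)
print(positional[0].tolist())

# What the question asks for instead: one column per distinct subject
subjects = sorted({subject for row in parts for subject in row})
print(subjects)
```

Column 0 of the positional frame contains five different subjects, while the distinct-value approach yields ten subjects, each of which can become its own column.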

Recommended answer

You can use concat and str.get_dummies:

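import pandas as pd

# Normalize ", " to "," so " Management" and "Management" become one column
dummies = df['Field'].str.replace(r'\s*,\s*', ',', regex=True).str.get_dummies(sep=',')
print(pd.concat([df['Id'], dummies], axis=1))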
   Id  Economics  English-107  English-2  History  Java-2  Literature  \
0  56          1            1          0        1       0           0   
1  11          0            0          0        0       1           0   
2   6          0            0          1        0       0           1   
3  43          0            0          0        1       0           1   
4  14          0            1          0        0       1           0   

   Management  Mathematics  Philosophy  Web-development  
0           1            0           1                0  
1           0            0           0                1  
2           0            0           0                0  
3           0            1           1                0  
4           1            0           0                0  
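The answer prints `Id` next to the indicator columns, while the question asks to replace `Field` and append the new columns to the original frame. A minimal sketch of that last step, assuming `df` holds the question's full sample data and that the spaces after commas are not meaningful; `DataFrame.join` aligns the indicator columns with the original rows on their shared index:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Groupe": ["Advanced", "Middle", "Advanced", "Beginner", "Middle"],
    "Id": [56, 11, 6, 43, 14],
    "MotherName": ["Laure", "Ann", "Helen", "Laure", "Naomi"],
    "FatherName": ["James", "Nicolas", "Franc", "James", "Franc"],
    "Field": [
        "English-107,Economics, Management, History, Philosophy",
        "Web-development, Java-2",
        "Literature, English-2",
        "Mathematics, History, Philosophy, Literature",
        "Java-2, Management, English-107",
    ],
})

# Drop the optional space after each comma so " History" and "History"
# land in the same indicator column, then build one 0/1 column per subject
dummies = (
    df["Field"]
    .str.replace(r"\s*,\s*", ",", regex=True)
    .str.get_dummies(sep=",")
)

# Replace Field with the indicator columns; join aligns rows on the shared index
result = df.drop(columns="Field").join(dummies)
print(result)
```

The result keeps `Groupe`, `Id`, `MotherName` and `FatherName` and adds one column per subject in place of `Field`.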

If you need counts rather than 0/1 indicators, you can use pivot_table (for testing, I added one more Economics string to the first row):

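# Normalize ", " to ",", split into one value per column, then stack so each
# (original row, subject) pair becomes one entry, and count subjects per row
df1 = (df['Field'].str.replace(r'\s*,\s*', ',', regex=True)
                  .str.split(',', expand=True)
                  .stack()
                  .groupby(level=0)
                  .value_counts()
                  .reset_index())
df1.columns = ['a', 'b', 'c']  # a = original row, b = subject, c = count
print(df1.pivot_table(index='a', columns='b', values='c').fillna(0))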
b  Economics  English-107  English-2  History  Java-2  Literature  Management  \
a                                                                               
0          2            1          0        1       0           0           1   
1          0            0          0        0       1           0           0   
2          0            0          1        0       0           1           0   
3          0            0          0        1       0           1           0   
4          0            1          0        0       1           0           1   

b  Mathematics  Philosophy  Web-development  
a                                            
0            0           1                0  
1            0           0                1  
2            0           0                0  
3            1           1                0  
4            0           0                0  
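A shorter route to the same count table uses `Series.explode` (available since pandas 0.25) together with `pd.crosstab`. This is a swapped-in technique rather than the answer's pivot_table approach, sketched under the assumption that `df` holds the sample data with the extra Economics appended to the first row; the axis name `row` is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [56, 11, 6, 43, 14],
    "Field": [
        # "Economics" appears twice here, mirroring the test string in the answer
        "English-107,Economics, Management, History, Philosophy, Economics",
        "Web-development, Java-2",
        "Literature, English-2",
        "Mathematics, History, Philosophy, Literature",
        "Java-2, Management, English-107",
    ],
})

# One row per (source row, subject) pair; explode keeps the source row label
pairs = (
    df["Field"]
    .str.split(r"\s*,\s*")   # multi-character pattern -> treated as a regex
    .explode()
    .rename_axis("row")
    .reset_index()           # columns: "row", "Field"
)

# Cross-tabulate source row against subject; each cell is an occurrence count
counts = pd.crosstab(pairs["row"], pairs["Field"])
print(counts)
```

Because crosstab fills absent combinations with 0, no separate `fillna` step is needed, and the counts stay integers.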
