Pandas, DataFrame: Splitting one column into multiple columns


Question

I have the following DataFrame and would like to know whether the data column can be broken into multiple columns. That is, I want to go from this:


ID       Date       data
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
6       21/01/2014  B: 5, C: 5, D: 7
6       02/04/2013  A: 4, D:7
7       05/06/2014  C: 25
7       12/08/2014  D: 20
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4  

to this:


ID       Date       data                            A   B   C   D   E   F
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8    15  8   5   5   0   0
6       21/01/2014  B: 5, C: 5, D: 7                0   5   5   7   0   0     
6       02/04/2013  B: 4, D: 7, B: 6                0   10  0   7   0   0
7       05/06/2014  C: 25                           0   0   25  0   0   0
7       12/08/2014  D: 20                           0   0   0   20  0   0   
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4    2   7   3   0   5   0
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4    0   8   0   6   0   11

I have tried the approaches from "Split strings in tuples into columns, in Pandas" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.

Edit

There is one complication: the data column can contain duplicate keys. For example, A appears twice in the first row, so its values are summed under the A column (see the second table).

Recommended answer

Here is a function that converts the string to a dictionary and sums the values for each key. After the conversion, the result is easy to obtain by expanding each dictionary with pd.Series:

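import re
from collections import defaultdict

import pandas as pd


def str_to_dict(str1):
    # Sum the values for each key, so "A: 7, ..., A: 8" gives {"A": 15, ...}.
    d = defaultdict(int)
    for k, v in zip(re.findall(r'[A-Z]', str1), re.findall(r'\d+', str1)):
        d[k] += int(v)
    return d


# Expand each row's dict into columns; keys missing from a row become 0.
pd.concat([df, df['data'].apply(str_to_dict).apply(pd.Series).fillna(0).astype(int)], axis=1)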
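As a variation on the answer above, the following is a minimal, self-contained sketch rather than the answerer's own code. It rebuilds a few rows from the question's second table and captures each key together with its number in a single regular expression, so that keys and values cannot drift out of step if a row is malformed. The names sum_by_key, counts, and result are illustrative assumptions, and the sketch assumes keys are single uppercase letters and values are non-negative integers.

```python
import re
from collections import Counter

import pandas as pd

# Sample rows copied from the second table in the question.
df = pd.DataFrame({
    "ID": [6, 6, 7, 8],
    "Date": ["21/05/2016", "21/01/2014", "05/06/2014", "21/03/2012"],
    "data": [
        "A: 7, B: 8, C: 5, D: 5, A: 8",
        "B: 5, C: 5, D: 7",
        "C: 25",
        "F: 6, B: 4, F: 5, D: 6, B: 4",
    ],
})


def sum_by_key(text):
    """Return a dict mapping each letter key to the sum of its values."""
    totals = Counter()
    # Capture each "key: value" pair as a unit (hypothetical helper, not from the answer).
    for key, value in re.findall(r"([A-Z]):\s*(\d+)", text):
        totals[key] += int(value)
    return dict(totals)


# One row of totals per input row; keys absent from a row are NaN, then filled with 0.
counts = pd.DataFrame([sum_by_key(s) for s in df["data"]], index=df.index)
counts = counts.fillna(0).astype(int).sort_index(axis=1)

result = pd.concat([df, counts], axis=1)
print(result)
```

Sorting the new columns with sort_index(axis=1) keeps them in alphabetical order regardless of which key happens to appear first in the data.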
