Pandas, DataFrame: Splitting one column into multiple columns
Question
I have the following DataFrame. I am wondering whether it is possible to break the data column into multiple columns. E.g., from this:
ID Date data
6 21/05/2016 A: 7, B: 8, C: 5, D: 5, A: 8
6 21/01/2014 B: 5, C: 5, D: 7
6 02/04/2013 A: 4, D:7
7 05/06/2014 C: 25
7 12/08/2014 D: 20
8 18/04/2012 A: 2, B: 3, C: 3, E: 5, B: 4
8 21/03/2012 F: 6, B: 4, F: 5, D: 6, B: 4
to this:
ID Date data A B C D E F
6 21/05/2016 A: 7, B: 8, C: 5, D: 5, A: 8 15 8 5 5 0 0
6 21/01/2014 B: 5, C: 5, D: 7 0 5 5 7 0 0
6 02/04/2013 B: 4, D: 7, B: 6 0 10 0 7 0 0
7 05/06/2014 C: 25 0 0 25 0 0 0
7 12/08/2014 D: 20 0 0 0 20 0 0
8 18/04/2012 A: 2, B: 3, C: 3, E: 5, B: 4 2 7 3 0 5 0
8 21/03/2012 F: 6, B: 4, F: 5, D: 6, B: 4 0 8 0 6 0 11
I have tried the approaches from "Split strings in tuples into columns, in Pandas" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.
Edit
There is one complication: the data column can contain duplicate keys. For example, A is repeated in the first row, so its values are summed under the A column (please see the second table).
Recommended answer
Here is a function that converts the string to a dictionary and sums the values for each key. After the conversion, it is easy to expand the dictionaries into columns with pd.Series:
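import re
from collections import defaultdict

import pandas as pd

def str_to_dict(str1):
    # Sum the values per key, so a repeated key such as "A: 7, ..., A: 8" gives A = 15
    d = defaultdict(int)
    for k, v in zip(re.findall(r'[A-Z]', str1), re.findall(r'\d+', str1)):
        d[k] += int(v)
    return d

# The column to parse is 'data' in the question's DataFrame
pd.concat([df, df['data'].apply(str_to_dict).apply(pd.Series).fillna(0).astype(int)], axis=1)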
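To try the idea end to end, here is a minimal, self-contained sketch on four of the sample rows from the question. It is a variation on the function above rather than the exact same code: it assumes every entry has the form "single uppercase letter, colon, integer", captures each key together with its value in one regular expression instead of zipping two separate findall results, and builds the new columns with pd.DataFrame on the list of dictionaries instead of apply(pd.Series). The names totals, parsed, wide and result are illustrative only.

```python
import re
from collections import defaultdict

import pandas as pd


def str_to_dict(text):
    """Parse 'A: 7, B: 8, A: 8' into {'A': 15, 'B': 8}, summing repeated keys."""
    totals = defaultdict(int)
    # Each match is a (key, value) pair; whitespace after the colon is optional.
    for key, value in re.findall(r"([A-Z]):\s*(\d+)", text):
        totals[key] += int(value)
    return dict(totals)


# A few rows copied from the question's first table.
df = pd.DataFrame(
    {
        "ID": [6, 6, 7, 8],
        "Date": ["21/05/2016", "21/01/2014", "05/06/2014", "21/03/2012"],
        "data": [
            "A: 7, B: 8, C: 5, D: 5, A: 8",
            "B: 5, C: 5, D: 7",
            "C: 25",
            "F: 6, B: 4, F: 5, D: 6, B: 4",
        ],
    }
)

# One dictionary per row, then one column per key; missing keys become 0.
parsed = df["data"].apply(str_to_dict)
wide = pd.DataFrame(parsed.tolist(), index=df.index).fillna(0).astype(int)
wide = wide.reindex(columns=sorted(wide.columns))  # alphabetical column order

result = pd.concat([df, wide], axis=1)
print(result)
```

Pairing each key with its value in a single pattern keeps the two from drifting out of step if the text ever contains a stray letter or number. Only keys that actually occur in the data become columns, so E does not appear here because none of these four rows contains it.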