Splitting multiple columns on a delimiter into rows in pandas dataframe
Question
I have a pandas dataframe as shown here:
id pos value sent
1 a/b/c test/test2/test3 21
2 d/a test/test5 21
I would like to split (= explode) df['pos'] and df['value'] so that the dataframe looks like this:
id pos value sent
1 a test 21
1 b test2 21
1 c test3 21
2 d test 21
2 a test5 21
It doesn't work if I split each column and then concat them à la
pos = df.token.str.split('/', expand=True).stack().str.strip().reset_index(level=1, drop=True)
df1 = pd.concat([pos,value], axis=1, keys=['pos','value'])
Any ideas? I'd really appreciate it.
I tried using this solution: https://stackoverflow.com/a/40449726/4219498
But I get the following error:
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
I suppose this is a numpy-related issue, although I'm not sure how it happens. I'm using Python 2.7.14.
Recommended answer
I tend to avoid the stack magic in favour of building a new dataframe from scratch. This is usually also more efficient. Below is one way.
import numpy as np
import pandas as pd
from itertools import chain

# number of '/'-separated items in each row (taken from 'pos')
lens = list(map(len, df['pos'].str.split('/')))

# repeat the scalar columns, flatten the split columns
res = pd.DataFrame({'id': np.repeat(df['id'], lens),
                    'pos': list(chain.from_iterable(df['pos'].str.split('/'))),
                    'value': list(chain.from_iterable(df['value'].str.split('/'))),
                    'sent': np.repeat(df['sent'], lens)})
print(res)
   id pos  sent  value
0   1   a    21   test
0   1   b    21  test2
0   1   c    21  test3
1   2   d    21   test
1   2   a    21  test5
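For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same repeat-and-flatten idea. The sample dataframe is rebuilt from the question. The helper name split_rows and its parameters are my own and hypothetical, not part of pandas. It assumes every split column yields the same number of items per row and raises otherwise. The second part shows an alternative that should work on pandas 1.3 or later, where DataFrame.explode accepts a list of columns.

```python
from itertools import chain

import numpy as np
import pandas as pd


def split_rows(df, split_cols, sep='/'):
    """Hypothetical helper: one output row per sep-separated item."""
    # Split each target column into a list of items per row.
    parts = {col: df[col].str.split(sep) for col in split_cols}
    lens = parts[split_cols[0]].str.len()
    # Assumption: all split columns have the same item count in each row.
    for col in split_cols[1:]:
        if not parts[col].str.len().equals(lens):
            raise ValueError("item counts differ between %r and %r"
                             % (split_cols[0], col))
    data = {}
    for col in df.columns:
        if col in parts:
            # Flatten the per-row lists into one long list.
            data[col] = list(chain.from_iterable(parts[col]))
        else:
            # Repeat untouched values once per item in that row.
            data[col] = np.repeat(df[col].to_numpy(), lens.to_numpy())
    return pd.DataFrame(data, columns=df.columns)


# Sample data taken from the question.
df = pd.DataFrame({'id': [1, 2],
                   'pos': ['a/b/c', 'd/a'],
                   'value': ['test/test2/test3', 'test/test5'],
                   'sent': [21, 21]})

res = split_rows(df, ['pos', 'value'])
print(res)

# Alternative, assuming pandas >= 1.3: explode several list columns at once.
alt = (df.assign(pos=df['pos'].str.split('/'),
                 value=df['value'].str.split('/'))
         .explode(['pos', 'value'])
         .reset_index(drop=True))
print(alt)
```

Unlike the snippet above, split_rows returns a fresh 0..n-1 index instead of repeating the original row labels. If you need the original labels, drop the reset_index call in the explode variant.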