Exploding multiple CSV fields in Python
Problem description
I have an Excel file with 200 rows, and two of its columns contain comma-separated values. Exported as tab-separated text, it would look like this:
col1 col2 col3
a b,c d,e
f g,h i,j
I need to explode this into a dataframe like the one below, turning the 200 rows into ~4,000:
col1 col2 col3
a b d
a b e
a c d
a c e
f g i
f g j
f h i
f h j
I don't see any explode functionality in pandas, and I haven't been able to figure out how to do this when the comma-separated columns are uneven in length. I'm not sure how split would work here.
Help me, Stack Overflow, you're my only hope. Thanks!
Recommended answer
Use itertools.product to get all combinations of the col2 and col3 values in each row, then expand the resulting tuples into separate columns:
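import pandas as pd
from itertools import product

# df is the original dataframe with columns col1, col2, col3.
# Per row: build every (col2, col3) pair, stack the pairs into rows,
# then split each tuple back into two columns.
df.set_index('col1')\
.apply(lambda x: pd.Series(list(product(x.col2.split(','), x.col3.split(',')))), axis=1)\
.stack()\
.reset_index(1, drop=True)\
.apply(pd.Series)\
.reset_index().rename(columns={0: 'col2', 1: 'col3'})
Out[466]:
col1 col2 col3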
0 a b d
1 a b e
2 a c d
3 a c e
4 f g i
5 f g j
6 f h i
7 f h j
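For readers on a newer pandas, the same result can probably be reached with the built-in DataFrame.explode, which did not exist when the question was asked. The sketch below is not part of the original answer: it assumes pandas 1.1 or later (for the ignore_index argument), and explode_columns is a hypothetical helper name chosen for illustration. Exploding the columns one after another yields the per-row Cartesian product, and it does not require the comma-separated lists to have equal lengths.

```python
import pandas as pd


def explode_columns(df, columns, sep=","):
    """Hypothetical helper: split each listed column on `sep` and explode it.

    Exploding the columns one at a time produces every combination of
    their values within each original row.
    """
    out = df.copy()
    for col in columns:
        # Turn "b,c" into ["b", "c"], then give each list element its own row.
        out[col] = out[col].str.split(sep)
        out = out.explode(col, ignore_index=True)
    return out


# Sample data taken from the question.
df = pd.DataFrame({
    "col1": ["a", "f"],
    "col2": ["b,c", "g,h"],
    "col3": ["d,e", "i,j"],
})

result = explode_columns(df, ["col2", "col3"])
print(result)
```

On the sample data this should give the same eight rows as the itertools.product approach, with the columns already named col1, col2 and col3.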