Exploding multiple CSV fields in Python

Question

I have an Excel file with 200 rows, two columns of which contain comma-separated values. If I output them as tab-separated values, it looks like this:

col1  col2    col3
a     b,c     d,e
f     g,h     i,j

I need to explode these into a DataFrame like this, turning 200 rows into ~4,000:

col1  col2  col3
a     b     d
a     b     e
a     c     d
a     c     e
f     g     i
f     g     j
f     h     i
f     h     j

I don't see any explode functionality in pandas, and I haven't been able to figure out how to do this when the comma-separated columns have values of uneven length; I'm not sure how split would work here.
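Uneven lengths are not actually an obstacle for this shape of output: each field can be split on its own, and the Cartesian product of the pieces gives len(col2) times len(col3) output rows per input row, whatever those lengths are. A minimal standard-library sketch of that idea, assuming the sheet has been exported as tab-separated text; the `TSV` string (including the extra `k` row with uneven field lengths) and the `explode_rows` helper are hypothetical stand-ins for illustration:

```python
import csv
import io
from itertools import product

# Hypothetical tab-separated input mirroring the question's sample,
# plus an extra row "k" whose comma-separated fields have uneven lengths.
TSV = "col1\tcol2\tcol3\na\tb,c\td,e\nf\tg,h\ti,j\nk\tl,m,n\to\n"


def explode_rows(text):
    """Yield (col1, col2, col3) for every combination of comma-separated values."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    for col1, col2, col3 in reader:
        # product() pairs every col2 value with every col3 value, so uneven
        # lengths simply yield len(col2) * len(col3) rows for this input row.
        for v2, v3 in product(col2.split(","), col3.split(",")):
            yield col1, v2, v3


rows = list(explode_rows(TSV))
for row in rows:
    print("\t".join(row))
```

Here the `k` row expands to three rows (one per col2 value, each paired with the single col3 value), while `a` and `f` expand to four rows each.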

Help me, Stack Overflow, you're my only hope. Thanks!

Recommended answer

Use itertools.product to get every combination of the split col2 and col3 values in each row, then expand the resulting pairs into separate columns:

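import pandas as pd
from itertools import product

# Sample data from the question
df = pd.DataFrame({'col1': ['a', 'f'], 'col2': ['b,c', 'g,h'], 'col3': ['d,e', 'i,j']})

# One row per (col2, col3) combination, then split each pair into two columns
df.set_index('col1')\
  .apply(lambda x: pd.Series(list(product(x.col2.split(','), x.col3.split(',')))), axis=1)\
  .stack()\
  .reset_index(1, drop=True)\
  .apply(pd.Series)\
  .reset_index().rename(columns={0: 'col2', 1: 'col3'})

Out[466]: 
  col1 col2 col3
0    a    b    d
1    a    b    e
2    a    c    d
3    a    c    e
4    f    g    i
5    f    g    j
6    f    h    i
7    f    h    j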
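On newer pandas (0.25 and later), `DataFrame.explode` gives the same result with less machinery. A minimal sketch, assuming the question's sample data: split each column into lists, then explode one column at a time. Exploding col2 repeats each row once per col2 value, and exploding col3 afterwards repeats those rows once per col3 value, which yields the Cartesian product even when the lists have different lengths.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": ["a", "f"],
    "col2": ["b,c", "g,h"],
    "col3": ["d,e", "i,j"],
})

# Turn each comma-separated string into a list, then explode the two
# list columns one after the other to get every (col2, col3) combination.
out = (
    df.assign(col2=df["col2"].str.split(","), col3=df["col3"].str.split(","))
      .explode("col2")
      .explode("col3")
      .reset_index(drop=True)
)
print(out)
```

Exploding both columns in a single call, `df.explode(["col2", "col3"])` (pandas 1.3 and later), is not equivalent: it pairs the list elements positionally instead of combining them, and it raises an error when a row's lists have different lengths.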
