Split cell into multiple rows in pandas dataframe

Question

I have a dataframe containing order data. Each order has multiple packages, stored as comma-separated strings in the package and package_code columns.

I want to split the package data and create one row per package, including its order details.

Here is a sample input dataframe:

import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 3, 7],
    "order_date": ["20/5/2018", "22/5/2018", "23/5/2018"],
    "package": ["p1,p2,p3", "p4", "p5,p6"],
    "package_code": ["#111,#222,#333", "#444", "#555,#666"],
})

The output I am trying to achieve has one row per package, with the order_id and order_date of its order repeated on each row.

How can I do that with pandas?

Recommended answer

Here's one way using numpy.repeat and itertools.chain. Conceptually, this is exactly what you want to do: repeat some values, chain others. This is recommended for a small number of columns; otherwise, stack-based methods may fare better.

import numpy as np
from itertools import chain

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

# calculate lengths of splits
lens = df['package'].str.split(',').map(len)

# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({'order_id': np.repeat(df['order_id'], lens),
                    'order_date': np.repeat(df['order_date'], lens),
                    'package': chainer(df['package']),
                    'package_code': chainer(df['package_code'])})

print(res)

   order_id order_date package package_code
0         1  20/5/2018      p1         #111
0         1  20/5/2018      p2         #222
0         1  20/5/2018      p3         #333
1         3  22/5/2018      p4         #444
2         7  23/5/2018      p5         #555
2         7  23/5/2018      p6         #666
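
For comparison, here is a minimal sketch of a different technique from the one above: splitting both columns into lists and calling DataFrame.explode on them together. It assumes pandas 1.3 or later (where explode accepts a list of columns) and that package and package_code hold the same number of items in every row. The final reset_index is optional; it replaces the repeated index values (0, 0, 0, 1, 2, 2) seen in the output above with a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 3, 7],
    "order_date": ["20/5/2018", "22/5/2018", "23/5/2018"],
    "package": ["p1,p2,p3", "p4", "p5,p6"],
    "package_code": ["#111,#222,#333", "#444", "#555,#666"],
})

res = (
    # turn each comma-separated string into a list of items
    df.assign(
        package=df["package"].str.split(","),
        package_code=df["package_code"].str.split(","),
    )
    # expand both list columns in lockstep, one row per item
    # (assumes pandas >= 1.3 and equal list lengths within each row)
    .explode(["package", "package_code"])
    # optional: replace the repeated index with 0..n-1
    .reset_index(drop=True)
)

print(res)
```

The order_id and order_date values are repeated automatically for each exploded row, so no explicit numpy.repeat call is needed in this variant.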
