Python: split data frame columns into multiple rows

Question

I have a dataframe like this:

--------------------------------------------------------------------
Product        ProductType     SKU                Size
--------------------------------------------------------------------
T-shirt        Top            [111,222,333,444]   ['XS','S','M','L']
Pant(Flared)   Bottoms        [555,666]           ['M','L']
Sweater        Top            None                None

I want the following output:

Product       ProductType        SKU        Size
T-shirt       Top                111        XS
T-shirt       Top                222        S
T-shirt       Top                333        M
T-shirt       Top                444        L
Pant(Flared)  Bottoms            555        M
Pant(Flared)  Bottoms            666        L
Sweater       Top                None       None

I tried the following code:

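import pandas as pd

# Expand each SKU list into columns, then stack to one value per row
s = df['SKU'].apply(pd.Series).stack()
s.index = s.index.droplevel(-1)  # keep only the original row label
s.name = 'SKU'
del df['SKU']
df = df.join(s)

# Repeat the same steps for Size
r = df['Size'].apply(pd.Series).stack()
r.index = r.index.droplevel(-1)
r.name = 'Size'
del df['Size']
df = df.join(r)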

But this explodes into the following:

Product       ProductType   SKU             Size
T-shirt       Top           111             XS
T-shirt       Top           111             S
T-shirt       Top           111             M
T-shirt       Top           111             L
T-shirt       Top           222             XS
T-shirt       Top           222             S
T-shirt       Top           222             M
T-shirt       Top           222             L
T-shirt       Top           333             XS
T-shirt       Top           333             S
T-shirt       Top           333             M
T-shirt       Top           333             L
T-shirt       Top           444             XS
T-shirt       Top           444             S
T-shirt       Top           444             M
T-shirt       Top           444             L
Pant(Flared)  Bottoms       555             M
Pant(Flared)  Bottoms       555             L
Pant(Flared)  Bottoms       666             M
Pant(Flared)  Bottoms       666             L
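
The repeated rows above are what a join on duplicated index labels tends to produce: after droplevel(-1), every SKU and every Size of a product carries the same row label, so join pairs each SKU with each Size. A minimal sketch of that effect, using made-up frames (the names left and right and their values are illustrative, not taken from the original data):

```python
import pandas as pd

# Two frames whose index holds the same label twice, as after droplevel(-1)
left = pd.DataFrame({"SKU": [111, 222]}, index=[0, 0])
right = pd.DataFrame({"Size": ["XS", "S"]}, index=[0, 0])

# Joining on the duplicated label matches every left row with every right row
joined = left.join(right)
print(len(joined))

# Pairing by position instead keeps one Size per SKU
paired = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(len(paired))
```

Here the join yields 2 x 2 rows while the positional concat yields 2, which is the pairing the question asks for.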

Note that, for simplicity's sake, I have shown only two columns that will be repeated (Product, ProductType), but I have 5 such columns that contain strings. I basically want to associate each SKU with its size for each product.

Can anyone help here?

Recommended answer

This approach is prone to bugs, so use it with caution:

Convert the Product column to a column of lists whose lengths match the lists in the other columns (say, column SKU). This will not work if the lists in SKU and Size have different lengths.

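# Wrap each value in a one-element list and repeat it len(SKU) times.
# (map(list) would split a multi-character string into its characters.)
df["Product"] = df["Product"].map(lambda x: [x]) * df["SKU"].map(len)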

Out[184]: 
                    SKU           Size       Product
0  [111, 222, 333, 444]  [XS, S, M, L]  [a, a, a, a]
1            [555, 666]         [M, L]        [b, b]

Take the sum of each column (summing lists concatenates them) and pass the result to the dataframe constructor with to_dict():

pd.DataFrame(df.sum().to_dict())
Out[185]: 
  Product  SKU Size
0       a  111   XS
1       a  222    S
2       a  333    M
3       a  444    L
4       b  555    M
5       b  666    L
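
The same repeat-and-flatten idea can also be written without relying on sum() over list-valued columns. The following is only a sketch, assuming every row's SKU and Size lists have equal length and no row holds None. The name flat and the use of np.repeat and itertools.chain are choices made for this illustration, not part of the answer above:

```python
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["T-shirt", "Pant(Flared)"],
    "SKU": [[111, 222, 333, 444], [555, 666]],
    "Size": [["XS", "S", "M", "L"], ["M", "L"]],
})

# Number of output rows each input row expands to
lengths = df["SKU"].map(len).to_numpy()

flat = pd.DataFrame({
    # Repeat each scalar once per list element
    "Product": np.repeat(df["Product"].to_numpy(), lengths),
    # Concatenate the per-row lists in row order
    "SKU": list(chain.from_iterable(df["SKU"])),
    "Size": list(chain.from_iterable(df["Size"])),
})
print(flat)
```

Because both list columns are flattened in the same row order, each SKU stays next to the Size at the same position.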

Edit:

For several columns, you can define the columns to be repeated:

cols_to_be_repeated = ["Product", "ProductType"]

Save the rows that have None values in another dataframe:

na_df = df[pd.isnull(df["SKU"])].copy()

Drop the rows containing None from the original dataframe:

df.dropna(inplace = True)

Loop over these columns:

for col in cols_to_be_repeated:
    df[col] = df[col].map(lambda x: [x]) * df["SKU"].map(len)

And use the same approach:

pd.concat([pd.DataFrame(df.sum().to_dict()), na_df])

        Product ProductType    SKU  Size
0       T-shirt         Top  111.0    XS
1       T-shirt         Top  222.0     S
2       T-shirt         Top  333.0     M
3       T-shirt         Top  444.0     L
4  Pant(Flared)     Bottoms  555.0     M
5  Pant(Flared)     Bottoms  666.0     L
2       Sweater         Top    NaN  None
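
In the output above the SKU values have become floats and the index label 2 appears twice. If that matters, one possible cleanup is sketched below. The small frames expanded and na_df are stand-ins built by hand for this illustration, and the choice of the nullable "Int64" dtype is an assumption about what is wanted:

```python
import numpy as np
import pandas as pd

# Hand-built stand-ins for the expanded rows and the saved None rows
expanded = pd.DataFrame({
    "Product": ["T-shirt", "T-shirt"],
    "SKU": [111, 222],
    "Size": ["XS", "S"],
})
na_df = pd.DataFrame(
    {"Product": ["Sweater"], "SKU": [np.nan], "Size": [None]}, index=[2]
)

# ignore_index=True renumbers the rows instead of keeping duplicate labels
result = pd.concat([expanded, na_df], ignore_index=True)

# The nullable integer dtype keeps whole numbers alongside missing values
result["SKU"] = result["SKU"].astype("Int64")
print(result)
```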

It might be better to work on a copy of the original dataframe.
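
As a possible alternative to the steps above, newer pandas versions can expand several list columns at once with DataFrame.explode. A minimal sketch, assuming pandas 1.3 or later (where explode accepts a list of columns) and that the lists in SKU and Size have matching lengths in every row. The sample frame is rebuilt from the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["T-shirt", "Pant(Flared)", "Sweater"],
    "ProductType": ["Top", "Bottoms", "Top"],
    "SKU": [[111, 222, 333, 444], [555, 666], None],
    "Size": [["XS", "S", "M", "L"], ["M", "L"], None],
})

# Explode both list columns together so SKU and Size stay paired by position;
# rows whose value is not a list (here None) are kept as a single row
result = df.explode(["SKU", "Size"], ignore_index=True)
print(result)
```

This returns a new dataframe and leaves df untouched, and the other string columns are repeated automatically, so there is no need to list them.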
