Python: split data frame columns into multiple rows
Question
I have a dataframe like this:
----------------------------------------------------------------
Product       ProductType  SKU                Size
----------------------------------------------------------------
T-shirt       Top          [111,222,333,444]  ['XS','S','M','L']
Pant(Flared)  Bottoms      [555,666]          ['M','L']
Sweater       Top          None               None
I want the following output:
Product       ProductType  SKU   Size
T-shirt       Top          111   XS
T-shirt       Top          222   S
T-shirt       Top          333   M
T-shirt       Top          444   L
Pant(Flared)  Bottoms      555   M
Pant(Flared)  Bottoms      666   L
Sweater       Top          None  None
I tried the following code:
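from pandas import Series

# Expand each SKU list into one row per element, keeping the original row label.
s = df['SKU'].apply(Series).stack()
s.index = s.index.droplevel(-1)
s.name = 'SKU'
del df['SKU']
df = df.join(s)
# Repeat for Size. The index labels are now duplicated, so this join
# pairs every SKU of a product with every Size of that product.
r = df['Size'].apply(Series).stack()
r.index = r.index.droplevel(-1)
r.name = 'Size'
del df['Size']
df = df.join(r)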
But this explodes into the following:
Product ProductType SKU Size
T-shirt Top 111 XS
T-shirt Top 111 S
T-shirt Top 111 M
T-shirt Top 111 L
T-shirt Top 222 XS
T-shirt Top 222 S
T-shirt Top 222 M
T-shirt Top 222 L
T-shirt Top 333 XS
T-shirt Top 333 S
T-shirt Top 333 M
T-shirt Top 333 L
T-shirt Top 444 XS
T-shirt Top 444 S
T-shirt Top 444 M
T-shirt Top 444 L
Pant(Flared) Bottoms 555 M
Pant(Flared) Bottoms 555 L
Pant(Flared) Bottoms 666 M
Pant(Flared) Bottoms 666 L
Note that, for simplicity's sake, I have shown only two columns that need to be repeated (Product, ProductType), but I actually have 5 such columns containing strings. I basically want to associate each SKU with its size for each product.
Can anyone help here?
Recommended answer
This is open to bugs, so use with caution:
Convert the Product column into lists whose lengths match the lists in the other columns (say, the SKU column). This will not work if the lists in SKU and Size have different lengths.
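# Wrap each name in a one-element list ([x], not list(x), which would split
# a multi-character string into characters), then repeat it once per SKU.
df["Product"] = df["Product"].map(lambda x: [x]) * df["SKU"].map(len)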
Out[184]:
SKU Size Product
0 [111, 222, 333, 444] [XS, S, M, L] [a, a, a, a]
1 [555, 666] [M, L] [b, b]
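The repetition in this step is ordinary Python list multiplication applied element by element. One detail worth checking with multi-character names: list() on a string splits it into characters, so a one-element list [x] is the safer thing to repeat. A minimal sketch with hand-typed values (the variable names are illustrative only):

```python
# Hypothetical values for illustration; not read from a real dataframe.
product = "T-shirt"
skus = [111, 222, 333, 444]

# Wrapping the string in a list and multiplying repeats the whole name.
repeated = [product] * len(skus)
print(repeated)

# list() on a string yields its individual characters, so multiplying
# that list repeats the characters rather than the name.
split_chars = list(product) * len(skus)
print(len(split_chars))
```

With single-character names such as "a" and "b" the two forms give the same result, which is why the difference is easy to miss.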
Take the sum of the columns (this concatenates the lists) and pass the result to the DataFrame constructor via to_dict():
pd.DataFrame(df.sum().to_dict())
Out[185]:
Product SKU Size
0 a 111 XS
1 a 222 S
2 a 333 M
3 a 444 L
4 b 555 M
5 b 666 L
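Summing works here because + on Python lists concatenates them, so adding up a column of lists gives one flat list per column. A pure-Python sketch of the same idea, assuming the two list-bearing rows from the question typed in by hand (no pandas involved):

```python
# Sample values from the question, entered manually as an assumption.
products = ["T-shirt", "Pant(Flared)"]
sku_lists = [[111, 222, 333, 444], [555, 666]]
size_lists = [["XS", "S", "M", "L"], ["M", "L"]]

# sum() with an empty-list start value concatenates the inner lists.
flat_skus = sum(sku_lists, [])
flat_sizes = sum(size_lists, [])

# Repeat each product name once per SKU, then flatten the same way.
flat_products = sum(([p] * len(s) for p, s in zip(products, sku_lists)), [])

# The three flat lists now line up position by position.
rows = list(zip(flat_products, flat_skus, flat_sizes))
for row in rows:
    print(row)
```

Repeated list addition copies the accumulated list each time, so for large inputs itertools.chain.from_iterable is likely a better fit for the flattening step.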
Edit:
For several columns, you can define the columns to be repeated:
cols_to_be_repeated = ["Product", "ProductType"]
Save the rows that have None values in another dataframe:
na_df = df[pd.isnull(df["SKU"])].copy()
Drop the rows containing None from the original dataframe:
df.dropna(inplace = True)
Loop over those columns:
for col in cols_to_be_repeated:
    df[col] = df[col].map(lambda x: [x]) * df["SKU"].map(len)
And use the same approach:
pd.concat([pd.DataFrame(df.sum().to_dict()), na_df])
Product ProductType SKU Size
0 T-shirt Top 111.0 XS
1 T-shirt Top 222.0 S
2 T-shirt Top 333.0 M
3 T-shirt Top 444.0 L
4 Pant(Flared) Bottoms 555.0 M
5 Pant(Flared) Bottoms 666.0 L
2 Sweater Top NaN None
It might be better to work on a copy of the original dataframe.
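As an alternative sketch that is not part of the original answer: pandas 1.3 and later accept a list of columns in DataFrame.explode, which expands paired lists row by row and returns a new dataframe instead of modifying the original. The frame below is rebuilt by hand from the question's sample, and the approach assumes the SKU and Size lists in each row have the same length.

```python
import pandas as pd

# Sample data retyped from the question (an assumption about its exact types).
df = pd.DataFrame({
    "Product": ["T-shirt", "Pant(Flared)", "Sweater"],
    "ProductType": ["Top", "Bottoms", "Top"],
    "SKU": [[111, 222, 333, 444], [555, 666], None],
    "Size": [["XS", "S", "M", "L"], ["M", "L"], None],
})

# Explode both list columns together so the i-th SKU stays paired with
# the i-th Size. Requires pandas >= 1.3; rows holding a scalar such as
# None are kept as a single row.
result = df.explode(["SKU", "Size"], ignore_index=True)
print(result)
```

If the lists in a row differ in length, explode raises a ValueError rather than silently mispairing values, which may be preferable to the sum-based approach for messy data.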