Splitting Column Lists in Pandas DataFrame
Question
I'm looking for a good way to solve the following problem. My current fix is not particularly clean, and I'm hoping to learn from your insight.
Suppose I have a pandas DataFrame whose entries look like this:
>>> import numpy as np
>>> import pandas as pd
>>> df=pd.DataFrame(index=[1,2,3],columns=['Color','Texture','IsGlass'])
>>> df['Color']=[np.nan,['Red','Blue'],['Blue', 'Green', 'Purple']]
>>> df['Texture']=[['Rough'],np.nan,['Silky', 'Shiny', 'Fuzzy']]
>>> df['IsGlass']=[1,0,1]
>>> df
                         Color                      Texture  IsGlass
1                          NaN                    ['Rough']        1
2              ['Red', 'Blue']                          NaN        0
3  ['Blue', 'Green', 'Purple']  ['Silky', 'Shiny', 'Fuzzy']        1
So each observation in the index corresponds to something I measured about its color, texture, and whether it's glass or not. What I'd like to do is turn this into a new "indicator" DataFrame, by creating a column for each observed value, and changing the corresponding entry to a one if I observed it, and NaN if I have no information.
>>> df
   Red  Blue  Green  Purple  Rough  Silky  Shiny  Fuzzy  IsGlass
1  NaN   NaN    NaN     NaN      1    NaN    NaN    NaN        1
2    1     1    NaN     NaN    NaN    NaN    NaN    NaN        0
3  NaN     1      1       1    NaN      1      1      1        1
I have a solution that loops over each column, looks at its values, and, through a series of try/excepts for non-NaN values, splits the lists, creates the new columns, and concatenates the results.
This is my first post to StackOverflow - I hope this post conforms to the posting guidelines. Thanks.
Recommended answer
Stacking trick!
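import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Replace missing entries with empty lists so every row can be iterated over
df = df.stack().unstack(fill_value=[])

def b(c):
    # One 0/1 column per distinct label found in the lists of column c
    d = mlb.fit_transform(c)
    return pd.DataFrame(d, c.index, mlb.classes_)

pd.concat([b(df[c]) for c in ['Color', 'Texture']], axis=1).join(df.IsGlass)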
   Blue  Green  Purple  Red  Fuzzy  Rough  Shiny  Silky  IsGlass
1     0      0       0    0      0      1      0      0        1
2     1      0       0    1      0      0      0      0        0
3     1      1       1    0      1      0      1      1        1
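Note that this output has 0 where a value was not observed, while the question asked for NaN. If NaN is what you need, here is a rough pandas-only sketch of one way to get it. It is not part of the original answer: it swaps MultiLabelBinarizer for Series.explode and pd.get_dummies, the helper name indicators is made up for illustration, and it assumes pandas 0.25 or later (for explode) and that every non-missing cell holds a list.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    {
        "Color": [np.nan, ["Red", "Blue"], ["Blue", "Green", "Purple"]],
        "Texture": [["Rough"], np.nan, ["Silky", "Shiny", "Fuzzy"]],
        "IsGlass": [1, 0, 1],
    },
    index=[1, 2, 3],
)

def indicators(col):
    """Hypothetical helper: one column per observed label, 1.0 or NaN."""
    # One row per list element; a missing cell stays a single NaN row
    exploded = col.explode()
    # NaN rows become all-zero rows because dummy_na defaults to False
    dummies = pd.get_dummies(exploded, dtype=float)
    # Collapse back to one row per original index label
    collapsed = dummies.groupby(level=0).max()
    # Keep the ones, turn everything else into NaN
    return collapsed.where(collapsed == 1)

result = pd.concat(
    [indicators(df[c]) for c in ["Color", "Texture"]], axis=1
).join(df["IsGlass"])
print(result)
```

As with the MultiLabelBinarizer version, the label columns come out in sorted order within each source column rather than in order of first appearance.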