Splitting Column Lists in Pandas DataFrame

Published: 2020/5/24 3:47:28. Tags: python-3.x, pandas, dataframe, pandas-groupby

Problem Description

I'm looking for a good way to solve the following problem. My current fix is not particularly clean, and I'm hoping to learn from your insight.

Suppose I have a pandas DataFrame whose entries look like this:

>>> import numpy as np
>>> import pandas as pd
>>> df=pd.DataFrame(index=[1,2,3],columns=['Color','Texture','IsGlass'])

>>> df['Color']=[np.nan,['Red','Blue'],['Blue', 'Green', 'Purple']]
>>> df['Texture']=[['Rough'],np.nan,['Silky', 'Shiny', 'Fuzzy']]
>>> df['IsGlass']=[1,0,1]

>>> df
                            Color                   Texture   IsGlass
    1                         NaN                  ['Rough']        1
    2              ['Red', 'Blue']                       NaN        0 
    3  ['Blue', 'Green', 'Purple']  ['Silky','Shiny','Fuzzy']       1

So each observation in the index corresponds to something I measured about its color, its texture, and whether it's glass or not. What I'd like to do is turn this into a new "indicator" DataFrame by creating a column for each observed value and setting the corresponding entry to 1 if I observed it, and to NaN if I have no information.

>>> df
   Red  Blue  Green  Purple  Rough  Silky  Shiny  Fuzzy  IsGlass
1  NaN   NaN    NaN     NaN      1    NaN    NaN    NaN        1
2    1     1    NaN     NaN    NaN    NaN    NaN    NaN        0
3  NaN     1      1       1    NaN      1      1      1        1

I have a solution that loops over each column, looks at its values, and, through a series of try/except blocks for the non-NaN values, splits the lists, creates the new columns, and so on, then concatenates the results.
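
The loop-based code itself was not posted. As a rough, hypothetical sketch of that kind of loop (the function name `indicators_by_loop` is invented here), checking `isinstance(cell, list)` sidesteps the try/except blocks, because the missing cells are float NaN rather than lists. It assumes every non-missing cell in the listed columns is a list of strings:

```python
import numpy as np
import pandas as pd


def indicators_by_loop(df, list_cols):
    """Hypothetical loop-based sketch: one column per observed value,
    1 where the value was observed and NaN otherwise."""
    pieces = []
    for col in list_cols:
        # collect the distinct labels in first-seen order
        labels = []
        for cell in df[col]:
            if isinstance(cell, list):  # NaN cells are floats, not lists
                for label in cell:
                    if label not in labels:
                        labels.append(label)
        # start from an all-NaN block and mark each observed label with 1
        block = pd.DataFrame(np.nan, index=df.index, columns=labels)
        for idx, cell in df[col].items():
            if isinstance(cell, list):
                block.loc[idx, cell] = 1
        pieces.append(block)
    # keep the remaining columns (here IsGlass) unchanged at the end
    others = df.drop(columns=list_cols)
    return pd.concat(pieces + [others], axis=1)


df = pd.DataFrame(
    {
        "Color": [np.nan, ["Red", "Blue"], ["Blue", "Green", "Purple"]],
        "Texture": [["Rough"], np.nan, ["Silky", "Shiny", "Fuzzy"]],
        "IsGlass": [1, 0, 1],
    },
    index=[1, 2, 3],
)
print(indicators_by_loop(df, ["Color", "Texture"]))
```

This yields the 1/NaN layout from the question with columns in first-seen order, but it visits every cell in Python, so it will be slow on large frames.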

This is my first post to StackOverflow - I hope this post conforms to the posting guidelines. Thanks.

Recommended Answer

The stack trick!

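import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

# stack() turns the frame into one long Series of cells; dropna() discards the
# NaN cells (older pandas versions already drop them inside stack()), and
# unstack(fill_value=[]) rebuilds the frame with an empty list in their place
df = df.stack().dropna().unstack(fill_value=[])

def b(c):
    # binarize one column of lists: one 0/1 column per distinct label
    d = mlb.fit_transform(c)
    return pd.DataFrame(d, c.index, mlb.classes_)

# binarize each list column side by side, then re-attach IsGlass
pd.concat([b(df[c]) for c in ['Color', 'Texture']], axis=1).join(df.IsGlass)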

   Blue  Green  Purple  Red  Fuzzy  Rough  Shiny  Silky IsGlass
1     0      0       0    0      0      1      0      0       1
2     1      0       0    1      0      0      0      0       0
3     1      1       1    0      1      0      1      1       1
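
The output above uses 0 where the question asked for NaN, and it depends on scikit-learn. A pandas-only sketch that matches the requested 1/NaN layout, using `Series.str.join` and `Series.str.get_dummies` (the helper name `to_indicators` is hypothetical), assuming no label contains the `|` separator:

```python
import numpy as np
import pandas as pd


def to_indicators(df, list_cols):
    """Hypothetical helper: replace each list column with 1/NaN indicator columns."""
    blocks = []
    for col in list_cols:
        # join each list into "a|b|c" (NaN stays NaN), then split into 0/1
        # dummy columns; rows that were NaN come out as all zeros
        dummies = df[col].str.join("|").str.get_dummies(sep="|")
        # keep the 1s and turn every 0 into NaN, as the question requests
        blocks.append(dummies.where(dummies == 1))
    # append the untouched columns (here IsGlass) after the indicators
    return pd.concat(blocks + [df.drop(columns=list_cols)], axis=1)


df = pd.DataFrame(
    {
        "Color": [np.nan, ["Red", "Blue"], ["Blue", "Green", "Purple"]],
        "Texture": [["Rough"], np.nan, ["Silky", "Shiny", "Fuzzy"]],
        "IsGlass": [1, 0, 1],
    },
    index=[1, 2, 3],
)
print(to_indicators(df, ["Color", "Texture"]))
```

`get_dummies` sorts the labels alphabetically, so the column order matches the answer's output. Dropping the `.where(...)` call keeps the 0/1 form instead.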
