Pandas - unflatten data frame with columns containing array

Question

I have a data frame which has been flattened on a specific property:

id      property_a    properties_b
id_1    property_a_1  [property_b_11, property_b_12]
id_2    property_a_2  [property_b_21, property_b_22, property_b_23]

..................

I'd like to expand the column properties_b to go back to a data frame looking like this:

id      property_a    property_b
id_1    property_a_1  property_b_11
id_1    property_a_1  property_b_12
id_2    property_a_2  property_b_21
id_2    property_a_2  property_b_22
id_2    property_a_2  property_b_23

..................

I suspect this is very simple with Pandas, but being new to Python, I struggle to find an elegant way to do so.

Solution

This question was addressed here and here. If you find these questions and answers useful, feel free to up vote them as well.

Setup

import pandas as pd

df = pd.DataFrame([
        ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
        ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
    ], columns=['id', 'property_a', 'properties_b'])

df

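rows = []
for i, row in df.iterrows():
    for a in row.properties_b:
        # copy the row first; appending the same Series repeatedly would
        # leave every appended entry holding the last list element
        new = row.copy()
        new['properties_b'] = a
        rows.append(new)

pd.DataFrame(rows, columns=df.columns)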
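On pandas 0.25 or newer, this expansion is built in as `DataFrame.explode`, which emits one row per list element and repeats the other columns. A minimal sketch on the same Setup frame, assuming pandas >= 0.25; the `rename` to `property_b` and the `reset_index` are optional extras that only match the header and row numbering shown in the question.

```python
import pandas as pd

df = pd.DataFrame([
    ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
    ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
], columns=['id', 'property_a', 'properties_b'])

out = (
    df.explode('properties_b')                         # one row per list element
      .rename(columns={'properties_b': 'property_b'})  # match the desired header
      .reset_index(drop=True)                          # 0..n-1 instead of 0, 0, 1, 1, 1
)
print(out)
```

One behavioral difference: a row whose list is empty becomes a single row with NaN under `explode`, whereas the loop above drops that row entirely.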

Handy functions

def loc_expand(df, loc):
    rows = []
    for i, row in df.iterrows():
        vs = row.at[loc]
        for v in vs:
            # a fresh copy per element, so each appended row keeps its own value
            new = row.copy()
            new.at[loc] = v
            rows.append(new)

    return pd.DataFrame(rows)

def iloc_expand(df, iloc):
    rows = []
    for i, row in df.iterrows():
        vs = row.iat[iloc]
        for v in vs:
            # copy per element and modify the copy, not the row being iterated
            new = row.copy()
            new.iat[iloc] = v
            rows.append(new)

    return pd.DataFrame(rows)


These should both return the same result as above.

loc_expand(df, 'properties_b')
iloc_expand(df, 2)
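If `explode` is unavailable (pandas older than 0.25) and the frame is large, the row-by-row `iterrows` approach can be slow. One alternative, sketched here as a hypothetical helper `repeat_expand` (not part of pandas or of the answer above), repeats each row by its list length with `numpy.repeat` and then fills the column from the flattened lists. It should give the same rows and index labels as `loc_expand(df, 'properties_b')`, including dropping rows whose list is empty.

```python
import numpy as np
import pandas as pd

def repeat_expand(df, col):
    """Hypothetical helper: one row per element of the list column `col`."""
    lengths = df[col].map(len).values.astype(int)        # list length per row
    positions = np.repeat(np.arange(len(df)), lengths)   # row positions, each repeated
    out = df.iloc[positions].copy()                      # duplicated rows, index labels kept
    out[col] = [v for vs in df[col] for v in vs]         # flattened values, same row order
    return out

df = pd.DataFrame([
    ['id_1', 'property_a_1', ['property_b_11', 'property_b_12']],
    ['id_2', 'property_a_2', ['property_b_21', 'property_b_22', 'property_b_23']],
], columns=['id', 'property_a', 'properties_b'])

out = repeat_expand(df, 'properties_b')
print(out)
```

Selecting by position with `iloc` rather than by label keeps the helper correct even if the original index has duplicate labels.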
