Pandas: Error in creating dynamic columns from an existing column holding a nested list of lists


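Problem description

I want to create two columns from an existing column whose values are nested lists (each cell is a list of dicts, and each dict holds its own list of roles).

Three sample rows, each listing a company's participants and their roles: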

**row 1** [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}]

**row 2** [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}]

**row 3** [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}]

What I have tried so far:

    responses['Role of Participant(s)'] = [element[0]['roles'] for element in responses['participants']]
    responses['Role of Participant(s)'] = responses['Role of Participant(s)'].apply(lambda x: ', '.join(t['type'] for t in x))
    responses['Name of Participant(s)'] = [element[0]['life']['name'] for element in responses['participants']]

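The output contains only the roles and name of the first participant in each row.

However, I need all participants of each row/record together with their respective roles.

So how can I use "*" as the separator between the participants' values in each row? Please help!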

Update: here is a CSV version of the DataFrame:

participants
"[{'roles': [{'type': 'founder'}], 'life': {'name': 'Poul Erik Andersen'}}, {'roles': [{'type': 'director'}, {'type': 'board'}], 'life': {'name': 'Martin Ravn-Nielsen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Søren Haugaard'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Mads Dehlsen Winther'}}, {'roles': [{'type': 'founder'}], 'life': {'name': 'M+ Ejendomme A/S'}}, {'roles': [{'type': 'founder'}], 'life': {'name': 'MILTON HOLDING HORSENS A/S'}}, {'roles': [{'type': 'accountant'}], 'life': {'name': 'EY Godkendt Revisionspartnerselskab'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'HUSCOMPAGNIET HOLDING A/S'}}]"
"[{'roles': [{'type': 'founder'}, {'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Rasmus Gert Hansen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'John Nyrup Larsen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Ole Nidolf Larsen'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'RASMUS HANSEN HOLDING ApS'}}, {'roles': [{'type': 'accountant'}], 'life': {'name': 'DANSK REVISION SLAGELSE GODKENDT REVISIONSAKTIESELSKAB'}}]"
"[{'roles': [{'type': 'board'}], 'life': {'name': 'Berit Pedersen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Sanne Kristine Späth'}}, {'roles': [{'type': 'real_owner'}], 'life': {'name': 'Kjeld Kirk Kristiansen'}}, {'roles': [{'type': 'director'}], 'life': {'name': 'Jesper Andersen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Poul Hartvig Nielsen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Nanna Birgitta Gudum'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Henrik Baagøe Fredeløkke'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Carsten Rasmussen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'Jesper Laursen'}}, {'roles': [{'type': 'board'}], 'life': {'name': 'John Hansen'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'LEGO A/S'}}, {'roles': [{'type': 'accountant'}], 'life': {'name': 'PRICEWATERHOUSECOOPERS STATSAUTORISERET REVISIONSPARTNERSELSKAB'}}]"

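Recommended answer

You need a second for loop instead of [0]. Indexing with [0] picks only the first participant in each cell, so every other participant is dropped.

I use a normal function instead of a lambda to make it more readable.

First, the roles: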

import pandas as pd

data = {'participants': 
[
    [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],
    [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
    [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]
}

df = pd.DataFrame(data)

def get_roles(cell):
    
    results = []
    
    for item in cell:
        roles = []
        for role in item['roles']:
            roles.append(role['type'])
        results.append(",".join(roles))
    
    results = "***".join(results)

    return results

df['Role of Participant(s)'] = df['participants'].apply(get_roles)

print(df[['Role of Participant(s)']].to_string())

Result:

                                 Role of Participant(s)
0                     director,founder,owner,real_owner
1  board***director,board,real_owner***board,real_owner
2                           director,real_owner***owner

Now you can try writing the same thing as a lambda:

df['Role of Participant(s)'] = df['participants'].apply(lambda cell:"***".join([",".join(role['type'] for role in item['roles']) for item in cell]))
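The question asks for "*" as the separator, while the code above hard-codes "***". One way to keep that choice in a single place is to pass the separators as parameters. This is only a sketch: `join_roles`, `participant_sep`, `role_sep` and `df_demo` are names made up for illustration and are not part of the original answer.

```python
import pandas as pd

def join_roles(cell, participant_sep="***", role_sep=","):
    """Join each participant's role types, then join the participants."""
    return participant_sep.join(
        role_sep.join(role['type'] for role in item['roles'])
        for item in cell
    )

# Small demo frame in the same shape as the question's data.
df_demo = pd.DataFrame({'participants': [
    [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}},
     {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
    [{'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]})

# Use a single "*" between participants, as asked in the question.
df_demo['Role of Participant(s)'] = df_demo['participants'].apply(
    lambda cell: join_roles(cell, participant_sep="*")
)

print(df_demo[['Role of Participant(s)']].to_string())
```

Whichever separator is chosen, it should not appear inside the role or name values themselves, otherwise the joined string cannot be split back unambiguously.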

Similarly for the names:

def get_names(cell):
    
    results = []
    
    for item in cell:
        results.append(item['life']['name'])
    
    results = "***".join(results)

    return results

df['Name of Participant(s)'] = df['participants'].apply(get_names)

And now as a lambda:

df['Name of Participant(s)'] = df['participants'].apply(lambda cell:"***".join(item['life']['name'] for item in cell))
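The update in the question shows the column after it has been written to CSV. Read back with `pd.read_csv`, each cell is a string rather than a list of dicts, so looping over it would walk through single characters and `item['life']` would fail. A minimal sketch, assuming every cell is a valid Python literal as in the sample, parses the strings with `ast.literal_eval` first. The names `csv_text` and `df_csv` and the shortened rows are only for illustration.

```python
import ast
import io

import pandas as pd

# Two shortened rows in the same shape as the CSV in the question.
csv_text = '''participants
"[{'roles': [{'type': 'founder'}], 'life': {'name': 'Poul Erik Andersen'}}, {'roles': [{'type': 'director'}, {'type': 'board'}], 'life': {'name': 'Martin Ravn-Nielsen'}}]"
"[{'roles': [{'type': 'owner'}], 'life': {'name': 'LEGO A/S'}}]"
'''

df_csv = pd.read_csv(io.StringIO(csv_text))

# After read_csv every cell is a str, so turn it back into a list of dicts.
df_csv['participants'] = df_csv['participants'].apply(ast.literal_eval)

df_csv['Name of Participant(s)'] = df_csv['participants'].apply(
    lambda cell: "***".join(item['life']['name'] for item in cell)
)

print(df_csv[['Name of Participant(s)']].to_string())
```

`ast.literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for data loaded from a file.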

EDIT:

A version that creates both columns in a single apply and skips participants who have the director role:

import pandas as pd

data = {'participants': 
[
    [{'roles': [{'type': 'director'}, {'type': 'founder'}, {'type': 'owner'}, {'type': 'real_owner'}], 'life': {'name': 'Lichun Du'}}],
    [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}}, {'roles': [{'type': 'director'}, {'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Mikael Bodholdt Linde'}}, {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
    [{'roles': [{'type': 'director'}, {'type': 'real_owner'}], 'life': {'name': 'Kristian Løth Hougaard'}}, {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]
}

df = pd.DataFrame(data)

def get_names_and_roles(cell):
    
    all_names = []
    all_roles = []
    
    for item in cell:
        name = item['life']['name']
        roles = [role['type'] for role in item['roles']]

        if 'director' not in roles:
            all_names.append(name)
            all_roles.append(",".join(roles))
    
    all_names = "***".join(all_names)
    all_roles = "***".join(all_roles)

    return pd.Series([all_names, all_roles])


df[ ['Name of Participant(s)', 'Role of Participant(s)'] ] = df['participants'].apply(get_names_and_roles)

print(df[ ['Name of Participant(s)', 'Role of Participant(s)'] ].to_string())

Result:

               Name of Participant(s)    Role of Participant(s)
0                                                              
1  Erik Mølgaard***Dorte Bøcker Linde  board***board,real_owner
2               WORLD JET HOLDING ApS                     owner
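Returning a `pd.Series` from the function means one Series object is built per row, which may become slow on large frames. A possible alternative, sketched here under the assumption that the frame has at least one row, returns a plain tuple and unpacks the results with `zip`. The names `names_and_roles_tuple` and `df_zip` are hypothetical and not part of the original answer.

```python
import pandas as pd

def names_and_roles_tuple(cell):
    """Return (names, roles) as joined strings, skipping directors."""
    names, roles = [], []
    for item in cell:
        types = [role['type'] for role in item['roles']]
        if 'director' not in types:
            names.append(item['life']['name'])
            roles.append(",".join(types))
    return "***".join(names), "***".join(roles)

df_zip = pd.DataFrame({'participants': [
    [{'roles': [{'type': 'director'}, {'type': 'founder'}], 'life': {'name': 'Lichun Du'}}],
    [{'roles': [{'type': 'board'}], 'life': {'name': 'Erik Mølgaard'}},
     {'roles': [{'type': 'director'}, {'type': 'board'}], 'life': {'name': 'Mikael Bodholdt Linde'}},
     {'roles': [{'type': 'board'}, {'type': 'real_owner'}], 'life': {'name': 'Dorte Bøcker Linde'}}],
    [{'roles': [{'type': 'director'}], 'life': {'name': 'Kristian Løth Hougaard'}},
     {'roles': [{'type': 'owner'}], 'life': {'name': 'WORLD JET HOLDING ApS'}}],
]})

# map() yields one (names, roles) tuple per row; zip(*...) transposes them
# into one sequence of names and one sequence of roles.
# Unpacking into two variables fails on an empty frame.
names, roles = zip(*df_zip['participants'].map(names_and_roles_tuple))
df_zip['Name of Participant(s)'] = list(names)
df_zip['Role of Participant(s)'] = list(roles)

print(df_zip[['Name of Participant(s)', 'Role of Participant(s)']].to_string())
```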
