How to normalize multiple columns of dicts in a pandas dataframe

Views: 70 Published: 2021/5/9 18:52:31 Tags: python pandas formatting json-normalize


Problem description

I am new to coding, and I realize this is a very basic question.

I have a dataframe as follows:

df

      Unnamed: 0  time                 home_team      away_team       full_time_result                    both_teams_to_score        double_chance
--  ------------  -------------------  -------------  --------------  ----------------------------------  -------------------------  ------------------------------------
 0             0  2021-01-12 18:00:00  Sheff Utd      Newcastle       {'1': 2400, 'X': 3200, '2': 3100}   {'yes': 2000, 'no': 1750}  {'1X': 1360, '12': 1360, '2X': 1530}
 1             1  2021-01-12 20:15:00  Burnley        Man Utd         {'1': 7000, 'X': 4500, '2': 1440}   {'yes': 1900, 'no': 1900}  {'1X': 2620, '12': 1180, '2X': 1100}
 2             2  2021-01-12 20:15:00  Wolverhampton  Everton         {'1': 2450, 'X': 3200, '2': 3000}   {'yes': 1950, 'no': 1800}  {'1X': 1360, '12': 1360, '2X': 1530}
 3             3  2021-01-13 18:00:00  Man City       Brighton        {'1': 1180, 'X': 6500, '2': 14000}  {'yes': 2040, 'no': 1700}  {'1X': 1040, '12': 1110, '2X': 4500}
 4             4  2021-01-13 20:15:00  Aston Villa    Tottenham       {'1': 2620, 'X': 3500, '2': 2500}   {'yes': 1570, 'no': 2250}  {'1X': 1500, '12': 1280, '2X': 1440}
 5             5  2021-01-14 20:00:00  Arsenal        Crystal Palace  {'1': 1500, 'X': 4000, '2': 6500}   {'yes': 1950, 'no': 1800}  {'1X': 1110, '12': 1220, '2X': 2500}
 6             6  2021-01-15 20:00:00  Fulham         Chelsea         {'1': 5750, 'X': 4330, '2': 1530}   {'yes': 1800, 'no': 1950}  {'1X': 2370, '12': 1200, '2X': 1140}
 7             7  2021-01-16 12:30:00  Wolverhampton  West Brom       {'1': 1440, 'X': 4200, '2': 7500}   {'yes': 2250, 'no': 1570}  {'1X': 1100, '12': 1220, '2X': 2620}
 8             8  2021-01-16 15:00:00  Leeds          Brighton        {'1': 2000, 'X': 3600, '2': 3600}   {'yes': 1530, 'no': 2370}  {'1X': 1280, '12': 1280, '2X': 1720}

I am looking to format the dictionary columns nicely and get a dataframe in which, for example, the full_time_result column is split into full_time_result_1, full_time_result_X, and full_time_result_2, and likewise for both_teams_to_score and double_chance, as below:

      Unnamed: 0  time                 home_team      away_team       full_time_result_1  full_time_result_X  full_time_result_2  both_teams_to_score_yes  both_teams_to_score_no  double_chance_1X
--  ------------  -------------------  -------------  --------------  ------------------  ------------------  ------------------  -----------------------  ----------------------  ----------------

I am following the example given here, but I am unable to get it to work. Here is my code:

import pandas as pd
from tabulate import tabulate
df = pd.read_csv(r'C:\Users\Harshad\Desktop\re.csv')
df['full_time_result'] = df['full_time_result'].apply(pd.Series)
print(tabulate(df, headers='keys'))

      Unnamed: 0  time                 home_team      away_team       full_time_result                    both_teams_to_score        double_chance
--  ------------  -------------------  -------------  --------------  ----------------------------------  -------------------------  ------------------------------------
 0             0  2021-01-12 18:00:00  Sheff Utd      Newcastle       {'1': 2400, 'X': 3200, '2': 3100}   {'yes': 2000, 'no': 1750}  {'1X': 1360, '12': 1360, '2X': 1530}
 1             1  2021-01-12 20:15:00  Burnley        Man Utd         {'1': 7000, 'X': 4500, '2': 1440}   {'yes': 1900, 'no': 1900}  {'1X': 2620, '12': 1180, '2X': 1100}
 2             2  2021-01-12 20:15:00  Wolverhampton  Everton         {'1': 2450, 'X': 3200, '2': 3000}   {'yes': 1950, 'no': 1800}  {'1X': 1360, '12': 1360, '2X': 1530}
 3             3  2021-01-13 18:00:00  Man City       Brighton        {'1': 1180, 'X': 6500, '2': 14000}  {'yes': 2040, 'no': 1700}  {'1X': 1040, '12': 1110, '2X': 4500}
 4             4  2021-01-13 20:15:00  Aston Villa    Tottenham       {'1': 2620, 'X': 3500, '2': 2500}   {'yes': 1570, 'no': 2250}  {'1X': 1500, '12': 1280, '2X': 1440}

Any help would be appreciated.

Recommended answer

Verify that the columns are dict type, and not str type.

If the columns are str type, convert them with ast.literal_eval.
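As a minimal sketch of that check, the snippet below reads a small CSV from memory and shows that a cell which looks like a dict comes back as str. The CSV text and the `to_dict` helper are assumptions made for illustration; they stand in for the question's file and are not part of the original answer.

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# hypothetical CSV text standing in for the question's re.csv
csv_text = """home_team,full_time_result
Sheff Utd,"{'1': 2400, 'X': 3200, '2': 3100}"
Burnley,"{'1': 7000, 'X': 4500, '2': 1440}"
"""
df = pd.read_csv(StringIO(csv_text))

# a CSV cell that looks like a dict is still read back as plain text
first_type = type(df["full_time_result"].iloc[0])
print(first_type)


def to_dict(value):
    """Parse a str holding a dict literal; return any other value unchanged."""
    return literal_eval(value) if isinstance(value, str) else value


# convert the column; values that are already dicts pass through untouched
df["full_time_result"] = df["full_time_result"].map(to_dict)
print(type(df["full_time_result"].iloc[0]))
```

Guarding with `isinstance` means the conversion can be rerun safely, whereas calling `literal_eval` on a value that is already a dict raises an error.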

import pandas as pd
from ast import literal_eval

# test dataframe
data = {'time': ['2021-01-12 18:00:00', '2021-01-12 20:15:00', '2021-01-12 20:15:00', '2021-01-13 18:00:00', '2021-01-13 20:15:00', '2021-01-14 20:00:00', '2021-01-15 20:00:00', '2021-01-16 12:30:00', '2021-01-16 15:00:00'], 'home_team': ['Sheff Utd', 'Burnley', 'Wolverhampton', 'Man City', 'Aston Villa', 'Arsenal', 'Fulham', 'Wolverhampton', 'Leeds'], 'away_team': ['Newcastle', 'Man Utd', 'Everton', 'Brighton', 'Tottenham', 'Crystal Palace', 'Chelsea', 'West Brom', 'Brighton'], 'full_time_result': ["{'1': 2400, 'X': 3200, '2': 3100}", "{'1': 7000, 'X': 4500, '2': 1440}", "{'1': 2450, 'X': 3200, '2': 3000}", "{'1': 1180, 'X': 6500, '2': 14000}", "{'1': 2620, 'X': 3500, '2': 2500}", "{'1': 1500, 'X': 4000, '2': 6500}", "{'1': 5750, 'X': 4330, '2': 1530}", "{'1': 1440, 'X': 4200, '2': 7500}", "{'1': 2000, 'X': 3600, '2': 3600}"], 'both_teams_to_score': ["{'yes': 2000, 'no': 1750}", "{'yes': 1900, 'no': 1900}", "{'yes': 1950, 'no': 1800}", "{'yes': 2040, 'no': 1700}", "{'yes': 1570, 'no': 2250}", "{'yes': 1950, 'no': 1800}", "{'yes': 1800, 'no': 1950}", "{'yes': 2250, 'no': 1570}", "{'yes': 1530, 'no': 2370}"], 'double_chance': ["{'1X': 1360, '12': 1360, '2X': 1530}", "{'1X': 2620, '12': 1180, '2X': 1100}", "{'1X': 1360, '12': 1360, '2X': 1530}", "{'1X': 1040, '12': 1110, '2X': 4500}", "{'1X': 1500, '12': 1280, '2X': 1440}", "{'1X': 1110, '12': 1220, '2X': 2500}", "{'1X': 2370, '12': 1200, '2X': 1140}", "{'1X': 1100, '12': 1220, '2X': 2620}", "{'1X': 1280, '12': 1280, '2X': 1720}"]}
df = pd.DataFrame(data)

# display(df.head(2))
                  time  home_team  away_team                   full_time_result        both_teams_to_score                         double_chance
0  2021-01-12 18:00:00  Sheff Utd  Newcastle  {'1': 2400, 'X': 3200, '2': 3100}  {'yes': 2000, 'no': 1750}  {'1X': 1360, '12': 1360, '2X': 1530}
1  2021-01-12 20:15:00    Burnley    Man Utd  {'1': 7000, 'X': 4500, '2': 1440}  {'yes': 1900, 'no': 1900}  {'1X': 2620, '12': 1180, '2X': 1100}

# convert time to datetime
df.time = pd.to_datetime(df.time)

# determine if columns are str or dict type
print(type(df.iloc[0, 3]))
[out]:
<class 'str'>

# convert columns from str to dict only if the columns are str type
# (DataFrame.applymap is deprecated since pandas 2.1; use DataFrame.map there)
df.iloc[:, 3:] = df.iloc[:, 3:].applymap(literal_eval)

# normalize columns and rename headers
ftr = pd.json_normalize(df.full_time_result)
ftr.columns = [f'full_time_result_{col}' for col in ftr.columns]

btts = pd.json_normalize(df.both_teams_to_score)
btts.columns = [f'both_teams_to_score_{col}' for col in btts.columns]

dc = pd.json_normalize(df.double_chance)
dc.columns = [f'double_chance_{col}' for col in dc.columns]

# concat the dataframes
df_normalized = pd.concat([df.iloc[:, :3], ftr, btts, dc], axis=1)

# display(df_normalized)

                 time      home_team       away_team  full_time_result_1  full_time_result_X  full_time_result_2  both_teams_to_score_yes  both_teams_to_score_no  double_chance_1X  double_chance_12  double_chance_2X
0 2021-01-12 18:00:00      Sheff Utd       Newcastle                2400                3200                3100                     2000                    1750              1360              1360              1530
1 2021-01-12 20:15:00        Burnley         Man Utd                7000                4500                1440                     1900                    1900              2620              1180              1100
2 2021-01-12 20:15:00  Wolverhampton         Everton                2450                3200                3000                     1950                    1800              1360              1360              1530
3 2021-01-13 18:00:00       Man City        Brighton                1180                6500               14000                     2040                    1700              1040              1110              4500
4 2021-01-13 20:15:00    Aston Villa       Tottenham                2620                3500                2500                     1570                    2250              1500              1280              1440
5 2021-01-14 20:00:00        Arsenal  Crystal Palace                1500                4000                6500                     1950                    1800              1110              1220              2500
6 2021-01-15 20:00:00         Fulham         Chelsea                5750                4330                1530                     1800                    1950              2370              1200              1140
7 2021-01-16 12:30:00  Wolverhampton       West Brom                1440                4200                7500                     2250                    1570              1100              1220              2620
8 2021-01-16 15:00:00          Leeds        Brighton                2000                3600                3600                     1530                    2370              1280              1280              1720
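The normalize-and-rename step above can also be written more compactly for a single column. The sketch below is one possible variant, not the answer's own code: it assumes the values are already dicts, uses a hypothetical two-row sample, and relies on `DataFrame.add_prefix` and `DataFrame.join` in place of the list comprehension and `pd.concat`.

```python
import pandas as pd

# hypothetical two-row sample; the values are already dicts here
df = pd.DataFrame({
    "home_team": ["Sheff Utd", "Burnley"],
    "full_time_result": [{"1": 2400, "X": 3200, "2": 3100},
                         {"1": 7000, "X": 4500, "2": 1440}],
})

# one column per dict key, prefixed with the source column name
ftr = pd.json_normalize(df["full_time_result"].tolist()).add_prefix("full_time_result_")

# swap the dict column for its expanded columns (rows align on the default index)
out = df.drop(columns="full_time_result").join(ftr)
print(out)
```

Dropping the column by name avoids positional slices such as `df.iloc[:, :3]`, which shift when the frame carries an extra leading column like the question's `Unnamed: 0`.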

Combined code

# convert the columns to dict type if they are str type
df.iloc[:, 3:] = df.iloc[:, 3:].applymap(literal_eval)

# normalize all columns
df_list = list()

for col in df.columns[3:]:
    v = pd.json_normalize(df[col])
    v.columns = [f'{col}_{c}' for c in v.columns]
    df_list.append(v)

# combine into one dataframe
df_normalized = pd.concat([df.iloc[:, :3]] + df_list, axis=1)
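The loop above could be wrapped in a small reusable function that selects the dict columns by name instead of by position. This is a sketch under stated assumptions: `normalize_dict_columns` and the sample frame are hypothetical names introduced here, and the function converts str values itself so it accepts either str or dict input.

```python
from ast import literal_eval

import pandas as pd


def normalize_dict_columns(df, dict_cols):
    """Return a new frame with each column in dict_cols expanded to '<col>_<key>' columns."""
    parts = [df.drop(columns=dict_cols)]
    for col in dict_cols:
        # parse str values; leave values that are already dicts untouched
        records = [literal_eval(v) if isinstance(v, str) else v for v in df[col]]
        expanded = pd.json_normalize(records).add_prefix(f"{col}_")
        expanded.index = df.index  # keep row alignment for the concat below
        parts.append(expanded)
    return pd.concat(parts, axis=1)


# hypothetical two-row sample shaped like the question's CSV
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "home_team": ["Sheff Utd", "Burnley"],
    "full_time_result": ["{'1': 2400, 'X': 3200, '2': 3100}",
                         "{'1': 7000, 'X': 4500, '2': 1440}"],
    "both_teams_to_score": ["{'yes': 2000, 'no': 1750}",
                            "{'yes': 1900, 'no': 1900}"],
})

result = normalize_dict_columns(sample, ["full_time_result", "both_teams_to_score"])
print(result.columns.tolist())
```

Copying the original index onto each expanded frame keeps the rows aligned even when the input does not have a default 0..n-1 index, for example after filtering.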
