Unpacking a pandas dataframe column that contains a list of dictionaries into new columns

Problem description

I have a dataframe new_df that has one column, which contains a list of dictionaries, with some rows NaN.

new_df
                                                            0
    0                                                 NaN
    1                                                 NaN
    2   [{'start_time': '09:16:44', 'e...
    3   [{'start_time': '09:36:44', 'e...
    4   [{'start_time': '09:46:44', 'e...
    5   [{'start_time': '09:48:44', 'e...
    6   [{'start_time': '09:55:44', 'e...
    7   [{'start_time': '09:59:44', 'e...
    8   [{'start_time': '10:50:22', 'e...
    9   [{'start_time': '11:30:22', 'e...
    10  [{'start_time': '11:35:22', 'e...
    11  [{'start_time': '12:50:22', 'e...
    12                                                NaN
    13                                                NaN

When a row contains a list containing a dictionary it is in this format:

[{'start_time': '09:16:44', 'end_time': '9:36:44', 'job_id': '123456'}]

I need to unpack the dictionary in each list/row in new_df into new columns and apply these new columns to another dataframe.

The problem I am having is preserving the index of new_df as it is needed to correctly apply the new column data to the other dataframe.

I can unpack the lists and create new columns from the dictionary values, but when I apply the new columns, they apply to row[0] instead of row[2] in this case. I lose the rows at the beginning and end where the row values are NaN.

add_df = pd.DataFrame(list(new_df[0]))

This produces:

  start_time   end_time   job_id  
0  09:16:44  09:36:44     123456
1  09:36:44  09:46:44     123457
2  09:46:44  09:48:44     123458
3  09:48:44  09:59:59     123459
      ...      ...          ...
8  11:35:22  12:45:00     123460
9  12:50:22  13:00:00     123461
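
To see why the labels shift, here is a minimal sketch under stated assumptions: the Series `s` is a hypothetical five-row stand-in for the column above, and each list is assumed to hold exactly one dictionary. Once the values are copied into a plain Python list, the original index labels are gone, so the new dataframe is numbered from 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the column described above (not the real data).
s = pd.Series([
    np.nan,
    np.nan,
    [{'start_time': '09:16:44', 'end_time': '09:36:44', 'job_id': '123456'}],
    [{'start_time': '09:36:44', 'end_time': '09:46:44', 'job_id': '123457'}],
    np.nan,
], dtype=object)

valid = s.dropna()                    # still carries the original labels 2 and 3
records = [row[0] for row in valid]   # plain list of dicts: labels are discarded here
add_df = pd.DataFrame(records)        # pandas assigns a fresh 0..n-1 index

print(list(valid.index))   # [2, 3]
print(list(add_df.index))  # [0, 1]
```

The labels that need to be carried over are therefore the ones still attached to `valid`.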

What I need is to preserve the index as shown below, that is, the index of new_df, which holds the lists of dictionaries:

      start_time   end_time   job_id  
    0    NaN        NaN         NaN
    1    NaN        NaN         NaN
    2  09:16:44  09:36:44     123456
    3  09:36:44  09:46:44     123457
    4  09:46:44  09:48:44     123458
    5  09:48:44  09:59:59     123459
          ...      ...          ...
   10  11:35:22  12:45:00     123460
   11  12:50:22  13:00:00     123461
   12    NaN        NaN         NaN
   13    NaN        NaN         NaN

How can I preserve the index and keep the leading and trailing NaN rows?

Recommended answer

The comment made by @Ben.T made me think of what I was trying to accomplish.

I was creating a dataframe from a series that is a list of dictionaries. Why was I peeling off this new dataframe column by column, when I could apply the new dataframe to the existing dataframe on the column axis?

My solution:

import pandas as pd

# Drop the NaN rows, then unwrap each single-element list into its dict
# (assumes every non-NaN cell is a list holding exactly one dictionary)
valid = orig_df[0].dropna()
new_df = pd.DataFrame([row[0] for row in valid])

# Get the orig_df index labels of the non-NaN rows to apply to the new df
new_ndx = orig_df.index[orig_df[0].notna()]

# Replace the default 0..n-1 index with labels that line up with orig_df
new_df = new_df.set_index(new_ndx)

# Now attach new_df to orig_df along the column axis
orig_df = pd.concat([orig_df, new_df], axis=1)

Is there maybe a more pythonic way to accomplish this...?
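
One possibly more direct variant, offered only as a sketch: pass the surviving index straight to the `pd.DataFrame` constructor, then let `DataFrame.join` align on the index so the NaN rows come back automatically. The sample `orig_df` and the names `valid`, `unpacked` and `result` are assumptions for illustration, and each list is again assumed to hold exactly one dictionary.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the shape described in the question.
orig_df = pd.DataFrame({0: [
    np.nan,
    [{'start_time': '09:16:44', 'end_time': '09:36:44', 'job_id': '123456'}],
    [{'start_time': '09:36:44', 'end_time': '09:46:44', 'job_id': '123457'}],
    np.nan,
]})

# Keep only the rows that hold a list; their original labels are preserved.
valid = orig_df[0].dropna()

# Unwrap each single-element list and reuse the surviving labels as the index.
unpacked = pd.DataFrame([row[0] for row in valid], index=valid.index)

# join aligns on the index, so rows without a dictionary are filled with NaN.
result = orig_df.join(unpacked)
print(result[['start_time', 'end_time', 'job_id']])
```

Because the labels are set when `unpacked` is built, no separate `reset_index` / `set_index` step is needed. A list holding several dictionaries would need different handling, for example exploding the column first.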
