Creating pandas dataframe from list of dictionaries containing lists of data

Problem description

I have a list of dictionaries with this structure.

    {
        'data' : [[year1, value1], [year2, value2], ... m entries],
        'description' : string,
        'end' : string,
        'f' : string,
        'lastHistoricalperiod' : string, 
        'name' : string,
        'series_id' : string,
        'start' : int,
        'units' : string,
        'unitsshort' : string,
        'updated' : string
    }

I want to put this in a pandas DataFrame that looks like

   year       value  updated                   (other dict keys ... )
0  2040  120.592468  2014-05-23T12:06:16-0400  other key-values
1  2039  120.189987  2014-05-23T12:06:16-0400  ...
2  other year-value pairs ...
...
n

where n = m * (number of dictionaries in the list), and m is the length of each 'data' list.

That is, each [year, value] pair in 'data' should get its own row. Here is what I have so far:

    import pandas as pd

    x = [...]  # list of dictionaries as described above
    # Create an empty DataFrame
    output = pd.DataFrame()

    # Loop through each dictionary in the list
    for dictionary in x:
        # Create a new DataFrame from the 2-D 'data' list alone
        data = dictionary['data']
        y = pd.DataFrame(data, columns=['year', 'value'])
        # Loop through all the other key-value pairs and fill in constant columns
        for key in dictionary:
            if key != 'data':
                y[key] = dictionary[key]
        # Concatenate the accumulated output with the frame from this dictionary
        output = pd.concat([output, y], ignore_index=True)

This seems very hacky, and I was wondering if there's a more 'pythonic' way to do this, or at least if there are any obvious speedups here.
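
A commonly suggested speedup for a loop like this, sketched under the assumption that every dictionary has the structure shown above: calling pd.concat inside the loop copies the whole accumulated frame on every iteration, so the total copying grows roughly quadratically with the number of dictionaries. Collecting the per-dictionary frames in a list and concatenating once at the end avoids that. The helper name records_to_frame and the sample records are hypothetical.

```python
import pandas as pd


def records_to_frame(records):
    """Hypothetical helper: one frame per dictionary, concatenated once at the end."""
    frames = []
    for record in records:
        # One row per [year, value] pair in 'data'
        frame = pd.DataFrame(record['data'], columns=['year', 'value'])
        # Every other key becomes a column holding that scalar on every row
        for key, val in record.items():
            if key != 'data':
                frame[key] = val
        frames.append(frame)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=['year', 'value'])
    # A single concat instead of one per iteration
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample input using only a few of the keys from the question
records = [
    {'data': [[2040, 120.592468], [2039, 120.189987]], 'name': 'a',
     'updated': '2014-05-23T12:06:16-0400'},
    {'data': [[2040, 98.5]], 'name': 'b',
     'updated': '2014-05-23T12:06:16-0400'},
]

output = records_to_frame(records)
print(output)
```

This keeps the original structure of the loop; only the placement of pd.concat changes.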

Recommended answer

If your data is in the form [{}, {}, ...], you can do the following.

The complication is the 'data' key of each dictionary, which holds a nested list of [year, value] pairs.

    import pandas as pd

    df = pd.DataFrame(data)  # data is your list of dictionaries
    fix = df.groupby(level=0)['data'].apply(lambda x: pd.DataFrame(x.iloc[0], columns=['year', 'value']))
    fix = fix.reset_index(level=1, drop=True)
    df = pd.merge(fix, df.drop(columns=['data']), how='inner', left_index=True, right_index=True)

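The code does the following:

  1. Creates a DataFrame from your list of dictionaries.
  2. Builds a new DataFrame by expanding the 'data' column into more rows.
  3. Expanding the rows produces a MultiIndex whose inner level is not needed; reset_index drops it.
  4. Finally, merges on the original index to get the desired DataFrame.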
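
An alternative that avoids the groupby/merge round trip, sketched assuming pandas 1.1 or later (for the ignore_index argument of DataFrame.explode) and that every 'data' list is non-empty: explode turns each element of the 'data' lists into its own row while repeating the other columns, and the [year, value] pairs are then split into two columns. The sample records are hypothetical.

```python
import pandas as pd

# Hypothetical sample input using only a few of the keys from the question
records = [
    {'data': [[2040, 120.592468], [2039, 120.189987]], 'name': 'a',
     'updated': '2014-05-23T12:06:16-0400'},
    {'data': [[2040, 98.5]], 'name': 'b',
     'updated': '2014-05-23T12:06:16-0400'},
]

df = pd.DataFrame(records)
# One row per [year, value] pair; the other columns are repeated for each pair
df = df.explode('data', ignore_index=True)
# Remove the 'data' column and split its pairs into two new columns
pairs = pd.DataFrame(df.pop('data').tolist(), columns=['year', 'value'])
# Both frames share the 0..n-1 index produced by ignore_index=True
df = pd.concat([pairs, df], axis=1)
print(df)
```

If some 'data' lists can be empty, explode produces a NaN row for them, which this sketch does not handle; filtering those dictionaries out of records before building the frame is one option.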
