Creating pandas dataframe from list of dictionaries containing lists of data
Question
I have a list of dictionaries with this structure.
{
'data' : [[year1, value1], [year2, value2], ... m entries],
'description' : string,
'end' : string,
'f' : string,
'lastHistoricalperiod' : string,
'name' : string,
'series_id' : string,
'start' : int,
'units' : string,
'unitsshort' : string,
'updated' : string
}
I want to put this in a pandas DataFrame that looks like
year value updated (other dict keys ... )
0 2040 120.592468 2014-05-23T12:06:16-0400 other key-values
1 2039 120.189987 2014-05-23T12:06:16-0400 ...
2 other year-value pairs ...
...
n
where n = m * (number of dictionaries in the list), and m is the length of each 'data' list.
That is, each tuple in 'data' should have its own row. What I've done thus far is this:
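The target reshaping can also be sketched in plain Python. Emit one row dictionary per [year, value] pair and copy the remaining keys onto each row. This is a minimal sketch that assumes a hypothetical `records` list. The list uses only a few of the keys above and made-up values.

```python
import pandas as pd

# Hypothetical sample input: two series with m = 2 data points each
records = [
    {'data': [[2040, 120.59], [2039, 120.19]], 'name': 'series A',
     'series_id': 'A', 'updated': '2014-05-23T12:06:16-0400'},
    {'data': [[2040, 98.10], [2039, 97.85]], 'name': 'series B',
     'series_id': 'B', 'updated': '2014-05-23T12:06:16-0400'},
]

# One output row per [year, value] pair; every other key is repeated on that row
rows = [
    {'year': year, 'value': value, **{k: v for k, v in rec.items() if k != 'data'}}
    for rec in records
    for year, value in rec['data']
]
df = pd.DataFrame(rows)

# n = m * len(records) rows
print(len(df), list(df.columns))
```

A single `pd.DataFrame` call on a flat list of row dictionaries avoids building and concatenating many small frames.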
import pandas as pd

x = ...  # placeholder: the list of dictionaries described above
# Create an empty DataFrame
output = pd.DataFrame()
# Loop through each dictionary in the list
for dictionary in x:
    # Create a new DataFrame from the 2-D list alone
    data = dictionary['data']
    y = pd.DataFrame(data, columns=['year', 'value'])
    # Loop through all the other key-value pairs and fill in values
    for key in dictionary:
        if key != 'data':
            y[key] = dictionary[key]
    # Concatenate the accumulated output with the frame from this dictionary
    output = pd.concat([output, y], ignore_index=True)
This seems very hacky, and I was wondering if there's a more 'pythonic' way to do this, or at least if there are any obvious speedups here.
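One obvious speedup keeps the original loop intact. Calling `pd.concat` inside the loop copies the accumulated frame on every iteration, so the total copying grows quadratically with the number of dictionaries. Collecting the per-dictionary frames in a list and concatenating once avoids that. This sketch assumes `x` is the list of dictionaries, and the sample data below is made up.

```python
import pandas as pd

# Hypothetical sample input with a subset of the keys from the question
x = [
    {'data': [[2040, 120.59], [2039, 120.19]], 'series_id': 'A', 'units': 'index'},
    {'data': [[2040, 98.10]], 'series_id': 'B', 'units': 'index'},
]

frames = []
for dictionary in x:
    # One frame per dictionary, built from its 2-D 'data' list
    y = pd.DataFrame(dictionary['data'], columns=['year', 'value'])
    # Broadcast every other key-value pair as a constant column
    for key, val in dictionary.items():
        if key != 'data':
            y[key] = val
    frames.append(y)

# A single concat at the end instead of one per iteration
output = pd.concat(frames, ignore_index=True)
print(output)
```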
Recommended answer
If your data is in the form [{}, {}, ...], you can do the following.
The difficulty lies in the 'data' key of your dictionaries. Each value is a nested list of [year, value] pairs that has to be expanded into separate rows.
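import pandas as pd

df = pd.DataFrame(data)  # data is your list of dictionaries
# Expand each row's 'data' list into its own (Year, Value) DataFrame
fix = df.groupby(level=0)['data'].apply(lambda x: pd.DataFrame(x.iloc[0], columns=['Year', 'Value']))
# Drop the inner index level introduced by the expansion
fix = fix.reset_index(level=1, drop=True)
df = pd.merge(fix, df.drop(columns=['data']), how='inner', left_index=True, right_index=True)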
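The code does the following:
- Creates a DataFrame from your list of dictionaries
- Creates a new DataFrame by expanding the data column into additional rows
- The expansion produces a MultiIndex whose extra level is irrelevant, so reset_index removes it
- Finally merges back onto the original index to get the desired DataFrame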
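On pandas 1.1 or later, `DataFrame.explode` offers a shorter route to the same shape than the groupby/merge chain. Explode the 'data' column to one row per pair, then split each pair into two columns. This sketch assumes every 'data' list is non-empty, because an empty list would explode to a NaN row that cannot be split this way. It uses hypothetical sample records.

```python
import pandas as pd

# Hypothetical sample input with a subset of the keys from the question
records = [
    {'data': [[2040, 120.59], [2039, 120.19]], 'series_id': 'A', 'start': 2012},
    {'data': [[2040, 98.10], [2039, 97.85]], 'series_id': 'B', 'start': 2012},
]

df = pd.DataFrame(records)
# One row per [year, value] pair; the other columns are repeated
df = df.explode('data', ignore_index=True)
# Split each pair into separate Year and Value columns, aligned on the new index
pairs = pd.DataFrame(df['data'].tolist(), columns=['Year', 'Value'], index=df.index)
df = pd.concat([pairs, df.drop(columns='data')], axis=1)
print(df)
```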