将EIA Json转换为DataFrame-Python 3.6 [英] Convert EIA Json to DataFrame - Python 3.6

查看:144
本文介绍了将EIA Json转换为DataFrame-Python 3.6的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从 http://api.eia.gov/bulk转换Json文件/INTL.zip 到数据框. 下面是我的代码

I was trying to convert Json File from http://api.eia.gov/bulk/INTL.zip to dataframe. Below is my code

import os, sys,json
import pandas as pd

sourcePath = r"D:\Learn\EIA\INTL.txt"

DF = pd.read_json(sourcePath, lines=True)
DF2 = DF[['series_id', 'name', 'units', 'geography', 'f', 'data']] # Need only these columns
DF2 = DF2.dropna(subset=['data']) # Delete if blank/NA

DF2[['Date', 'Value']] = pd.DataFrame([item for item in DF2.data]) # DF2.data contains list, converting to Data Frame

错误:-

回溯(最近通话最近): 第11行的文件"D:\ Sunil_Work \ psnl \ python \ pyCharm \ knoema \ EIA \ EIAINTL2018May.py" DF2 [[''Date','Value']] = pd.DataFrame([DF2.data中项目的项目]) setitem 中的文件"C:\ Python36 \ lib \ site-packages \ pandas \ core \ frame.py",第2326行 self._setitem_array(键,值) _setitem_array中的文件"C:\ Python36 \ lib \ site-packages \ pandas \ core \ frame.py",第2350行 引发ValueError('列必须与键的长度相同') ValueError:列的长度必须与键的长度相同

Traceback (most recent call last): File "D:\Sunil_Work\psnl\python\pyCharm\knoema\EIA\EIAINTL2018May.py", line 11, in DF2[['Date', 'Value']] = pd.DataFrame([item for item in DF2.data]) File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2326, in setitem self._setitem_array(key, value) File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2350, in _setitem_array raise ValueError('Columns must be same length as key') ValueError: Columns must be same length as key

我卡住了,请帮忙.

我需要如下结果:Date& DF.data列中的列表中存在的值

I need results like below: Date & Values present in List in DF.data column

DF2[['Date', 'Value']] = pd.DataFrame([item for item in DF2.data]).iloc[:,0:2] # This not working

jezrael解决方案后的新代码更改:

New Code changes after jezrael solution:

import os, sys, ast
import pandas as pd

sourcePath = r"C:\sunil_plus\dataset\EIAINTL2018May\8_updation2018Aug2\source\INTL.txt"

DF = pd.read_json(sourcePath, lines=True)
DF2 = DF[['series_id', 'name', 'units', 'geography', 'f', 'data']] # Need only these columns
DF2 = DF2.dropna(subset=['data'])

DF2['Date'] = [[x[0] for x in item] for item in DF2.data]
DF2['Values'] = [[x[1] for x in item] for item in DF2.data]

DF_All = pd.DataFrame(); DF4 = pd.DataFrame()
for series_id in DF2['series_id']:
    DF3 = DF2.loc[DF2['series_id'] == series_id]

    DF4['DateF'] = [item for item in DF3.Date] # Here I need to convert List values to Rows
    DF4['ValuesF'] = [item for item in DF3.Values] # Here I need to convert List values to Rows
    # Above code not working as expected

    DF3 = DF3[['series_id', 'name', 'units', 'geography', 'f']] # Need only these columns
    DF5 = pd.concat([DF3, DF4], axis=1).ffill() # Concat to get DateF & ValuesF Values
    DF_All = DF_All.append(DF5)

推荐答案

您可以使用2个list comprehension来匹配嵌套列表的第一个和第二个值:

You can use 2 list comprehensions for match first and second value of nested lists:

sourcePath = r"D:\Learn\EIA\INTL.txt"

DF = pd.read_json(sourcePath, lines=True)
DF2 = DF[['series_id', 'name', 'units', 'geography', 'f', 'data']] # Need only these columns
DF2 = DF2.dropna(subset=['data'])

DF2['Date'] = [[x[0] for x in item] for item in DF2.data]
DF2['Values'] = [[x[1] for x in item] for item in DF2.data]


print (DF2.head())

               series_id                                               name  \
0  INTL.51-8-MKD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
1  INTL.51-8-SRB-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
2  INTL.51-8-SSD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
3  INTL.51-8-SUN-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
4  INTL.51-8-SVK-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   

                 units geography  f  \
0  Million Metric Tons       MKD  A   
1  Million Metric Tons       SRB  A   
2  Million Metric Tons       SSD  A   
3  Million Metric Tons       SUN  A   
4  Million Metric Tons       SVK  A   

                                                data  \
0  [[2015, 0.1], [2014, (s)], [2013, (s)], [2012,...   
1  [[2015, 4.1], [2014, 3.5], [2013, 4.2], [2012,...   
2  [[2011, --], [2010, --], [2006, --], [2003, --...   
3  [[2006, --], [2003, --], [2002, --], [2001, --...   
4  [[2015, 9.1], [2014, 8.8], [2013, 11], [2012, ...   

                                                Date  \
0  [2015, 2014, 2013, 2012, 2011, 2010, 2009, 200...   
1  [2015, 2014, 2013, 2012, 2011, 2010, 2009, 200...   
2  [2011, 2010, 2006, 2003, 2002, 2001, 2000, 199...   
3  [2006, 2003, 2002, 2001, 2000, 1999, 1998, 199...   
4  [2015, 2014, 2013, 2012, 2011, 2010, 2009, 200...   

                                              Values  
0  [0.1, (s), (s), 0.2, 0.2, 0.2, 0.2, 0.1, 0.1, ...  
1  [4.1, 3.5, 4.2, 5.2, 4.4, 4.1, 3.2, 4.2, 4.1, ...  
2  [--, --, --, --, --, --, --, --, --, --, --, -...  
3  [--, --, --, --, --, --, --, --, --, --, --, -...  
4  [9.1, 8.8, 11, 10, 11, 12, 10, 12, 12, 13, 14,...  

您可以重复行并创建新的2列:

You can repeat rows and create new 2 columns:

sourcePath = 'INTL.txt'

DF = pd.read_json(sourcePath, lines=True)

cols = ['series_id', 'name', 'units', 'geography', 'f', 'data']
DF2 = DF[cols].dropna(subset=['data'])

DF3 = DF2.join(pd.DataFrame(DF2.pop('data').values.tolist())
               .stack()
               .reset_index(level=1, drop=True)
               .rename('data')
               ).reset_index(drop=True)
DF3[['Date', 'Value']] = pd.DataFrame(DF3['data'].values.tolist())

#if want remove original data column
#DF3[['Date', 'Value']] = pd.DataFrame(DF3.pop('data').values.tolist())
print (DF3.head())

               series_id                                               name  \
0  INTL.51-8-MKD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
1  INTL.51-8-MKD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
2  INTL.51-8-MKD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
3  INTL.51-8-MKD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   
4  INTL.51-8-MKD-MMTCD.A  CO2 Emissions from the Consumption of Natural ...   

                 units geography  f         data  Date Value  
0  Million Metric Tons       MKD  A  [2015, 0.1]  2015   0.1  
1  Million Metric Tons       MKD  A  [2014, (s)]  2014   (s)  
2  Million Metric Tons       MKD  A  [2013, (s)]  2013   (s)  
3  Million Metric Tons       MKD  A  [2012, 0.2]  2012   0.2  
4  Million Metric Tons       MKD  A  [2011, 0.2]  2011   0.2  

这篇关于将EIA Json转换为DataFrame-Python 3.6的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆