How do I get this JSON time series data into a pandas dataframe?


Problem description


I have time series data from an API I'd like to get in to a python pandas dataframe.
How would you do this?

The data looks like this:

[{'id': 38421212541,
  'sensor_id': 12944473,
  'value': '6852.426',
  'date': '2015-02-05',
  'min': '0.0',
  'max': '833.789',
  'avg': '285.5177',
  'values': '[344.336,306.05449999999996,269.922,233.9845,198.63299999999998,164.0625,128.9065,96.2895,66.797,50.391,27.344,10.938,8.203,6.641,4.297,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10.352,23.047,41.797,59.961,93.75,135.938,171.875,209.375,247.266,304.1015,345.703,387.5,438.4765,489.6485,524.414,579.688,627.9300000000001,680.859,722.656,776.563,802.344,811.5235,816.016,822.2655,826.3675000000001,830.2735,830.078,831.836,831.836,833.789,831.25,828.906,828.125,828.3205,827.3435,825,821.875,820.508,817.188,811.7185,810.547,807.422,803.516,801.172,796.094,792.7735,788.672,786.328,783.5935,778.125,775,772.266,768.359,765.625,763.672,759.766,758.203,757.422,753.516,752.734,748.633,751.172,748.047,747.461,747.266,749.6095,747.266,748.828,747.266,748.438,752.9295,750.1949999999999,755.078,756.641,759.961,760.1565,764.4535000000001,767.383,767.188,770.703,774.219,774.6095,780.469,782.8125,785.547,785.156,793.359,794.531,795.8985,798.047,800,805.859,805.469,810.1565,815.4295,811.719,814.453,817.578,814.063,817.188,817.969,813.281,812.5,810.938,786.133,755.859,719.922,676.3675000000001,642.383,606.0550000000001,565.039,534.1795,494.922,448.24199999999996,412.305,379.8825]',
  'start_time': None,
  'start_value': None},
 {'id': 38421212540,
  'sensor_id': 12944473,
  'value': '6771.2902',
  'date': '2015-02-04',
  'min': '0.0',
  'max': '828.9065',
  'avg': '282.1371',
  'values': '[333.0075,297.0705,253.32049999999998,211.7185,173.828,157.422,100.391,100,69.5315,36.3285,24.219,13.281,12.891,10.547,6.836,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.172,17.188,35.547,55.078,78.516,118.75,160.938,192.969,210.156,262.8905,309.766,376.953,416.0155,460.156,511.719,567.578,609.1800000000001,662.5,703.516,747.8515,777.344,792.578,801.3675000000001,812.6955,819.727,825,828.906,822.266,825,825.781,827.344,827.9295,828.9065,825.781,817.578,810.9375,806.25,806.25,803.516,799.0235,796.484,794.336,792.7735,787.6955,784.766,782.422,777.734,775.781,773.438,768.75,764.8435,763.086,763.672,754.883,755.8595,759.766,754.1015,749.4145,746.875,748.2425000000001,744.141,745.313,747.6565,742.969,739.8435,742.7735,742.7735,744.141,748.438,749.0235,740.039,750.781,748.242,751.172,755.664,757.813,764.258,765.625,763.672,766.992,770.8985,772.266,773.2425000000001,781.641,787.5,785.938,789.453,792.188,796.6800000000001,800.3905,798.8285000000001,801.563,802.344,805.859,805.2735,813.2815,808.984,810.156,813.672,812.109,812.6955,814.0625,808.5935,808.008,777.9300000000001,747.0705,707.8125,677.7345,636.7185,602.3435,566.992,529.4925000000001,486.3285,451.172,417.7735,382.8125]',
  'start_time': None,
  'start_value': None},
 {'id': 38421163212,
  'sensor_id': 12944473,
  'value': '6652.3768',
  'date': '2015-02-03',
  'min': '0.0',
  'max': '807.2265',
  'avg': '277.1824',
  'values': '[331.44550000000004,297.65599999999995,256.25,220.8985,184.57,149.6095,100.9765,66.9925,31.6405,42.382999999999996,17.969,12.5,8.594,6.641,3.906,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.391,12.6955,20.8985,48.047,78.516,107.6175,150.586,185.9375,222.2655,260.15650000000005,312.8905,350,402.344,436.133,486.719,536.328,584.375,637.1095,675.781,717.969,762.8905,771.4845,778.5155,784.961,790.4295,791.406,797.461,800.7815,802.9295,803.5155,804.8824999999999,805.469,804.6875,803.3205,800,800.1955,798.047,797.8515,795.508,792.578,789.453,787.6955,784.766,779.297,777.9295,775.586,771.094,768.5545,766.016,762.5,758.203,756.25,753.906,750.391,747.656,746.6800000000001,741.797,738.672,736.5235,734.5705,738.086,732.422,732.422,733.594,732.813,732.0315,733.984,734.766,735.3515,736.719,737.6955,738.672,740.234,746.094,744.531,752.9295,751.172,755.859,757.031,761.7185,764.258,766.406,771.094,773.047,776.953,778.125,783.008,783.984,787.891,793.75,795.703,798.047,800.586,799.6095,798.8285000000001,800.9765,806.25,805.664,805.469,805.859,806.25,807.2265,804.883,803.125,772.8515,739.258,698.4375,665.625,624.0235,594.336,551.5625,514.6485,471.875,437.30449999999996,402.539,369.336]',
  'start_time': None,
  'start_value': None}]


I got that data with:

import requests
import json

# query for data from sensor with id number "12944473"
# populate response 'r' and JSON 'j' with the first page of the response
r = requests.get('https://foo-bar-company.com/sensors/12944473/daily_data', auth=(user, pwd))
j = r.json()

# see how many pages there are.
# loop through all pages, appending to 'j'
last_uri = r.links['last']['url']

while r.url != last_uri:
    r = requests.get(r.links['next']['url'], auth=(user, pwd))
    j.extend(r.json())  # extend, not append: each page is itself a list of records

# print a few days to share with SO
print(j[0:3:1])

The goal is something like this:

                12944473
10/2/2015 0:00  344.336
10/2/2015 0:05  306.0545
10/2/2015 0:10  269.922
10/2/2015 0:15  233.9845
10/2/2015 0:20  198.633
10/2/2015 0:25  164.0625
10/2/2015 0:30  128.9065
10/2/2015 0:35  96.2895
10/2/2015 0:40  66.797
10/2/2015 0:45  50.391
10/2/2015 0:50  27.344
10/2/2015 0:55  10.938
...


I tried to keep this short and to the point, but happy to add more details. The data are sunlight measurements. This question is somewhat unique from other posts I've found because of it being time-series data.


How do you protect against DST and leap year issues when you're building the date-time index?


I'm new to Python and I'm hoping there's elegant solution to this I just haven't seen.
Thanks in advance!


Python 3.7
Pandas 0.25.2
foo bar company's RESTful API

Answer



Quick Answer

Assuming you have a recent version of pandas (`explode` requires 0.25 or later).

data = [{'date': x['date'], 'values': eval(x['values'])} for x in your_json_dict]
pd.DataFrame(data).explode('values')

Result

          date   values
0   2015-02-05  344.336
0   2015-02-05  306.054
0   2015-02-05  269.922
0   2015-02-05  233.984
0   2015-02-05  198.633
..         ...      ...
2   2015-02-03  514.649
2   2015-02-03  471.875
2   2015-02-03  437.304
2   2015-02-03  402.539
2   2015-02-03  369.336
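A note on `eval`: since each `values` field is itself a valid JSON string, `json.loads` parses it without executing arbitrary code, which is safer for data arriving from an external API. A minimal sketch with truncated sample records (not the full API payload):

```python
import json
import pandas as pd

# truncated sample of the API records shown above
your_json_dict = [
    {"date": "2015-02-05", "values": "[344.336, 306.054, 269.922]"},
    {"date": "2015-02-04", "values": "[333.007, 297.070, 253.320]"},
]

# json.loads replaces eval: same result, no code-execution risk
data = [{"date": x["date"], "values": json.loads(x["values"])} for x in your_json_dict]
df = pd.DataFrame(data).explode("values")
print(df)
```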

Explanation

[{'date': x['date'], 'values': eval(x['values'])} for x in your_json_dict]


This is a list comprehension. It creates a new list of dictionaries where each dictionary has the keys 'date' and 'values'. It also converts 'values' from a string to a list of numbers.

pd.DataFrame(data).explode('values')


pandas is perfectly fine with accepting a list of dictionaries. The explode function is a new feature in version 0.25 that expands a column that is a list into rows for each element of that list.
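To reach the target layout in the question (one timestamped row per reading, sensor id as the column name), each day's list can be expanded onto a `DatetimeIndex` built with `pd.date_range`. This is a sketch with truncated sample records; it assumes the readings are evenly spaced 5-minute samples starting at midnight, which matches the target table shown above. Leap days are handled automatically because the dates come straight from the API, and localizing the index (e.g. to UTC) sidesteps DST ambiguity:

```python
import json
import pandas as pd

# truncated sample of the API records
records = [
    {"date": "2015-02-05", "values": "[344.336, 306.054, 269.922]"},
    {"date": "2015-02-04", "values": "[333.007, 297.070, 253.320]"},
]

frames = []
for rec in records:
    vals = json.loads(rec["values"])
    # one timestamp per reading, 5-minute spacing from midnight (assumed cadence)
    idx = pd.date_range(rec["date"], periods=len(vals), freq="5min")
    frames.append(pd.DataFrame({12944473: vals}, index=idx))

df = pd.concat(frames).sort_index()
# to make the index DST-safe, keep it timezone-aware, e.g.:
# df.index = df.index.tz_localize("UTC")
print(df.head())
```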

