How should I handle duplicate times in time series data with pandas?

Views: 90 Published: 2021/6/13 20:45:57 Tags: python pandas time-series data-processing

Problem description

I have the following returned from an API call as part of a larger dataset:

{'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=tzutc()), 'Price': '0.052600'}

{'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=tzutc()), 'Price': '0.052500'}

Ideally I would use the timestamp as the index of the pandas DataFrame. However, this fails when converting to JSON because the index contains a duplicate:

new_df = df.set_index(pd.to_datetime(df['Time']))
print(new_df.to_json(orient='index'))

ValueError: DataFrame index must be unique for orient='index'.
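The duplicate can be confirmed before serializing by inspecting the index. The following is a minimal sketch, assuming a frame shaped like the two records above; the sample data and the names `indexed`, `dupes`, and `raised` are illustrative and not part of the original API response.

```python
import datetime
import pandas as pd

# Illustrative data: two prices recorded within the same second (UTC).
ts = datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=datetime.timezone.utc)
df = pd.DataFrame({'Time': [ts, ts], 'Price': ['0.052600', '0.052500']})

indexed = df.set_index('Time')

# is_unique is False when at least one timestamp occurs more than once.
print(indexed.index.is_unique)

# keep=False marks every copy of a repeated timestamp, not just the later ones.
dupes = indexed[indexed.index.duplicated(keep=False)]
print(dupes)

# to_json(orient='index') uses index labels as JSON keys, so duplicates raise.
try:
    indexed.to_json(orient='index')
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

Checking `index.is_unique` first makes it possible to decide how to resolve the duplicates before the conversion fails.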

Any guidance on the best way to deal with this situation? Should I throw away one data point? The timestamps are no finer-grained than one second, and there is obviously a price change during that second.

Solution

I think you can make the duplicate datetimes unique by adding a millisecond offset to each repeat, using cumcount and to_timedelta:

import datetime
import pandas as pd

# Two records that share the same second-resolution timestamp
d = [{'Time': datetime.datetime(2017, 5, 21, 18, 18, 1), 'Price': '0.052600'},
     {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1), 'Price': '0.052500'}]
df = pd.DataFrame(d)
print(df)
      Price                Time
0  0.052600 2017-05-21 18:18:01
1  0.052500 2017-05-21 18:18:01

print (pd.to_timedelta(df.groupby('Time').cumcount(), unit='ms'))
0          00:00:00
1   00:00:00.001000
dtype: timedelta64[ns]

# Shift the n-th repeat of each timestamp by n milliseconds (the first is unchanged)
df['Time'] = df['Time'] + pd.to_timedelta(df.groupby('Time').cumcount(), unit='ms')
print(df)
      Price                    Time
0  0.052600 2017-05-21 18:18:01.000
1  0.052500 2017-05-21 18:18:01.001

new_df = df.set_index('Time')
print(new_df.to_json(orient='index'))
{"1495390681000":{"Price":"0.052600"},"1495390681001":{"Price":"0.052500"}}
