Deleting the same outliers in two time series

Question

I have a question about eliminating outliers from two time series. One time series contains spot market prices and the other contains power outputs. Both series run from 2012 to 2016 and are stored as CSV files with a timestamp followed by a value. For example, for the power output: 2012-01-01 00:00:00,2335.2152646951617 and for the price: 2012-01-01 00:00:00,17.2
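A minimal sketch of how a file in this format ends up with a timestamp index, assuming a header row named date,power (the header names are an assumption; the data row is the power-output example above). It selects the columns by name, which plays the same role as the positional parse_dates=[0], index_col=[0] used in the code below:

```python
import io

import pandas as pd

# Hypothetical file contents; only the data row comes from the question.
csv_text = """date,power
2012-01-01 00:00:00,2335.2152646951617
"""

# Parse the "date" column as datetimes and use it as the index,
# so rows from different files can later be matched by timestamp.
power_output = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(power_output)
```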

Because the spot market prices are very volatile and contain a lot of outliers, I have filtered them. In the second time series, I now have to delete the values whose timestamps were eliminated from the price series. I thought about generating a list of the deleted timestamps and writing a loop that removes the values with the same timestamps from the second time series. But so far that has not worked, and I'm not really getting anywhere. Does anyone have an idea?
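The idea described here (collect the timestamps the filter removed, then drop the same timestamps from the other series) does not need an explicit loop. A minimal sketch on made-up toy data, assuming both series share a DatetimeIndex; the simple "price below 50" filter is a hypothetical stand-in for the real outlier filter:

```python
import pandas as pd

# Made-up toy data standing in for the two CSV files; both share one DatetimeIndex.
idx = pd.date_range("2012-01-01", periods=6, freq="D")
price = pd.Series([17.2, 17.5, 90.0, 17.4, 17.3, 17.6], index=idx, name="price")
power = pd.DataFrame({"power": [2335.2, 2340.1, 2338.7, 2330.0, 2333.3, 2336.9]}, index=idx)

# Hypothetical stand-in for the real filter: keep prices below 50.
kept = price[price < 50]

# The "list of deleted values": timestamps present before filtering but not after.
removed = price.index.difference(kept.index)

# Drop the same timestamps from the second series in one vectorized call.
power_clean = power.drop(index=removed)

# Equivalent boolean-mask form: keep rows whose timestamp is not in `removed`.
power_clean_mask = power[~power.index.isin(removed)]
```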

My Python code looks as follows:

import pandas as pd
import matplotlib.pyplot as plt

power_output = pd.read_csv("./data/external/power_output.csv", delimiter=",", parse_dates=[0], index_col=[0])
print(power_output.head())
plt.plot(power_output)

spotmarket = pd.read_csv("./data/external/spotmarket_dhp.csv", delimiter=",", parse_dates=[0], index_col=[0])
print(spotmarket.head())

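# percentage change of the price (in %); the first value is NaN and is dropped
r = spotmarket['price'].pct_change().dropna() * 100
print(r)
plt.plot(r)

# interquartile-range fences with a factor of 2
Q1 = r.quantile(.25)
Q3 = r.quantile(.75)
q1 = Q1-2*(Q3-Q1)
q3 = Q3+2*(Q3-Q1)

# keep only the returns that lie inside the fences
a = r[r.between(q1, q3)]
print(a)
plt.plot(a)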
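The fence computation above is the interquartile-range rule with a factor of 2 (Tukey's fences commonly use 1.5). A small helper sketch, with the hypothetical names iqr_bounds and split_outliers, that returns the outliers as well as the kept values, since the outliers' timestamps are what the second series needs:

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 2.0):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=2.0 matches the factor used in the question's code.
    """
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(s: pd.Series, k: float = 2.0):
    """Split a series into (values inside the fences, values outside them)."""
    lower, upper = iqr_bounds(s, k)
    inside = s.between(lower, upper)  # inclusive on both ends by default
    return s[inside], s[~inside]


# Usage on a toy series of percentage changes (values made up):
r = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
a, out = split_outliers(r)
print(iqr_bounds(r))
print(out)
```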

Can anyone help me?

Answer

If your question is about how to compare two timestamps, you can have a look at this.

Basically, you can do this:

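out = r[~r.between(q1, q3)]  # negation of your between() to get the outliers
# merge on the timestamp index; the left frame is the series you want to clean
df = pd.merge(power_output, out, left_index=True, right_index=True, how="outer", indicator=True)
df = df[df['_merge'] == 'left_only'].drop(columns=[out.name, '_merge'])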

This is a merge operation that keeps only the rows present solely in the left DataFrame, i.e. the rows whose _merge indicator is 'left_only'.
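To see what indicator=True does, here is a small self-contained sketch on made-up toy data (the values and the series name "outlier" are hypothetical). It joins both objects on their DatetimeIndex via left_index/right_index:

```python
import pandas as pd

# Made-up toy data sharing a DatetimeIndex.
idx = pd.date_range("2012-01-01", periods=4, freq="D")
power_output = pd.DataFrame({"power": [2335.2, 2340.1, 2338.7, 2330.0]}, index=idx)

# Stand-in for `out`: one outlier return at the third timestamp.
out = pd.Series([250.0], index=idx[[2]], name="outlier")

# Outer merge on the index; indicator=True adds a "_merge" column whose
# value is 'left_only', 'right_only' or 'both' for each row.
df = pd.merge(power_output, out, left_index=True, right_index=True, how="outer", indicator=True)
print(df)

# Rows found only in power_output are the timestamps that were not outliers.
kept = df[df["_merge"] == "left_only"].drop(columns=["outlier", "_merge"])
print(kept)
```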
