Python - finding pattern in a plot
Problem description
This graph is generated by the following gnuplot script. The estimated.csv file can be found at this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUaGRWMm9jWnBUbzg
# ###### GNU Plot
set style data lines
set terminal postscript eps enhanced color "Times" 20
set output "cubic33_cwd_estimated.eps"
set title "Estimated signal"
set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey
set grid xtics ytics mytics
#set size 2
#set size ratio 0.4
#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"
set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0
# Congestion control send window
plot "estimated.csv" using ($1):2 with lines title "Estimated";
I want to find the pattern of the estimated signal in the previous plot, i.e. something close to the following plot. My ground truth (the actual signal) is shown in the following plot.
Here is my initial approach:
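#!/usr/bin/env python
import sys
import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    # Each CSV row (x, y) becomes one vertex of the polyline
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))
inter = lines[0].intersection(lines[1])
# The result is either a single geometry or a multi-part one; multi-part
# geometries must be iterated via .geoms (direct iteration fails in Shapely 2.x)
for g in getattr(inter, 'geoms', [inter]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))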
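For reference, the script above prints the points where the two polylines cross. A minimal shapely-free sketch of the same idea is shown below, assuming both curves have already been sampled on one shared, ascending x grid; the function name crossings and the toy data are hypothetical.

```python
import numpy as np

def crossings(x, y1, y2):
    """Return an (n, 2) array of (x, y) points where curves y1 and y2 cross.

    Assumes y1 and y2 are sampled on the same ascending x grid.
    """
    x = np.asarray(x, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    d = y1 - y2
    # Segments [i, i+1] on which the difference strictly changes sign
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]
    # Fraction t along each segment where the linearly interpolated d is zero
    t = d[idx] / (d[idx] - d[idx + 1])
    xc = x[idx] + t * (x[idx + 1] - x[idx])
    yc = y1[idx] + t * (y1[idx + 1] - y1[idx])
    return np.column_stack([xc, yc])


# Toy data (illustrative): a zig-zag crossing a flat line three times
pts = crossings([0, 1, 2, 3], [0, 2, 0, 2], [1, 1, 1, 1])
print(pts)
```

To use it on the two CSV files, actual.csv could first be resampled onto the x values of estimated.csv with np.interp(est[:, 0], act[:, 0], act[:, 1]). Samples where the difference is exactly zero are not reported; only strict sign changes between neighbouring samples are.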
Then I invoke this Python script (filter.py) directly from my gnuplot script as follows:
set terminal pngcairo
set output 'fig.png'
set datafile separator comma
set yr [0:700]
set xr [0:10]
set xtics 0,2,10
set ytics 0,100,700
set grid
set xlabel "Time [seconds]"
set ylabel "Segments"
plot \
'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
'actual.csv' w l lc rgb 'green' t 'Actual', \
'<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''
This gives the following plot. However, it does not give me the right pattern, as gnuplot is not the best tool for such tasks.
Is there any way to find the pattern of the first graph (estimated.csv) by joining its peaks into a plot using Python? Looking from the end of the signal, the pattern actually seems to be visible. Any help would be appreciated.
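One way to read "joining the peaks into a plot" is to pick out the local maxima of the signal and connect them with straight lines. The sketch below does this with plain NumPy; the helper name peak_envelope and the toy data are hypothetical, and x is assumed to be sorted ascending.

```python
import numpy as np

def peak_envelope(x, y):
    """Join the local maxima of y with straight lines, evaluated at every x.

    A sample counts as a peak if it is strictly greater than its left
    neighbour and at least as large as its right neighbour; the first and
    last samples are never counted as peaks.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])
    peaks = np.nonzero(is_peak)[0] + 1  # shift back to indices into y
    if peaks.size == 0:
        return peaks, y.copy()  # nothing to connect
    # np.interp draws straight lines between successive peaks and holds the
    # first/last peak value constant outside the peak range
    envelope = np.interp(x, x[peaks], y[peaks])
    return peaks, envelope


# Toy sawtooth with rising peaks (illustrative data, not estimated.csv)
x = np.arange(9)
y = [0, 1, 2, 0, 1, 3, 0, 2, 4]
peaks, env = peak_envelope(x, y)
print(peaks)  # [2 5]
```

On a noisy signal nearly every small bump is a local maximum, so the peaks would need filtering first; scipy.signal.find_peaks, for instance, accepts distance and prominence arguments for that purpose.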
I think a rolling maximum (pandas.rolling_max(), which has since been removed from pandas in favour of Series.rolling(window).max()) is the right approach here. We load the data into a DataFrame and calculate the rolling maximum over a window of 8500 values. Afterwards, the curves look similar. You may want to experiment with the window size a little to optimize the result.
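The rolling maximum can be illustrated on a handful of made-up values. With the current pandas API, Series.rolling(window).max() takes the maximum of each trailing window of window samples; the function name rolling_envelope and the toy data are illustrative only.

```python
import pandas as pd

def rolling_envelope(y, window):
    """Trailing rolling maximum of y over `window` samples.

    With the default min_periods (equal to window), the first window - 1
    results are NaN because their windows are not yet full.
    """
    return pd.Series(y, dtype=float).rolling(window).max()


# Toy values (illustrative); the answer uses window=8500 on estimated.csv
env = rolling_envelope([1, 5, 2, 0, 3, 1], window=3)
print(env.tolist())  # [nan, nan, 5.0, 5.0, 3.0, 3.0]
```

A larger window smooths over more of the small oscillations but also lags further behind sudden drops, which is why the window size is worth tuning.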
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.ion()
names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------
data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
df = pd.read_csv('estimated.csv', names=('x','y'))
df['rolling_max'] = df['y'].rolling(8500).max()
plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(df['x'], df['rolling_max'], label = 'rolling')
plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,500)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)
To answer the question from the comments:
Since Series.rolling() only produces a value once a full window of data is available, the first window - 1 values of rolling().max() are NaN (with a window of 8500, that is the first 8499 rows). To replace these NaNs, I suggest reversing the whole Series and calculating the windows backwards. Afterwards, we can replace all the NaNs with the values from the backward calculation. I adjusted the window length for the backward calculation; otherwise we get erroneous data.
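A small worked example of the backward fill, using made-up values and window lengths 3 and 2 in place of 8500 and 850. Reversing the Series keeps its original index, so fillna() puts the backward values back into the right rows by index alignment. Passing min_periods=1 to rolling() is a simpler alternative (not part of the answer) that takes the maximum over the partial windows at the start instead of leaving NaN there.

```python
import pandas as pd

# Toy values (illustrative) standing in for df['y']
y = pd.Series([1, 5, 2, 0, 3, 1], dtype=float)

forward = y.rolling(3).max()             # [nan, nan, 5, 5, 3, 3]
# Reverse, roll over a shorter window; the reversed Series keeps the
# original index, so its values line up with the right rows again
backward = y.iloc[::-1].rolling(2).max()
filled = forward.fillna(backward)        # NaNs replaced via index alignment
print(filled.tolist())                   # [5.0, 5.0, 5.0, 5.0, 3.0, 3.0]

# Alternative: allow partial windows at the start instead of NaN
partial = y.rolling(3, min_periods=1).max()
print(partial.tolist())                  # [1.0, 5.0, 5.0, 5.0, 3.0, 3.0]
```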
This code works:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.ion()
df = pd.read_csv('estimated.csv', names=('x','y'))
df['rolling_max'] = df['y'].rolling(8500).max()
df['rolling_max_backwards'] = df['y'][::-1].rolling(850).max()
df['rolling_max'] = df['rolling_max'].fillna(df['rolling_max_backwards'])
plt.figure()
plt.plot(df['x'], df['rolling_max'], label = 'rolling')
plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,700)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)
And we get the following result: