Python - finding pattern in a plot

Question

This graph is generated by the following gnuplot script. The estimated.csv file can be found at this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUaGRWMm9jWnBUbzg

# ###### GNU Plot
set style data lines
set terminal postscript eps enhanced color "Times" 20

set output "cubic33_cwd_estimated.eps"

set title "Estimated signal"

# estimated.csv is comma-separated (one "x,y" pair per line)
set datafile separator comma

set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey

set grid xtics ytics mytics
#set size 2
#set size ratio 0.4

#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"

set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0

# Congestion control send window
plot "estimated.csv" using 1:2 with lines title "Estimated"

I want to find the pattern of the estimated signal in the previous plot, i.e. something close to my ground truth (the actual signal), which is shown in the following plot.

Here is my initial approach

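#!/usr/bin/env python
import sys

import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    # Each CSV row "x,y" becomes one vertex of the curve.
    return LineString(np.genfromtxt(fname, delimiter=','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))

result = lines[0].intersection(lines[1])
# Multi-part results (MultiPoint, GeometryCollection) are iterated via .geoms;
# in Shapely 2.x they are no longer directly iterable.
parts = result.geoms if hasattr(result, 'geoms') else [result]

for g in parts:
    # Keep only single crossing points; skip empty results and overlapping segments.
    if g.is_empty or g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))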
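For readers without Shapely, the same idea (finding where two sampled curves cross) can be sketched in plain NumPy. This is an illustration, not the author's script: `crossing_points` is a hypothetical helper, and it assumes both curves are sampled on the same x values. The real estimated.csv and actual.csv may not share a grid, in which case one curve would first have to be resampled onto the other's x values (for example with `np.interp`).

```python
import numpy as np


def crossing_points(x, y1, y2):
    """Hypothetical helper: (x, y) points where two curves on a shared x grid cross.

    A crossing is detected wherever y1 - y2 is exactly zero at a sample, or
    strictly changes sign between consecutive samples; in the latter case the
    position is found by linear interpolation within that segment.
    """
    x = np.asarray(x, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    d = y1 - np.asarray(y2, dtype=float)
    # Samples where the two curves coincide exactly.
    exact = np.nonzero(d == 0)[0]
    # Segments [i, i + 1] where the difference strictly changes sign.
    i = np.nonzero(d[:-1] * d[1:] < 0)[0]
    # Fraction t of the way from x[i] to x[i + 1] at which d reaches zero.
    t = d[i] / (d[i] - d[i + 1])
    xs = np.concatenate([x[exact], x[i] + t * (x[i + 1] - x[i])])
    ys = np.concatenate([y1[exact], y1[i] + t * (y1[i + 1] - y1[i])])
    order = np.argsort(xs)
    return xs[order], ys[order]


# Made-up example: a triangle crossing the horizontal line y = 1 twice.
xs, ys = crossing_points([0.0, 1.0, 2.0], [0.0, 2.0, 0.0], [1.0, 1.0, 1.0])
for xi, yi in zip(xs, ys):
    print('%f,%f' % (xi, yi))  # same "x,y" output format as the script above
```

Like the Shapely version, this only marks where the estimated and actual curves intersect; it does not by itself recover the shape of the estimated signal.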

I then call the Shapely script above (saved as filter.py) directly from gnuplot:

set terminal pngcairo
set output 'fig.png'

set datafile separator comma
set yr [0:700]
set xr [0:10]

set xtics 0,2,10
set ytics 0,100,700

set grid

set xlabel "Time [seconds]"
set ylabel "Segments"

plot \
    'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
    'actual.csv' w l lc rgb 'green' t 'Actual', \
    '<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''

This gives the following plot. However, it does not give me the right pattern, as gnuplot is not the best tool for such tasks.

Is there a way to find the pattern of the first graph (estimated.csv) in Python by joining its peaks into a curve? Looking at the end of the plot, the pattern actually seems to be visible. Any help would be appreciated.
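One literal reading of "joining the peaks into a curve" is an upper envelope: keep only the local maxima of the signal and connect them with straight lines. The NumPy sketch below illustrates that idea; it is a different technique from the rolling maximum used in the solution. `peak_envelope` and its `order` parameter (how many samples on each side a peak must dominate) are hypothetical names, and the example signal is made up. A noisy signal needs a fairly large `order`, otherwise every small wiggle counts as a peak.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def peak_envelope(x, y, order=1):
    """Hypothetical helper: join the local maxima of y with straight lines.

    A sample counts as a peak if it equals the maximum of the window that
    extends `order` samples to each side of it. Returns the envelope
    evaluated at every x (x must be increasing) and the peak indices.
    Requires len(y) >= 2 * order + 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    width = 2 * order + 1
    # Maximum of each centred window; only interior samples have a full window.
    window_max = sliding_window_view(y, width).max(axis=1)
    centre = y[order:len(y) - order]
    peaks = np.nonzero(centre == window_max)[0] + order
    if peaks.size == 0:
        # No interior peak (e.g. a monotonic signal): fall back to the signal itself.
        return y.copy(), peaks
    # Straight lines between peaks; np.interp holds the first/last peak value
    # constant before the first peak and after the last one.
    return np.interp(x, x[peaks], y[peaks]), peaks


# Made-up example signal.
x = np.arange(7)
y = np.array([0, 3, 1, 1, 5, 2, 0])
env, peaks = peak_envelope(x, y, order=1)
print(peaks)  # [1 4]
```

On the real data this would be called along the lines of `data = np.genfromtxt('estimated.csv', delimiter=',')` followed by `peak_envelope(data[:, 0], data[:, 1], order=...)`; a suitable `order` is not known from the post. If SciPy is available, `scipy.signal.find_peaks` is a more complete peak detector, with `distance` and `prominence` options.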

Solution

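I think a rolling maximum is the right approach here. We load the data into a DataFrame and compute the rolling maximum over a window of 8500 values; afterwards the curves look similar. You can experiment with the window size a little to optimize the result. (The original answer called pandas.rolling_max(), which was deprecated in pandas 0.18 and has since been removed; the code below uses the equivalent df['y'].rolling(8500).max().)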

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.ion()
names = ['actual.csv', 'estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter=',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
df = pd.read_csv('estimated.csv', names=('x', 'y'))
# Rolling maximum over the last 8500 rows (replaces the removed pd.rolling_max()).
df['rolling_max'] = df['y'].rolling(8500).max()
plt.figure()
plt.plot(actual_data[:, 0], actual_data[:, 1], label='actual')
plt.plot(estimated_data[:, 0], estimated_data[:, 1], label='estimated')
plt.plot(df['x'], df['rolling_max'], label='rolling')

plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0, 10)
plt.ylim(0, 500)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)
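The window passed to `rolling()` counts rows, not seconds, so "8500 values" covers a time span that depends on how densely estimated.csv is sampled (the post does not state the sampling rate). A small sketch on made-up numbers shows what `Series.rolling(window).max()` returns, including the leading NaNs, plus a hypothetical helper `window_for_duration` that converts a desired time span into a row count, assuming roughly uniform sampling in the x column.

```python
import numpy as np
import pandas as pd


def window_for_duration(x, seconds):
    """Hypothetical helper: rows spanning `seconds`, assuming roughly uniform sampling of x."""
    step = np.median(np.diff(np.asarray(x, dtype=float)))
    return max(1, int(round(seconds / step)))


# Made-up signal values; the window of 3 is counted in rows.
y = pd.Series([1, 4, 2, 2, 7, 3, 1, 0])
rolled = y.rolling(3).max()
# The first 3 - 1 = 2 entries are NaN; each later entry is the max of that
# row and the 2 rows before it.
print(rolled.tolist())  # [nan, nan, 4.0, 4.0, 7.0, 7.0, 7.0, 3.0]

# Made-up time axis sampled every 0.001 s: a 0.5 s span is 500 rows.
x = np.linspace(0, 10, 10001)
print(window_for_duration(x, 0.5))  # 500
```

Because the window trails behind each row, a larger window both smooths more and shifts the curve's rises later in time, which is one reason the window size needs some tuning.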

To answer the question from the comments:

Since Series.rolling() only produces a value once a full window of data is available, the first values of .rolling().max() are NaN (window - 1 of them). To replace these NaNs, I suggest reversing the whole Series and computing the windows backwards. Afterwards, we can fill all the NaNs with the values from the backward calculation. I shortened the window length for the backward calculation (850 instead of 8500); otherwise we get erroneous data.
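The NaN-filling trick can be checked on a toy Series (made-up values, with windows 4 and 2 standing in for 8500 and 850). The key point is that reversing with `[::-1]` keeps the original index labels, so `fillna()` matches each backward value to the right row even though the backward Series is stored in reverse order.

```python
import pandas as pd

# Made-up values; windows 4 and 2 stand in for 8500 and 850.
y = pd.Series([5, 1, 2, 6, 3, 1, 4, 2, 0, 1])

# Forward window: rows 0-2 are NaN.
forward = y.rolling(4).max()
# y[::-1] keeps the original index labels in reverse order, so the backward
# value at row i is the max of rows i and i + 1 (window 2).
backward = y[::-1].rolling(2).max()
# fillna(Series) aligns on the index, not on position.
filled = forward.fillna(backward)
print(filled.tolist())  # [5.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0]
```

The filled leading rows only look a short distance ahead (row 1 becomes 2.0, lower than its neighbours), so the backward window length directly controls how the start of the curve looks.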

This code works:

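import matplotlib.pyplot as plt
import pandas as pd
plt.ion()

df = pd.read_csv('estimated.csv', names=('x', 'y'))
# Forward window: the first 8499 values are NaN.
df['rolling_max'] = df['y'].rolling(8500).max()
# Backward window on the reversed series; assignment aligns on the index,
# so the values land back on their original rows.
df['rolling_max_backwards'] = df['y'][::-1].rolling(850).max()
# Fill the leading NaNs by plain assignment instead of a chained
# fillna(..., inplace=True), which recent pandas versions warn about.
df['rolling_max'] = df['rolling_max'].fillna(df['rolling_max_backwards'])
plt.figure()
plt.plot(df['x'], df['rolling_max'], label='rolling')

plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0, 10)
plt.ylim(0, 700)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)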

And we get the following result:
