Matplotlib slow with large data sets, how to enable decimation?


Question

I use matplotlib for a signal processing application, and I noticed that it chokes on large data sets. I really need to improve this to make the application usable.

What I'm looking for is a way to let matplotlib decimate my data. Is there a setting, property or other simple way to enable that? Any suggestions on how to implement this are welcome.

import numpy as np
import matplotlib.pyplot as plt

n = 100000  # more than 100000 points makes it unusably slow
plt.plot(np.random.random_sample(n))
plt.show()

Some background information

I used to work on a large C++ application where we needed to plot large data sets. To solve this problem, we took advantage of the structure of the data as follows:

In most cases, if we want a line plot, the data is ordered and often even equidistant. If it is equidistant, you can calculate the start and end index in the data array directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
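The index lookup described above could be sketched roughly as follows. This is only an illustration, not code from the original C++ application: the function names and the parameters `x0` (first sample position), `dx` (sample spacing), `xmin` and `xmax` (zoom limits in data coordinates) are assumptions made for the example.

```python
import numpy as np

def visible_slice_equidistant(x0, dx, n, xmin, xmax):
    """Index range [start, stop) of the samples x0 + i*dx inside [xmin, xmax]."""
    start = int(np.ceil((xmin - x0) / dx))
    stop = int(np.floor((xmax - x0) / dx)) + 1
    # Clamp to the valid index range of an array with n samples.
    start = min(max(start, 0), n)
    stop = min(max(stop, start), n)
    return start, stop

def visible_slice_sorted(x, xmin, xmax):
    """Same range for sorted but unevenly spaced x, found by binary search."""
    start = int(np.searchsorted(x, xmin, side="left"))
    stop = int(np.searchsorted(x, xmax, side="right"))
    return start, stop
```

Either function returns a half-open range, so the visible part of the data is simply `y[start:stop]`.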

Next, the zoomed slice is decimated. Because the data is ordered, we can simply iterate over the block of points that falls inside one pixel, and for each block the mean, maximum and minimum are calculated. Instead of one pixel, we then draw a bar in the plot.

For example: if the x axis is ordered, a vertical line is drawn for each block, possibly with the mean in a different color.

To avoid aliasing, the plot is oversampled by a factor of two.
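A minimal NumPy sketch of this block reduction, under a few assumptions: the function name `decimate_min_max_mean` and its parameters are made up for illustration, `y` is the already zoomed, non-empty slice, and `n_pixels` is the width of the plot area in pixels. The `oversample` factor of two mirrors the oversampling mentioned above.

```python
import numpy as np

def decimate_min_max_mean(y, n_pixels, oversample=2):
    """Reduce a non-empty y to at most n_pixels * oversample blocks.

    Returns the start index of each block and its minimum, maximum and mean.
    """
    y = np.asarray(y, dtype=float)
    n_blocks = max(1, min(len(y), n_pixels * oversample))
    # Integer block boundaries; block lengths differ by at most one sample.
    edges = (np.arange(n_blocks + 1) * len(y)) // n_blocks
    starts = edges[:-1]
    lo = np.minimum.reduceat(y, starts)
    hi = np.maximum.reduceat(y, starts)
    mean = np.add.reduceat(y, starts) / np.diff(edges)
    return starts, lo, hi, mean
```

The result could then be drawn with something like `ax.vlines(x[starts], lo, hi)` plus `ax.plot(x[starts], mean)`, so that each block becomes one vertical bar with the mean on top.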

In the case of a scatter plot, the data can be put in order by sorting, because the sequence of plotting is not important.
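For scatter data, the ordering step might look like the sketch below. The helper name `sort_scatter` is hypothetical; it only reorders the points by x so that the index lookup and block reduction for ordered data can be applied afterwards.

```python
import numpy as np

def sort_scatter(x, y):
    """Return x and y reordered so that x is ascending; (x, y) pairs are kept."""
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x, kind="stable")
    return x[order], y[order]
```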

The nice thing about this simple recipe is that the more you zoom in, the faster it becomes. In my experience, as long as the data fits in memory, the plots stay very responsive. For instance, 20 plots of time history data with 10 million points should be no problem.

Recommended answer

It seems like you just need to decimate the data before you plot it:

import numpy as np
import matplotlib.pyplot as plt

n = 100000  # more than 100000 points makes it unusably slow
X = np.random.random_sample(n)
i = 10 * np.arange(n // 10)  # indices of every 10th sample (same as X[::10])
plt.plot(X[i])
plt.show()
