Reduce the size of a large data set by sampling/interpolation to improve chart performance


Problem description

I have a large set (>2000) of time series data that I'd like to display using d3 in the browser. D3 is working great for displaying a subset of the data (~100 points) to the user, but I also want a "context" view (like this) to show the entire data set and allow users to select a subregion to view in detail.

However, performance is abysmal when trying to display that many points in d3. I feel like a good solution would be to select a sample of the data and then use some kind of interpolation (spline, polynomial, etc., this is the part I know how to do) to draw a curve that is reasonably similar to the actual data.

However, it's not clear to me how I ought to go about selecting the subset. The data (shown below) has rather flat regions where fewer samples would be needed for a decent interpolation, and other regions where the absolute derivative is quite high, where more frequent sampling is needed.

To further complicate matters, the data has gaps (where the sensor generating it was failing or out of range), and I'd like to keep these gaps in the chart rather than interpolating through them. Detection of the gaps is fairly simple though, and simply clipping them out after drawing the entire data set with the interpolation seems like a reasonable solution.

I'm doing this in JavaScript, but a solution in any language or a mathematical answer to the problem would do.

Solution

You could use the d3fc-sample module, which provides a number of different algorithms for sampling data. Here's what the API looks like:

// Create the sampler
var sampler = fc_sample.largestTriangleThreeBucket();

// Configure the x / y value accessors
sampler.x(function (d) { return d.x; })
    .y(function (d) { return d.y; });

// Configure the size of the buckets used to downsample the data.
sampler.bucketSize(10);

// Run the sampler
var sampledData = sampler(data);

You can see an example of it running on the website:

http://d3fc.github.io/d3fc-sample/

The largest-triangle three-buckets algorithm works quite well on data that is 'patchy'. It doesn't vary the bucket size, but does ensure that peaks / troughs are included, which results in a good representation of the sampled data.
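
To make the idea concrete, here is a minimal standalone sketch of largest-triangle three-buckets downsampling. It is not the d3fc-sample implementation: the function name `lttb` and its `threshold` parameter (the target number of output points, rather than a bucket size) are assumptions made for illustration, and it expects the input as an array of `{x, y}` objects already sorted by `x`.

```javascript
// Illustrative sketch only; `lttb` and `threshold` are hypothetical names,
// not part of d3fc-sample. `points` must be [{x, y}, ...] sorted by x.
function lttb(points, threshold) {
  const n = points.length;
  if (threshold >= n || threshold < 3) {
    // Nothing to reduce, or too few output points to form buckets.
    return points.slice();
  }

  const sampled = [points[0]]; // the first point is always kept
  // Bucket width, excluding the fixed first and last points.
  const every = (n - 2) / (threshold - 2);
  let a = 0; // index of the most recently selected point

  for (let i = 0; i < threshold - 2; i++) {
    // Average of the NEXT bucket: the third vertex of the triangle.
    const nextStart = Math.floor((i + 1) * every) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * every) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = nextStart; j < nextEnd; j++) {
      avgX += points[j].x;
      avgY += points[j].y;
    }
    avgX /= nextEnd - nextStart;
    avgY /= nextEnd - nextStart;

    // In the CURRENT bucket, pick the point that forms the largest triangle
    // with the previously selected point and the next bucket's average.
    const start = Math.floor(i * every) + 1;
    const end = Math.floor((i + 1) * every) + 1;
    let maxArea = -1;
    let maxIndex = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (points[a].x - avgX) * (points[j].y - points[a].y) -
        (points[a].x - points[j].x) * (avgY - points[a].y)
      ) / 2;
      if (area > maxArea) {
        maxArea = area;
        maxIndex = j;
      }
    }
    sampled.push(points[maxIndex]);
    a = maxIndex;
  }

  sampled.push(points[n - 1]); // the last point is always kept
  return sampled;
}

// Example: reduce 2000 synthetic points to 200 for a context view.
const data = Array.from({ length: 2000 }, (_, i) => ({ x: i, y: Math.sin(i / 50) }));
const sampledData = lttb(data, 200);
console.log(sampledData.length);
```

Because each bucket contributes the point with the largest triangle area, an isolated spike is kept rather than averaged away. For data with gaps, as in the question, one option would be to split the series at the gaps and downsample each contiguous segment separately, so that no triangle spans a gap.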
