DC and crossfilter with large datasets

Problem description

I have been working with dc.js and crossfilter, and I currently have a large dataset: 550,000 rows in a 60 MB CSV. I am facing a lot of issues with it, such as browser crashes.

So I'm trying to understand how dc.js and crossfilter handle large datasets: http://dc-js.github.io/dc.js/

The example on their main site runs very smoothly. In the developer tools' Timeline > Memory view, its memory use peaks at 34 MB and slowly decreases over time.

My project uses 300-500 MB of memory per dropdown selection, when it loads a JSON file and renders the entire visualization.

So, two questions:

  • What is the backend for the dc site example? Is it possible to find out the exact backend file?
  • How can I reduce my application's memory usage? It runs very slowly and eventually crashes.

Solution

You can try loading the data and filtering it on the server. I faced a similar problem when my dataset was too big for the browser to handle, and a few weeks back I posted a question about implementing exactly this: Using dc.js on the clientside with crossfilter on the server

Here is an overview of going about it.

On the client side, you create fake dimensions and fake groups that implement the basic functionality dc.js expects (https://github.com/dc-js/dc.js/wiki/FAQ#filter-the-data-before-its-charted). You create your dc.js charts on the client side and plug in the fake dimensions and groups wherever required.
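
The client side can be sketched in TypeScript as follows. This is a minimal sketch, not dc.js or crossfilter code: `FakeGroup`, `FakeDimension`, the `Transport` type, `onUpdate` and `lastRequest` are hypothetical names, and the transport (which in a browser would wrap the AJAX/`fetch` call) is injected so the example runs without a server. It assumes dc.js reads bins as `{key, value}` objects from `group.all()` and filters through `dimension.filter(value)`; depending on the dc.js version and chart type, it may also call crossfilter methods such as `filterExact`, `filterRange` or `filterFunction`, which would need the same treatment.

```typescript
// Shape of one bin, as dc.js reads it from group.all().
interface Bin {
  key: string | number;
  value: number;
}

// Hypothetical transport: sends the current filter state to the server
// (e.g. via AJAX/fetch) and resolves with group results keyed by dimension name.
type Transport = (filters: Record<string, unknown>) => Promise<Record<string, Bin[]>>;

// Client-side stand-in for a crossfilter group: dc.js only needs all().
class FakeGroup {
  private bins: Bin[] = [];

  all(): Bin[] {
    return this.bins; // must answer synchronously, so serve the cached copy
  }

  update(bins: Bin[]): void {
    this.bins = bins;
  }
}

// Client-side stand-in for a crossfilter dimension.
class FakeDimension {
  lastRequest: Promise<void> = Promise.resolve();

  constructor(
    private readonly name: string,
    private readonly state: Record<string, unknown>, // shared by all fake dimensions
    private readonly groups: Record<string, FakeGroup>, // fake groups keyed by dimension name
    private readonly transport: Transport,
    private readonly onUpdate: () => void, // e.g. () => dc.redrawAll()
  ) {}

  filter(value: unknown): this {
    if (value === null || value === undefined) {
      delete this.state[this.name]; // filter(null) clears the filter, as in crossfilter
    } else {
      this.state[this.name] = value;
    }
    // Ask the server to do the real filtering, then refresh every fake group.
    this.lastRequest = this.transport({ ...this.state }).then((results) => {
      for (const [dim, bins] of Object.entries(results)) {
        this.groups[dim]?.update(bins);
      }
      this.onUpdate();
    });
    return this;
  }

  filterAll(): this {
    return this.filter(null);
  }
}
```

Because dc.js calls `.all()` synchronously, the fake group serves a cached copy and the charts are redrawn (for example with `dc.redrawAll()`) once the server responds. Responses can arrive out of order, so a fuller version would tag each request and drop stale replies.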

Now, on the server side, you have crossfilter running (https://www.npmjs.org/package/crossfilter). You create your actual dimensions and groups here.
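
On the server, crossfilter's groups do the aggregation. To show what the server has to compute without installing the package, here is a plain TypeScript stand-in; it is not crossfilter itself, and `groupCounts`, `passes`, `compareKeys`, `Row` and `Filters` are hypothetical names. It counts rows per key of one dimension while honoring every filter except the one on that same dimension, which matches how crossfilter groups ignore their own dimension's filter. Only these small `{key, value}` arrays need to travel back to the browser, instead of all 550,000 rows.

```typescript
type Row = Record<string, string | number>;
type FilterValue = string | number | [number, number];
type Filters = Record<string, FilterValue>;

interface Bin {
  key: string | number;
  value: number;
}

// Does one field value pass one filter? Exact match, or a half-open
// [lo, hi) range like crossfilter's filterRange.
function passes(value: string | number, f: FilterValue): boolean {
  if (Array.isArray(f)) {
    return typeof value === "number" && value >= f[0] && value < f[1];
  }
  return value === f;
}

// Numbers sort numerically, everything else as strings.
function compareKeys(a: string | number, b: string | number): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  return String(a).localeCompare(String(b));
}

// Count rows per key of `dim`, applying every filter EXCEPT the one on `dim`,
// so a chart can still show its unselected bars.
function groupCounts(rows: Row[], dim: string, filters: Filters): Bin[] {
  const counts = new Map<string | number, number>();
  for (const row of rows) {
    const kept = Object.entries(filters).every(
      ([d, f]) => d === dim || passes(row[d], f),
    );
    if (!kept) continue;
    const key = row[dim];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort(([a], [b]) => compareKeys(a, b))
    .map(([key, value]) => ({ key, value }));
}
```

With crossfilter on the server you would instead create one `dimension` per field, apply `filterExact` or `filterRange` from the decoded request, and return each `group.all()`; the half-open range above mirrors `filterRange`.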

The fake dimensions have a .filter() function that sends an AJAX request to the server to perform the actual filtering; the filter information can be encoded as a query string. You also need an .all() function on your fake group that returns the results of the filtering.
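
One possible query-string convention, sketched with the standard `URLSearchParams` API. `encodeFilters`, `decodeFilters` and `isNumeric` are hypothetical helpers, and writing a range as `lo,hi` is an assumption, not something dc.js or crossfilter prescribes. The client would append the encoded string to its AJAX request (for example to a hypothetical `/filter` endpoint), and the server would decode it before applying the filters.

```typescript
type FilterValue = string | number | [number, number];
type Filters = Record<string, FilterValue>;

function isNumeric(s: string): boolean {
  return s.trim() !== "" && !Number.isNaN(Number(s));
}

// Client: {day: "Mon", amount: [0, 10]} -> "day=Mon&amount=0%2C10"
function encodeFilters(filters: Filters): string {
  const params = new URLSearchParams();
  for (const [dim, f] of Object.entries(filters)) {
    params.set(dim, Array.isArray(f) ? `${f[0]},${f[1]}` : String(f));
  }
  return params.toString();
}

// Server: parse the query string back into filter values.
// "lo,hi" with two numbers becomes a range; a numeric string becomes a number.
function decodeFilters(query: string): Filters {
  const filters: Filters = {};
  new URLSearchParams(query).forEach((raw, dim) => {
    const parts = raw.split(",");
    if (parts.length === 2 && parts.every(isNumeric)) {
      filters[dim] = [Number(parts[0]), Number(parts[1])];
    } else if (isNumeric(raw)) {
      filters[dim] = Number(raw);
    } else {
      filters[dim] = raw;
    }
  });
  return filters;
}
```

Under this convention a value that merely looks numeric (such as a postal code with a leading zero) is decoded as a number; a real API might send explicit types, or JSON, instead.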
