Why does loading data into a Meteor Collection take so long?

Question

I'm trying to build a data visualization application with Meteor to visualize a large dataset. The data is currently stored in a CSV file of about 64MB.

I'm using the node-csv plugin to load this file into a Meteor Collection (code below). It takes about 1 minute per 10k records, and at that rate loading the whole file will take about 1.5 hours. During that time, the Meteor server is unresponsive to web requests.

This seems abnormally slow to me. Is this normal? Is Meteor just not designed to handle moderately large amounts of data? Or is there a better way to do this data import than the one I found?

var csv = Meteor.require('CSV');
var fs = Meteor.require('fs');
var path = Npm.require('path');

function loadData() {
  // Resolve the project root by cutting the working directory at '.meteor'.
  var basepath = path.resolve('.').split('.meteor')[0];
  console.log('Loading data into Meteor...');

  csv().from.stream(
    fs.createReadStream(basepath + 'server/data/enron_data.csv'),
    {'escape': '\\'})
    // Every parsed row results in one separate Emails.insert call.
    .on('record', Meteor.bindEnvironment(function(row, index) {
      // Log progress every 10,000 rows.
      if ((index % 10000) == 0) {
        console.log('Processing:', index, row);
      }
      Emails.insert({
        'sender_id': row[0],
        'recipient_id': row[1],
        'recipient_type': row[2],
        'date': row[3],
        'timezone': row[4],
        'subject': row[5]
      });
    }, function(error) {
      // Second argument to bindEnvironment: called if the callback throws.
      console.log('Error in bindEnvironment:', error);
    }))
    .on('error', function(err) {
      console.log('Error reading CSV:', err);
    })
    .on('end', function(count) {
      console.log(count, 'records read');
    });
}


Recommended answer

Even if you do this outside of the Meteor environment, loading your data one row at a time is really inefficient. I think the tool you want is mongoimport.

It may not be obvious, but you do not need to insert your documents with Meteor in order to use them with Meteor.

You can try calling mongoimport from Meteor.startup when there are 0 documents in your collection (or whatever base condition makes sense in your situation). I haven't tried this, so I can't say how much of a pain it is, but I'd imagine you could just call child_process.spawn to start mongoimport. If for some reason that doesn't work, you could always put it in a script and run that script whenever you do a meteor reset.
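As a rough illustration of the child_process.spawn idea, here is a minimal, untested-against-Meteor sketch. The helper names (buildMongoimportArgs, runMongoimport), the collection name 'emails', and the default host, port and database are all assumptions made for this example: a local Meteor development server typically exposes MongoDB on port 3001 with a database named "meteor", but check your own setup. Inside older Meteor server code you would likely use Npm.require in place of require. Note also that mongoimport applies its own CSV parsing rules, so a file that relies on backslash escapes may need preprocessing first.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical helper: build the mongoimport argument list.
// Host, port and db defaults are assumptions for a local Meteor dev server.
function buildMongoimportArgs(options) {
  return [
    '--host', options.host || '127.0.0.1',
    '--port', String(options.port || 3001),
    '--db', options.db || 'meteor',
    '--collection', options.collection,
    '--type', 'csv',
    '--fields', options.fields.join(','),
    '--file', options.file
  ];
}

// Hypothetical helper: start mongoimport as a child process and report the
// result through a Node-style callback, which is called exactly once.
function runMongoimport(options, callback) {
  var finished = false;
  function finish(err) {
    if (finished) { return; }
    finished = true;
    callback(err);
  }

  var child = spawn('mongoimport', buildMongoimportArgs(options), { stdio: 'inherit' });
  child.on('error', finish); // e.g. mongoimport is not on the PATH
  child.on('close', function (code) {
    finish(code === 0 ? null : new Error('mongoimport exited with code ' + code));
  });
}

// Possible use inside Meteor server code (not executed here):
// Meteor.startup(function () {
//   if (Emails.find().count() === 0) {
//     runMongoimport({
//       collection: 'emails',
//       fields: ['sender_id', 'recipient_id', 'recipient_type',
//                'date', 'timezone', 'subject'],
//       file: '/absolute/path/to/enron_data.csv'
//     }, function (err) {
//       console.log(err ? 'Import failed: ' + err.message : 'Import finished');
//     });
//   }
// });
```

Keeping the argument construction in its own function makes it easy to log or inspect the exact command before anything is spawned.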

Side note: I believe the appropriate place for static server assets is the private directory. This also lets you use the Assets API to access those files.
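To make the side note concrete, here is a small sketch of looking up a file that has been moved under private. It assumes the CSV now lives at private/data/enron_data.csv (a hypothetical location) and that the running Meteor version provides Assets.absoluteFilePath on the server. The helper name resolvePrivateAsset is made up for this example, and the Assets object is passed in as a parameter so the function can also be exercised outside Meteor.

```javascript
// Hypothetical helper: `assets` is expected to behave like Meteor's
// server-side Assets object. Paths are relative to the private directory.
function resolvePrivateAsset(assets, relativePath) {
  if (!assets || typeof assets.absoluteFilePath !== 'function') {
    throw new Error('Assets.absoluteFilePath is not available in this environment');
  }
  return assets.absoluteFilePath(relativePath);
}

// Possible use inside Meteor server code (not executed here), assuming the
// file was moved to private/data/enron_data.csv:
// var csvPath = resolvePrivateAsset(Assets, 'data/enron_data.csv');
```

The resulting absolute path can then be handed to whatever does the import, instead of reconstructing it from the working directory.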
