How to load very large csv files in nodejs?


Problem description


I'm trying to load 2 big CSV files into nodejs; the first one is 257,597 KB and the second one 104,330 KB. I'm using the filesystem (fs) and csv modules. Here's my code:

const fs = require('fs')
const csv = require('csv') // the "csv" npm module

let myData

// Read the entire file into memory, then parse it in one go
fs.readFile('path/to/my/file.csv', (err, data) => {
  if (err) console.error(err)
  else {
    csv.parse(data, (err, dataParsed) => {
      if (err) console.error(err)
      else {
        myData = dataParsed
        console.log('csv loaded')
      }
    })
  }
})


And after ages (1-2 hours) it just crashes with this error message:

<--- Last few GCs --->

[1472:0000000000466170]  4366473 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5584.4 / 0.0 ms  last resort GC in old space requested
[1472:0000000000466170]  4371668 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5194.3 / 0.0 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 000002BDF12254D9 <JSObject>
    1: stringSlice(aka stringSlice) [buffer.js:590] [bytecode=000000810336DC91 offset=94](this=000003512FC822D1 <undefined>,buf=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1235F21 <String[4]: utf8>,start=0,end=263778854)
    2: toString [buffer.js:664] [bytecode=000000810336D8D9 offset=148](this=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::DecodeWrite
 2: node_module_register
 3: v8::internal::FatalProcessOutOfMemory
 4: v8::internal::FatalProcessOutOfMemory
 5: v8::internal::Factory::NewRawTwoByteString
 6: v8::internal::Factory::NewStringFromUtf8
 7: v8::String::NewFromUtf8
 8: std::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >
 9: v8::internal::wasm::SignatureMap::Find
10: v8::internal::Builtins::CallableFor
11: v8::internal::Builtins::CallableFor
12: v8::internal::Builtins::CallableFor
13: 00000081634043C1


The biggest file is loaded, but node runs out of memory for the other one. It's probably easy to allocate more memory, but the main issue here is the loading time: it seems very long even for files of this size. So what is the correct way to do it? Python loads these csv files really fast with pandas, btw (3-5 seconds).
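For reference, the heap ceiling itself can be raised with Node's --max-old-space-size flag (value in megabytes), though that only works around the crash and does nothing for the load time. A rough example, where index.js is a placeholder for the entry script and roughly 8 GB of RAM is assumed to be available:

node --max-old-space-size=8192 index.js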

Recommended answer


Streaming works perfectly; it took only 3-5 seconds:

var fs = require('fs')
var csv = require('csv-parser')

var data = []

// Stream the file and parse it row by row instead of
// reading the whole thing into memory at once
fs.createReadStream('path/to/my/data.csv')
  .pipe(csv())
  .on('data', function (row) {
    data.push(row)
  })
  .on('end', function () {
    console.log('Data loaded')
  })
