Large CSV to JSON/Object in Node.js

Problem Description

I am trying to do something that seems like it should not only be fairly simple to accomplish but also a common enough task that there would be straightforward packages available to do it. I wish to take a large CSV file (an export from a relational database table) and convert it to an array of JavaScript objects. Furthermore, I would like to export it to a .json file fixture.

Example CSV:

a,b,c,d
1,2,3,4
5,6,7,8
...

Desired JSON:

[
{"a": 1,"b": 2,"c": 3,"d": 4},
{"a": 5,"b": 6,"c": 7,"d": 8},
...
]

I've tried several Node CSV parsers, streamers, and self-proclaimed CSV-to-JSON libraries, but I can't seem to get the result I want, or, when I can, it only works on smaller files. My file is nearly 1 GB in size with ~40m rows (which would create 40m objects). I expect that it will require streaming the input and/or output to avoid memory problems.

Here are the packages I've tried:

  • https://github.com/klaemo/csv-stream
  • https://github.com/koles/ya-csv
  • https://github.com/davidgtonge/stream-convert (works, but it is so exceedingly slow as to be useless, since I alter the dataset often; it took nearly 3 hours to parse a 60 MB CSV file)
  • https://github.com/cgiffard/CSVtoJSON.js
  • https://github.com/wdavidw/node-csv-parser (doesn't seem to be designed for converting csv to other formats)
  • https://github.com/voodootikigod/node-csv

I'm using Node 0.10.6 and would like a recommendation on how to easily accomplish this. Rolling my own might be best but I'm not sure where to begin with all of Node's streaming features, especially since they changed the API in 0.10.x.

Answer

While this is far from a complete answer, you may be able to base your solution on https://github.com/dominictarr/event-stream. An adapted example from its readme:

    var es = require('event-stream')

    es.pipeline(                          // connect streams together with `pipe`
      process.openStdin(),                // open stdin
      es.split(),                         // split the stream on newlines
      es.map(function (data, callback) {  // turn this async function into a stream
        // deal with one line of CSV data; append '\n' so each object lands on its own line
        callback(null, JSON.stringify(parseCSVLine(data)) + '\n')
      }),
      process.stdout
    )

After that, I expect you'll have a bunch of stringified JSON objects, one on each line. They then need to be wrapped into an array, which you may be able to do by appending , to the end of every line, removing it from the last line, and then adding [ and ] to the beginning and end of the file.
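
For instance, here is a minimal sketch of that wrapping step. It assumes event-stream's es.join(separator) for the commas, hypothetical file names data.csv / data.json, and the same (also hypothetical) parseCSVLine described below:

    var fs = require('fs')
    var es = require('event-stream')

    var out = fs.createWriteStream('data.json')
    out.write('[\n')                            // opening bracket before the first row

    var json = es.pipeline(
      fs.createReadStream('data.csv'),
      es.split(),                               // one chunk per CSV line
      es.map(function (line, callback) {
        if (!line) return callback()            // drop the trailing blank line
        var row = parseCSVLine(line)            // hypothetical, as described below
        if (!row) return callback()             // e.g. parseCSVLine consumed the header row
        callback(null, JSON.stringify(row))
      }),
      es.join(',\n')                            // commas between rows, none after the last
    )

    json.pipe(out, { end: false })              // keep `out` open past the stream's end
    json.on('end', function () {
      out.end('\n]\n')                          // closing bracket after the last row
    })

Streaming the commas in with es.join avoids buffering all ~40m rows just to trim one trailing comma.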

The parseCSVLine function must be configured to assign the CSV values to the right object properties. This can be done fairly easily once the first line of the file (the header row) has been parsed.
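
As one possible sketch of such a function, assuming a simple CSV like the example above (comma-separated, no quoted fields, numeric values): the first line supplies the property names, and each later line is zipped against them.

    var headers = null                          // filled in from the first line

    function parseCSVLine (line) {
      var values = line.split(',')              // naive split; real CSVs may need a quoting-aware parser
      if (!headers) {
        headers = values                        // header row: remember the property names
        return null                             // emit nothing for this line
      }
      var obj = {}
      headers.forEach(function (key, i) {
        obj[key] = Number(values[i])            // the sample data is all numeric; adjust for real columns
      })
      return obj
    }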

I do notice the library is not tested on 0.10 (at least not with Travis), so beware. Maybe run npm test on the source yourself.
