Sum column in stream using node


Question

My goal is to compute the sum of one column, grouped by the value of another column, in a CSV file.

For example, I have an input CSV that looks like this:

"500","my.jpg"
"500","my.jpg"
"200","another.jpg"

I would like the output to be:

[{ bytes: 1000, uri: "my.jpg" }, { bytes: 200, uri: "another.jpg" }]

Note: I need to do this as a stream, because a given CSV can contain over 3 million records and looping over it is just too slow.

I have managed to accomplish this using awk, but I am struggling to implement it in Node.

Here is the bash script using the awk command:

awk -F, 'BEGIN { print "["}
{   
    gsub(/"/, ""); # Remove all quotation from csv
    uri=$2; # Put the current uri in key
    a[uri]++; # Increment the count of uris
    b[uri] = b[uri] + $1; # total up bytes
} 
END { 
    for (i in a) {
        printf "%s{\"uri\":\"%s\",\"count\":\"%s\",\"bytes\":\"%s\"}",
        separator, i, a[i], b[i]
        separator = ", "
    }

    print "]"
}
' ./res.csv

Any pointers in the right direction would be hugely appreciated.

Recommended answer

You can try creating a read stream for your CSV file and piping it to a csv-streamify parser.

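const csv = require('csv-streamify')
const fs = require('fs')

// objectMode makes the parser emit each row as an array of fields
// instead of a buffer holding the stringified array
const parser = csv({ objectMode: true })
const sum = {};

// called once per parsed row: line[0] is the byte count, line[1] is the uri
parser.on('data', function (line) {
  let key = line[1];
  let val = line[0];
  if (!sum[key]) {
    sum[key] = 0;
  }
  sum[key] = sum[key] + parseInt(val, 10);
  // progress output; consider removing it for files with millions of rows
  console.log("Current sum for " + key + ": " + sum[key])
})

// once the whole file has been read, convert the totals into the requested shape
parser.on('end', function () {
  let results = Object.keys(sum)
    .map(key => ({ bytes: sum[key], uri: key }))
  console.log(results);
})

// now pipe some data into it
fs.createReadStream('./test.csv').pipe(parser)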

Using your sample data, the final console.log in this example should print:

[ { bytes: 1000, uri: 'my.jpg' },
  { bytes: 200, uri: 'another.jpg' } ]
