Node.js: read a very large file (~10 GB), process it line by line, then write to another file
Problem description
I have a 10 GB log file in a particular format. I want to process this file line by line and write the output to another file after applying some transformations. I am using Node for this operation.
Though this method works, it takes a very long time. I was able to do this within 30-45 minutes in Java, but in Node the same job takes more than 160 minutes. Following is the code:
The following initialization code reads each line from the input.
var fs = require('fs');
var lazy = require('lazy'); // npm "lazy" module, used here to split the stream into lines

var path = '../10GB_input_file.txt';
var output_file = '../output.txt';

function fileopsmain() {
    fs.exists(output_file, function(exists) {
        if (exists) {
            fs.unlink(output_file, function(err) {
                if (err) throw err;
                console.log('successfully deleted ' + output_file);
            });
        }
    });
    new lazy(fs.createReadStream(path, {bufferSize: 128 * 4096}))
        .lines
        .forEach(function(line) {
            var line_arr = line.toString().split(';');
            perform_line_ops(line_arr, line_arr[6], line_arr[7], line_arr[10]);
        });
}
This method performs some operations on each line and passes the result to the write method, which writes it into the output file.
function perform_line_ops(line_arr, range_start, range_end, daynums) {
    var _new_lines = '';
    for (var i = 0; i < daynums; i++) {
        // perform some operation to modify the line, then pass it to print
    }
    write_line_ops(_new_lines);
}
The following method writes data into a new file.
function write_line_ops(line) {
    if (line != null && line != '') {
        fs.appendFileSync(output_file, line);
    }
}
I want to bring this time down to 15-20 minutes. Is it possible to do so?
Also, for the record, I'm trying this on an Intel i7 processor with 8 GB of RAM.
Recommended answer
You can do this easily without a module. For example:
var fs = require('fs');
var inspect = require('util').inspect;

var buffer = '';
var rs = fs.createReadStream('foo.log');
rs.on('data', function(chunk) {
    var lines = (buffer + chunk).split(/\r?\n/g);
    buffer = lines.pop();
    for (var i = 0; i < lines.length; ++i) {
        // do something with `lines[i]`
        console.log('found line: ' + inspect(lines[i]));
    }
});
rs.on('end', function() {
    // optionally process `buffer` here if you want to treat leftover data
    // without a newline as a "line"
    console.log('ended on non-empty buffer: ' + inspect(buffer));
});