Read FileReader object line-by-line without loading the whole file into RAM
Question
Now that many browsers support reading local files with HTML5's FileReader, the door is open to websites that go beyond 'database front-ends' and run scripts that can do something useful with local data without having to send it up to a server first.
Pre-processing images and video before upload aside, one big application of FileReader would be loading data from some kind of on-disk table (CSV, TSV, whatever) into the browser for manipulation - perhaps for plotting or analysis in D3.js or creating landscapes in WebGL.
Problem is, most examples out there on StackOverflow and other sites use FileReader's .readAsText() method, which reads the whole file into RAM before returning a result.
To read a file without loading all of the data into RAM, one would need to read it piece by piece with .readAsArrayBuffer(), and this SO post is the closest I can get to a good answer:
However, it's a bit too specific to that particular problem, and in all honesty, I could try for days to make the solution more general and still come out empty-handed, because I don't understand the significance of the chunk sizes or why Uint8Array is used. A solution to the more general problem of reading a file line-by-line using a user-definable line separator (ideally with .split(), since that also accepts a regex), and then doing something per line (like printing it with console.log), would be ideal.
Recommended answer
I've made a LineReader class at the following Gist URL. As I mentioned in a comment, it's unusual to use line separators other than LF, CR/LF and maybe CR. Thus, my code only considers LF and CR/LF as line separators.
https://gist.github.com/peteroupc/b79a42fffe07c2a87c28
Example:
new LineReader(file).readLines(function(line){
  console.log(line);
});
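For readers wondering about the chunk sizes and the Uint8Array mentioned in the question, here is a minimal sketch of how a chunked line reader can work. It is not the code from the Gist: the names readLinesChunked and readSlice are made up for illustration, and the 64 KiB default chunk size is an arbitrary assumption. The idea is that only one slice of the file is in memory at a time. The chunk size just trades memory use against the number of reads, and Uint8Array is simply a byte view over the ArrayBuffer that gets handed to the text decoder.

```javascript
// Illustrative sketch only; NOT the LineReader class from the Gist.
// readLinesChunked and readSlice are hypothetical names.

// Read one slice into an ArrayBuffer. Uses FileReader where it exists
// (browsers) and falls back to Blob.arrayBuffer() elsewhere (e.g. Node.js).
function readSlice(slice) {
  if (typeof FileReader === 'undefined') return slice.arrayBuffer();
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(slice);
  });
}

// Calls onLine(line) for each line of `blob` (a File is also a Blob).
// Only LF and CR/LF are treated as line separators, as in the answer.
// chunkSize is in bytes; 64 KiB is an assumed default, not a magic value.
async function readLinesChunked(blob, onLine, chunkSize = 64 * 1024) {
  const decoder = new TextDecoder('utf-8');
  let leftover = ''; // partial line carried over from the previous chunk

  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const buffer = await readSlice(blob.slice(offset, offset + chunkSize));
    // { stream: true } keeps a multi-byte character that was cut in half
    // at the chunk boundary until the rest of it arrives.
    const text = leftover + decoder.decode(new Uint8Array(buffer), { stream: true });
    const lines = text.split(/\r?\n/);
    leftover = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) onLine(line);
  }

  leftover += decoder.decode(); // flush any bytes still buffered
  if (leftover.length > 0) onLine(leftover);
}
```

With a file picked from an input element, this could be called as readLinesChunked(input.files[0], function(line){ console.log(line); }). Swapping the regex passed to .split() for another pattern would give a different line separator, provided a separator that is cut in half at a chunk boundary still ends up in the leftover, as happens here with CR/LF.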