Reading a file line by line in JavaScript on the client side

Question

    Could you please help me with the following issue?

    Goal

    Read a file on the client side (in the browser, via JS and HTML5 classes) line by line, without loading the whole file into memory.

    Scenario

    I'm working on a web page that should parse files on the client side. Currently, I'm reading the file as described in this article.

    HTML:

    <input type="file" id="files" name="files[]" />
    

    JavaScript:

    $("#files").on('change', function (evt) {
        // creating FileReader
        var reader = new FileReader();
    
        // assigning handler: runs once the WHOLE file has been read into memory
        reader.onloadend = function (loadEvt) {
            var lines = loadEvt.target.result.split(/\r?\n/);
    
            lines.forEach(function (line) {
                parseLine(line); // parseLine is the page's own per-line handler
            });
        };
    
        // getting File instance
        var file = evt.target.files[0];
    
        // start reading
        reader.readAsText(file);
    });
    

    The problem is that FileReader reads the whole file at once, which crashes the tab for big files (size >= 300 MB). Using reader.onprogress doesn't solve the problem, because the result just keeps growing until it hits the limit.
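
As a point of comparison, here is a different technique from the one the question uses. In runtimes that support `Blob.prototype.stream()` and `TextDecoderStream` (current browsers, Node 18+), a file can be consumed incrementally. At any moment, memory holds roughly one chunk plus one partial line instead of the whole file. This is a minimal sketch, and `streamLines` and `onLine` are hypothetical names.

```javascript
// Sketch: read a Blob/File line by line with the Streams API.
// Assumes Blob.prototype.stream and TextDecoderStream are available.
async function streamLines(blob, onLine) {
  // Bytes -> text chunks; TextDecoderStream handles characters split across chunks.
  const reader = blob.stream().pipeThrough(new TextDecoderStream()).getReader();
  let tail = ""; // incomplete last line carried over to the next chunk
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    const parts = (tail + value).split(/\r?\n/);
    tail = parts.pop(); // the last piece may continue in the next chunk
    for (const line of parts) onLine(line);
  }
  if (tail !== "") onLine(tail); // final line without a trailing newline
}
```

In the page above, this would be called as `streamLines(evt.target.files[0], parseLine)` instead of using FileReader.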

    Reinventing the wheel

    I've done some research on the internet and found no simple way to do this (there are a bunch of articles describing this exact functionality, but on the server side, for Node.js).

    The only way to solve it that I can see is the following:

    1. Split the file into chunks (via the File.slice(startByte, endByte) method)
    2. Find the last newline character ('\n') in that chunk
    3. Read that chunk, except the part after the last newline character, convert it to a string and split it into lines
    4. Read the next chunk, starting right after the last newline character found in step 2
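
The four steps above can be sketched as follows. This assumes UTF-8 (or ASCII) text and a runtime with `Blob.prototype.slice`, `Blob.prototype.arrayBuffer()` and `TextDecoder` (modern browsers, Node 18+). `readLinesInChunks`, `onLine` and `chunkSize` are hypothetical names. The sketch follows step 4 literally: each new slice begins at the byte just after the last newline, so nothing is carried over between chunks.

```javascript
// Sketch of the chunking steps: only one slice of the file is in memory at a time.
async function readLinesInChunks(blob, onLine, chunkSize = 1024 * 1024) {
  const decoder = new TextDecoder("utf-8");
  let offset = 0;       // byte offset where the current chunk starts
  let size = chunkSize; // bytes to read this round (grows for very long lines)
  while (offset < blob.size) {
    // Step 1: slice one chunk and load just that slice.
    const end = Math.min(offset + size, blob.size);
    const bytes = new Uint8Array(await blob.slice(offset, end).arrayBuffer());

    // Step 2: find the last "\n" byte. In UTF-8 the byte 0x0A never occurs
    // inside a multi-byte character, so cutting there never splits one.
    let cut = bytes.lastIndexOf(0x0a);
    if (cut === -1) {
      if (end < blob.size) {
        size *= 2; // one line is longer than the chunk: retry with a bigger chunk
        continue;
      }
      cut = bytes.length; // final line without a trailing newline
    }

    // Step 3: decode the bytes before the cut and split them into lines.
    const text = decoder.decode(bytes.subarray(0, cut));
    for (const line of text.split("\n")) {
      onLine(line.endsWith("\r") ? line.slice(0, -1) : line); // tolerate CRLF
    }

    // Step 4: the next chunk starts right after that newline.
    offset += cut + 1;
    size = chunkSize;
  }
}
```

Doubling the chunk on an over-long line re-reads those bytes. That is acceptable for a sketch, but a real reader might keep them instead.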

    But I'd rather use something that already exists, to avoid entropy growth.

    Solution

    Eventually, I created a new line-by-line reader, which is totally different from the previous one.

    Features are:

    • Index-based access to File (sequential and random)
    • Optimized for repeated random reading (milestones with byte offsets are saved for lines already navigated in the past), so after you've read the whole file once, accessing line 43422145 will be almost as fast as accessing line 12.
    • Searching in the file: find next and find all.
    • Exact index, offset and length of matches, so you can easily highlight them
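
The milestone idea from the feature list can be illustrated with a small index. It remembers the byte offset of every `step`-th line while a file is read sequentially. This is a hypothetical sketch of the general technique, not the library's actual implementation. `LineMilestones`, `record`, `nearest` and `step` are made-up names.

```javascript
// Hypothetical milestone index: byte offsets of every `step`-th line start.
class LineMilestones {
  constructor(step = 1000) {
    this.step = step;   // record every `step`-th line (assumed tuning knob)
    this.offsets = [0]; // offsets[k] = byte offset where line k * step starts
  }

  // Call during a sequential read: line `lineIndex` starts at `byteOffset`.
  // Only the next missing milestone is recorded, so lines must arrive in order.
  record(lineIndex, byteOffset) {
    if (lineIndex % this.step !== 0) return;
    const k = lineIndex / this.step;
    if (k === this.offsets.length) this.offsets.push(byteOffset);
  }

  // Closest known starting point at or before `lineIndex`.
  nearest(lineIndex) {
    const k = Math.min(Math.floor(lineIndex / this.step), this.offsets.length - 1);
    return { lineIndex: k * this.step, byteOffset: this.offsets[k] };
  }
}
```

To fetch line n, a reader would slice the file from `nearest(n).byteOffset` and skip `n - nearest(n).lineIndex` lines, instead of scanning from byte 0.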

    Check this jsFiddle for examples: jsfiddle.net/3hmee6vb/1/

    Usage:

    // Initialization
    var file; // HTML5 File object
    var navigator = new FileNavigator(file);
    
    // Read some amount of lines (best performance for sequential file reading)
    navigator.readSomeLines(startingFromIndex, function (err, index, lines, eof, progress) { ... });
    
    // Read exact amount of lines
    navigator.readLines(startingFromIndex, count, function (err, index, lines, eof, progress) { ... });
    
    // Find first from index
    navigator.find(pattern, startingFromIndex, function (err, index, match) { ... });
    
    // Find all matching lines
    navigator.findAll(new RegExp(pattern), indexToStartWith, limitOfMatches, function (err, index, limitHit, results) { ... });
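
For sequential processing, the `readSomeLines` call in the listing can be chained until `eof` is set. The sketch below wraps that chain in a Promise. The text above does not confirm two points, so they are assumptions here. First, `index` is taken to be the index of the first line in `lines`. Second, the next batch is assumed to start at `index + lines.length`. `readAllLines` and `onLine` are hypothetical names; check the project wiki for the exact callback contract.

```javascript
// Hypothetical helper around FileNavigator.readSomeLines (callback contract assumed).
function readAllLines(fileNavigator, onLine) {
  return new Promise((resolve, reject) => {
    function handler(err, index, lines, eof, progress) {
      if (err) return reject(err);
      // Assumption: `index` is the index of lines[0].
      lines.forEach((line, i) => onLine(line, index + i));
      if (eof) return resolve();
      // Assumption: the next batch starts right after the last line returned.
      fileNavigator.readSomeLines(index + lines.length, handler);
    }
    fileNavigator.readSomeLines(0, handler);
  });
}
```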
    

    Performance is the same as with the previous solution. You can measure it by invoking 'Read' in the jsFiddle.

    GitHub: https://github.com/anpur/client-line-navigator/wiki
