Using Papa Parse for big CSV files

Question

I am trying to load a file that has about 100k lines, and so far the browser has been crashing (locally). I looked on the internet and saw that Papa Parse seems to handle large files. Now it is down to about 3-4 minutes to load into the textarea. Once the file is loaded, I then want to do some more jQuery to do counts and things, so the process is taking a while. Is there a way to make the CSV load faster? Am I using the program correctly?

<div id="tabs">
<ul>
  <li><a href="#tabs-4">Generate a Report</a></li>
</ul>
<div id="tabs-4">
  <h2>Generating a CSV report</h2>
  <h4>Input Data:</h4>      
  <input id="myFile" type="file" name="files" value="Load File" />
  <button onclick="loadFileAsText()">Load Selected File</button>
  <form action="./" method="post">
  <textarea id="input3" style="height:150px;"></textarea>

  <input id="run3" type="button" value="Run" />
  <input id="runSplit" type="button" value="Run Split" />
  <input id="downloadLink" type="button" value="Download" />
  </form>
</div>
</div>

$(function () {
    $("#tabs").tabs();
});

var data = $('#input3').val();

function handleFileSelect(evt) {
    var file = evt.target.files[0];

    Papa.parse(file, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
            data = results;
        }
    });
}

$(document).ready(function () {
    // Parse the file with Papa Parse as soon as one is selected
    $('#myFile').change(handleFileSelect);
});


function loadFileAsText() {
    var fileToLoad = document.getElementById("myFile").files[0];

    var fileReader = new FileReader();
    fileReader.onload = function (fileLoadedEvent) {
        var textFromFileLoaded = fileLoadedEvent.target.result;
        document.getElementById("input3").value = textFromFileLoaded;
    };
    fileReader.readAsText(fileToLoad, "UTF-8");
}

Recommended Answer

You are probably using it correctly; it's just that the program takes some time to parse through all 100k lines!

This might be a good use case for Web Workers.

NOTE: Per @tomBryer's answer below, Papa Parse now has support for Web Workers out of the box. This may be a better approach than rolling your own worker.
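
Following that note, the question's own Papa.parse call can hand the parsing to Papa Parse's built-in worker by setting worker: true. A minimal sketch, assuming Papa Parse is loaded on the page and the CSV has a header row; buildStreamingConfig is a hypothetical helper (not part of Papa Parse), and the #myFile and #report ids are borrowed from the markup in this thread. Rows are handled one at a time in the step callback, so the 100k lines never have to be pushed into a textarea. Because step is used, complete does not receive the parsed rows, so the helper keeps its own count. The row-shape normalisation assumes Papa Parse 5 passes a single row in results.data while 4.x wrapped that row in an array; check this against the version you load.

```javascript
// Hypothetical helper: builds a Papa Parse config that parses inside Papa's
// own worker thread and hands each row to onRow as soon as it is parsed.
function buildStreamingConfig(onRow, onDone) {
  var rowCount = 0;

  return {
    header: true,          // rows arrive as objects keyed by the header line
    dynamicTyping: true,   // numeric-looking fields become numbers
    skipEmptyLines: true,  // ignore blank lines such as a trailing newline
    worker: true,          // parse off the main thread where workers are supported
    step: function (results) {
      // With header: true a row is an object, so an array here means the
      // 4.x shape (row wrapped in an array) and anything else the 5.x shape.
      var rows = Array.isArray(results.data) ? results.data : [results.data];
      rows.forEach(function (row) {
        rowCount += 1;
        onRow(row);
      });
    },
    complete: function () {
      // With step in use, complete gets no parsed rows; report our own count.
      onDone(rowCount);
    }
  };
}

// Browser wiring (skipped when Papa Parse or the DOM is unavailable).
if (typeof Papa !== "undefined" && typeof document !== "undefined") {
  document.getElementById("myFile").addEventListener("change", function (evt) {
    var file = evt.target.files[0];
    Papa.parse(file, buildStreamingConfig(
      function (row) { /* update counts for this row here */ },
      function (total) {
        document.getElementById("report").textContent = total + " rows processed";
      }
    ));
  });
}
```

Doing the counting inside onRow means no 100k-element array has to be kept around after parsing.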

If you've never used them before, this site gives a decent rundown, but the key part is:

Web Workers mimics multithreading, allowing intensive scripts to be run in the background so they do not block other scripts from running. Ideal for keeping your UI responsive while also performing processor-intensive functions.

Browser coverage is pretty decent as well, with IE9 and below being the only semi-modern browsers that don't support them.

Mozilla has a good video that shows how web workers can speed up frame rate on a page as well.

I'll try to get a working example with web workers for you, but also note that this won't speed up the script; it will just make it process asynchronously so your page stays responsive.

EDIT:

(NOTE: if you want to parse the CSV within the worker, you'll probably need to import the Papa Parse script in worker.js using the importScripts function, which is globally defined within the worker thread. See the MDN page for more info on that.)

Here is my working example:

<!doctype html>
<html>
<head>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0/jquery.min.js"></script>
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.js"></script>
</head>

<body>
  <input id="myFile" type="file" name="files" value="Load File" />
  <br>
  <button class="load-file">Load and Parse Selected CSV File</button>
  <div id="report"></div>

<script>
// initialize our parsed_csv to be used wherever we want
var parsed_csv;
var start_time, end_time;

// document.ready
$(function() {

  $('.load-file').on('click', function(e) {
    start_time = performance.now();
    $('#report').text('Processing...');

    console.log('initialize worker');

    var worker = new Worker('worker.js');
    worker.addEventListener('message', function(ev) {
      console.log('received raw CSV, now parsing...');

      // Parse the raw CSV text. Note that this runs on the main thread;
      // the worker above only reads the file.
      Papa.parse(ev.data, {
        header: true,
        dynamicTyping: true,
        complete: function (results) {
          // Save result in a globally accessible var
          parsed_csv = results;
          console.log('parsed CSV!');
          console.log(parsed_csv);

          $('#report').text(parsed_csv.data.length + ' rows processed');
          end_time = performance.now();
          console.log('Took ' + (end_time - start_time) + ' milliseconds to load and process the CSV file.');
        }
      });

      // Terminate our worker
      worker.terminate();
    }, false);

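    // Submit our file to load
    var file_to_load = document.getElementById("myFile").files[0];
    if (!file_to_load) {
      // No file chosen: stop the unused worker and tell the user
      worker.terminate();
      $('#report').text('Please choose a CSV file first.');
      return;
    }

    console.log('call our worker');
    worker.postMessage({file: file_to_load});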
  });

});
</script>
</body>

</html>

worker.js

self.addEventListener('message', function(e) {
    console.log('worker is running');

    var file = e.data.file;
    var reader = new FileReader();

    reader.onload = function (fileLoadedEvent) {
        console.log('file loaded, posting back from worker');

        var textFromFileLoaded = fileLoadedEvent.target.result;

        // Post our text file back from the worker
        self.postMessage(textFromFileLoaded);
    };

    // Actually load the text file
    reader.readAsText(file, "UTF-8");
}, false);

It processed the file in under a second for me (all running locally).
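
The question also mentions doing counts once the file is loaded. After the complete callback in the example has stored the results in parsed_csv, those counts can be computed directly on parsed_csv.data (an array of row objects, since header: true is set) rather than by reading values back out of the page with jQuery. A minimal sketch; countBy is a hypothetical helper, and the Status column and sample rows are made up for illustration.

```javascript
// Hypothetical helper: tallies how often each value of `column` occurs in an
// array of row objects, e.g. parsed_csv.data from a header: true parse.
function countBy(rows, column) {
  // Null-prototype object, so values like "constructor" don't collide with
  // properties inherited from Object.prototype.
  var counts = Object.create(null);
  for (var i = 0; i < rows.length; i++) {
    var key = String(rows[i][column]); // missing column counts as "undefined"
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Made-up rows shaped like parsed_csv.data; in the page you would call
// countBy(parsed_csv.data, "Status") inside the complete callback.
var sampleRows = [
  { Status: "open", Amount: 10 },
  { Status: "closed", Amount: 4 },
  { Status: "open", Amount: 7 }
];
var statusCounts = countBy(sampleRows, "Status");
// statusCounts.open === 2, statusCounts.closed === 1
```

A single loop over the plain array avoids touching the DOM once per row.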
