Converting Parse JSON output to CSV with large datasets


Problem description


Parse allows users to download their data using their Export tool, but only allows the data to be exported in JSON format. I want this in CSV format for analysis in Excel.


While a simple script suffices for smaller JSON objects, I am dealing with a dataset that is 670,000 rows and over 360MB. Online converters cannot handle this file size, frequently citing that PHP has exceeded its memory limit.


I have tried PHP CLI-based scripts and online converters, but they all seem to exceed their allocated memory. I figured I needed a new approach when ini_set('memory_limit', '4096M'); still didn't give me enough memory.


I am currently using this CLI-based script for parsing data:

// flatten each record and write it out as one CSV row
function flatten2CSV($file){
    $fileIO = fopen($file, 'w');
    foreach ($this->dataArray as $items) {
        $flatData = array();
        // walk nested arrays depth-first and collect the leaf values
        $fields = new RecursiveIteratorIterator(new RecursiveArrayIterator($items));
        foreach ($fields as $value) {
            $flatData[] = $value;
        }
        fputcsv($fileIO, $flatData, ";", '"');
    }
    fclose($fileIO);
}

// and $this->dataArray is created here
function readJSON($JSONdata){
    $this->dataArray = json_decode($JSONdata, true);
    $this->prependColumnNames();
    return $this->dataArray;
}

private function prependColumnNames(){
    // build a header row from the keys of the first record
    $keys = array();
    foreach (array_keys($this->dataArray[0]) as $key) {
        $keys[0][$key] = $key;
    }
    $this->dataArray = array_merge($keys, $this->dataArray);
}


How can I solve memory management issues with PHP and parsing through this large dataset? Is there a better way to read in JSON objects than json_decode for large datasets?

Recommended answer


If you're able to run a script in the browser, check out the PapaParse JavaScript library -- it supports chunking and multi-threading for larger datasets and can convert JSON to CSV.

Specific configuration options that may be relevant:

  • worker
  • chunk
  • fastMode
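
A hedged sketch of how those options fit together as a PapaParse config (the appendToCsv callback is hypothetical and PapaParse must already be loaded, so treat this as an illustration rather than a verified implementation):

```javascript
// Illustrative PapaParse config only -- `appendToCsv` is a hypothetical
// callback for accumulating output; this fragment cannot run standalone.
Papa.parse(largeFile, {
  worker: true,              // parse in a background thread
  fastMode: true,            // faster path when no quoted fields are present
  chunk: function (results) {
    // handle one chunk of rows at a time instead of the whole file
    appendToCsv(Papa.unparse(results.data));
  }
});
```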


Alternatively, there is a fork of PapaParse for Node.js, though without the worker and chunk options.
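
If pulling in a library is not an option, the core of the conversion can also be sketched in plain Node with no dependencies. The field names below (name, score) are made up for illustration; for the real 360MB export you would write rows through fs.createWriteStream as they are produced instead of building one large string:

```javascript
// Minimal self-contained JSON-to-CSV sketch (no external library).
// Field names are illustrative, not taken from a real Parse export.

function csvField(value) {
  const s = String(value == null ? '' : value);
  // Quote fields containing the delimiter, quotes, or newlines,
  // doubling any embedded quotes per common CSV convention.
  return /[";\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

function jsonToCsv(records, delimiter = ';') {
  const keys = Object.keys(records[0]);
  const lines = [keys.map(csvField).join(delimiter)];   // header row
  for (const row of records) {
    lines.push(keys.map(k => csvField(row[k])).join(delimiter));
  }
  return lines.join('\n');
}

// Tiny stand-in for Parse's {"results": [...]} export shape:
const sample = { results: [{ name: 'a;b', score: 1 }, { name: 'c', score: 2 }] };
console.log(jsonToCsv(sample.results));
```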

I'm not affiliated with this library, but I have used it successfully for CSV-to-JSON conversion on large datasets.
