Converting Parse JSON output to CSV with large datasets
Question
Parse allows users to download their data using their Export tool, but only allows the data to be exported in JSON format. I want this in CSV format for analysis in Excel.
While a simple script suffices for smaller JSON objects, I am dealing with a dataset that is 670,000 rows and over 360MB. Online converters cannot handle this file size, frequently citing that PHP has exceeded its memory limit.
I have tried PHP CLI-based scripts and online converters, but they all seem to exceed their allocated memory. I figured I needed a new approach when ini_set('memory_limit', '4096M'); still didn't give me enough memory.
I am currently using this CLI-based script for parsing data:
// flatten to CSV
function flatten2CSV($file){
    $fileIO = fopen($file, 'w+');
    foreach ($this->dataArray as $items) {
        $flatData = array();
        $fields = new RecursiveIteratorIterator(new RecursiveArrayIterator($items));
        foreach($fields as $value) {
            array_push($flatData, $value);
        }
        fputcsv($fileIO, $flatData, ";", '"');
    }
    fclose($fileIO);
}

// and $this->dataArray is created here
function readJSON($JSONdata){
    $this->dataArray = json_decode($JSONdata,1);
    $this->prependColumnNames();
    return $this->dataArray;
}

private function prependColumnNames(){
    foreach(array_keys($this->dataArray[0]) as $key){
        $keys[0][$key] = $key;
    }
    $this->dataArray = array_merge($keys, $this->dataArray);
}
How can I solve memory management issues with PHP and parsing through this large dataset? Is there a better way to read in JSON objects than json_decode for large datasets?
Recommended answer
If you're able to run a script in the browser, check out the PapaParse JavaScript library -- it supports chunking and multi-threading for larger datasets and can convert JSON to CSV.
Specific config options that may be relevant:
- worker
- chunk
- fastMode
Alternatively, there is a fork of PapaParse for Node.js, though without the worker and chunk options.
I have no affiliation with this library, but I have used it successfully for CSV to JSON conversion on large datasets.
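To illustrate the general idea behind chunked conversion, here is a minimal, dependency-free JavaScript sketch. It is not PapaParse's API or implementation. It assumes the export is a top-level JSON array of objects (as the question's use of dataArray[0] suggests) and that the columns of the first record apply to every row. The names extractObjects, csvField, and toCsvLines are hypothetical helpers invented for this example. Because both steps are generators, only one record is parsed and held in memory at a time, instead of decoding the whole file at once.

```javascript
// Hypothetical helper: yields each top-level {...} object of a JSON array,
// given the JSON text as an iterable of arbitrary string chunks.
function* extractObjects(chunks) {
  let depth = 0;        // current nesting depth of { } outside strings
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string
  let buffer = '';      // text of the object currently being collected

  for (const chunk of chunks) {
    for (const ch of chunk) {
      if (depth > 0) buffer += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === '{') {
        if (depth === 0) buffer = ch; // start of a new record
        depth++;
      } else if (ch === '}') {
        depth--;
        if (depth === 0) {
          yield JSON.parse(buffer);   // only this one record is decoded
          buffer = '';
        }
      }
    }
  }
}

// Hypothetical helper: formats one value as a CSV field, quoting it when
// it contains the delimiter, a double quote, or a line break.
function csvField(value, delimiter) {
  let text;
  if (value === null || value === undefined) text = '';
  else if (typeof value === 'object') text = JSON.stringify(value);
  else text = String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') ||
    text.includes('\n') || text.includes('\r');
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: lazily turns records into CSV lines. The header is
// taken from the first record's keys (an assumption of this sketch).
function* toCsvLines(records, delimiter = ',') {
  let columns = null;
  for (const record of records) {
    if (columns === null) {
      columns = Object.keys(record);
      yield columns.map((c) => csvField(c, delimiter)).join(delimiter);
    }
    yield columns.map((c) => csvField(record[c], delimiter)).join(delimiter);
  }
}

// Demo input: the chunk boundary deliberately falls in the middle of a record.
const demoChunks = [
  '[{"id":1,"name":"A;b","tags":{"x":"}"}},{"i',
  'd":2,"name":"say \\"hi\\"","tags":null}]',
];
const demoLines = [...toCsvLines(extractObjects(demoChunks))];
for (const line of demoLines) console.log(line);
```

For a real export, the same state machine could be fed with chunks read from a file stream and each line written straight to the output file, so memory use stays roughly proportional to the largest single record rather than to the 360MB file. Nested values are written as JSON text here, which differs from the flattening done in the question's script.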