Process very big CSV file without timeout and memory error
Problem description
At the moment I'm writing an import script for a very big CSV file. The problem is that most of the time it stops after a while, either because of a timeout or because it throws a memory error.
My idea now is to parse the CSV file in steps of 100 lines and, after each 100 lines, re-invoke the script automatically. I tried to achieve this with header('Location: ...') and passing the current line via GET, but it didn't work out the way I wanted.
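In code, the idea was roughly the following (a hypothetical sketch, not the actual script from the question; import.php, data.csv and the offset parameter are illustrative names):

// Hypothetical sketch of the redirect-based chunking idea described above;
// import.php, data.csv and 'offset' are illustrative names, not from the question.
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$chunk  = 100; // lines to process per request

if (($handle = fopen('data.csv', 'r')) !== false)
{
    // skip the lines that earlier requests have already processed
    for ($i = 0; $i < $offset && fgetcsv($handle) !== false; $i++);

    $processed = 0;
    while ($processed < $chunk && ($row = fgetcsv($handle)) !== false)
    {
        // process $row here
        $processed++;
    }
    fclose($handle);

    if ($processed === $chunk)
    {
        // more lines may remain: re-invoke the script for the next chunk
        header('Location: import.php?offset=' . ($offset + $chunk));
        exit;
    }
}

Note that each request re-reads all previously processed lines just to skip them, so the skip phase gets slower with every chunk; that is one reason the streaming approach in the answer below scales better.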
Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
Recommended answer
I've used fgetcsv to read a 120 MB CSV in a stream-wise manner (is that correct English?). It reads line by line, and I then inserted every line into a database. That way, only one line is held in memory on each iteration. The script still needed 20 min. to run. Maybe I'll try Python next time… Don't try to load a huge CSV file into an array; that really would consume a lot of memory.
// WDI_GDF_Data.csv (120.4 MB) is the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line by line
    while (($data = fgetcsv($handle)) !== false)
    {
        // resort/rewrite data and insert into DB here

        // try to use conditions sparingly here, as those will cause slow performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
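If the timeout is the remaining problem, one option (a minimal sketch, assuming the import runs as a web request rather than from the CLI) is to lift PHP's execution-time limit before the loop:

// Assumption: the import runs as a web request. set_time_limit(0) removes the
// max_execution_time cap for this request (it has no effect in safe mode).
// When run from the CLI, PHP already defaults to no time limit.
set_time_limit(0);

Combined with the line-by-line reading above, that covers both the timeout and the memory error.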