Parsing Large Text Files with PHP Without Killing the Server
Question
I'm trying to read some large text files (between 50M and 200M) and do a simple text replacement (essentially, the XML I have hasn't been properly escaped in a few regular cases). Here's a simplified version of the function:
<?php
function cleanFile($file1, $file2) {
    $input_file  = fopen($file1, "r");
    $output_file = fopen($file2, "w");
    while (!feof($input_file)) {
        $buffer = trim(fgets($input_file, 4096));
        // Wrap the contents of <text> elements in CDATA, skipping lines that are already wrapped.
        if (substr($buffer, 0, 6) == '<text>' && substr($buffer, 0, 15) != '<text><![CDATA[') {
            $buffer = str_replace('<text>', '<text><![CDATA[', $buffer);
            $buffer = str_replace('</text>', ']]></text>', $buffer);
        }
        fputs($output_file, $buffer . "\n");
    }
    fclose($input_file);
    fclose($output_file);
}
?>
What I don't get is that for the largest of files (around 150MB), PHP memory usage goes off the chart (around 2GB) before failing. I thought this was the most memory-efficient way to go about reading large files. Is there some method I am missing that would be more memory-efficient? Perhaps some setting that's keeping things in memory when they should be collected?
In other words, it's not working, I don't know why, and as far as I can tell I am not doing anything incorrectly. Any direction for me to go? Thanks for any input.
Answer
PHP isn't really designed for this. Offload the work to a different process and call it or start it from PHP. I suggest using Python or Perl.
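As a rough illustration of what that offloaded script could look like, here is a minimal Python sketch of the same line-by-line replacement. The function name `clean_file` and the file-path arguments are illustrative, not from the original; iterating over the file object keeps only one line in memory at a time:

```python
def clean_file(src_path, dst_path):
    """Wrap unescaped <text> contents in CDATA, streaming line by line."""
    with open(src_path, "r") as fin, open(dst_path, "w") as fout:
        for line in fin:  # the file object yields one line at a time
            line = line.strip()
            if line.startswith("<text>") and not line.startswith("<text><![CDATA["):
                line = line.replace("<text>", "<text><![CDATA[")
                line = line.replace("</text>", "]]></text>")
            fout.write(line + "\n")
```

You could then launch it from PHP with something like `exec('python clean.py in.xml out.xml')` (script name assumed), keeping the heavy I/O out of the PHP process entirely.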