Parsing Large Text Files with PHP Without Killing the Server

Question

I'm trying to read some large text files (between 50M and 200M) and do some simple text replacement (essentially, the XML I have hasn't been properly escaped in a few regular cases). Here's a simplified version of the function:

<?php
function cleanFile($file1, $file2) {
    $input_file  = fopen($file1, "r");
    $output_file = fopen($file2, "w");

    while (!feof($input_file)) {
        // Read one line (fgets stops at a newline or after 4095 bytes).
        $buffer = trim(fgets($input_file, 4096));

        // Wrap the contents of <text> elements in CDATA, unless the
        // line is already wrapped.
        if (substr($buffer, 0, 6) == '<text>' && substr($buffer, 0, 15) != '<text><![CDATA[') {
            $buffer = str_replace('<text>', '<text><![CDATA[', $buffer);
            $buffer = str_replace('</text>', ']]></text>', $buffer);
        }

        fputs($output_file, $buffer . "\n");
    }

    fclose($input_file);
    fclose($output_file);
}
?>

What I don't get is that for the largest files (around 150 MB), PHP's memory usage goes off the chart (around 2 GB) before failing. I thought this was the most memory-efficient way to read large files. Is there some method I'm missing that would be more memory efficient? Or perhaps some setting that keeps things in memory when they should be garbage-collected?
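(One way to narrow this down is to instrument the loop itself. The sketch below is not from the original post: cleanFileWithMemoryLog is a hypothetical variant that logs peak memory via memory_get_peak_usage() every 10,000 lines, an arbitrary interval; if the loop really is streaming, the logged figure should stay flat.)

<?php
// A diagnostic sketch, not from the original post: the same streaming
// loop, but logging peak memory periodically to confirm whether usage
// actually grows as lines are read.
function cleanFileWithMemoryLog($file1, $file2) {
    $input_file  = fopen($file1, "r");
    $output_file = fopen($file2, "w");
    $line_no = 0;

    while (($buffer = fgets($input_file, 4096)) !== false) {
        // ... same <text>/CDATA replacement as above ...
        fputs($output_file, trim($buffer) . "\n");

        if (++$line_no % 10000 === 0) {
            error_log("line $line_no: peak " . memory_get_peak_usage(true) . " bytes");
        }
    }

    fclose($input_file);
    fclose($output_file);
}
?>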

In other words, it's not working, I don't know why, and as far as I can tell I'm not doing anything incorrectly. Any direction for me to go? Thanks for any input.

Answer

PHP isn't really designed for this. Offload the work to a different process and call it or start it from PHP. I suggest using Python or Perl.
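For example, the replacement in the question fits in a streaming Perl one-liner, and PHP can shell out to it. This is a minimal sketch, assuming the same <text>/CDATA rules as the question; the file names are placeholders:

<?php
// A minimal sketch of offloading the work to Perl, assuming the same
// <text>/CDATA rules as the question. File names are placeholders.
$in  = escapeshellarg('input.xml');
$out = escapeshellarg('output.xml');

// Perl's -p flag streams the file one line at a time, so memory use
// stays flat regardless of file size.
$script = 'if (/^<text>/ && !/^<text><!\[CDATA\[/) '
        . '{ s/<text>/<text><![CDATA[/; s{</text>}{]]></text>}; }';

exec('perl -pe ' . escapeshellarg($script) . " $in > $out", $ignored, $status);
if ($status !== 0) {
    error_log("perl exited with status $status");
}
?>

If you need to feed the data from PHP rather than pointing Perl at a file on disk, proc_open() gives you pipes to the child process's stdin and stdout.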
