PHP file_get_contents very slow when using full url


Question

I am working with a script (that I did not create originally) that generates a pdf file from an HTML page. The problem is that it is now taking a very long time, like 1-2 minutes, to process. Supposedly this was working fine originally, but has slowed down within the past couple of weeks.

The script calls file_get_contents on a php script, which then outputs the result into an HTML file on the server, and runs the pdf generator app on that file.
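
For context, the flow is roughly the following (a minimal sketch only; the URL, the file paths, and the wkhtmltopdf command are placeholders, not the actual script):

// Fetch the rendered page over HTTP (the query string is required,
// so a local path cannot be used here).
$html = file_get_contents('http://example.com/report.php?id=123');

// Dump the markup to an HTML file on the server.
file_put_contents('/tmp/report.html', $html);

// Run the pdf generator app on that file (wkhtmltopdf is a placeholder).
shell_exec('wkhtmltopdf /tmp/report.html /tmp/report.pdf');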

I seem to have narrowed down the problem to the file_get_contents call on a full url, rather than a local path.

When I use

$content = file_get_contents('test.txt');

it processes almost instantaneously. However, if I use the full url

$content = file_get_contents('http://example.com/test.txt');

it takes anywhere from 30-90 seconds to process.
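
A quick way to confirm the difference is to time both calls (a diagnostic sketch; the URL is the same placeholder as above):

$start = microtime(true);
$content = file_get_contents('test.txt'); // local path: near-instant
echo 'local:  ' . round(microtime(true) - $start, 3) . "s\n";

$start = microtime(true);
$content = file_get_contents('http://example.com/test.txt'); // full url: 30-90 seconds
echo 'remote: ' . round(microtime(true) - $start, 3) . "s\n";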

It's not limited to our server; it is slow when accessing any external url, such as http://www.google.com. I believe the script calls the full url because there are necessary query string variables that don't work if you call the file locally.

I also tried fopen, readfile, and curl, and they were all similarly slow. Any ideas on where to look to fix this?
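
For reference, the cURL variant was tested roughly like this (a sketch; the 10-second timeout is an assumption, added so a stalled connection fails fast instead of hanging):

$ch = curl_init('http://example.com/test.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // abort after 10 seconds instead of hanging
$content = curl_exec($ch);
curl_close($ch);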

Answer

Note: This has been fixed in PHP 5.6.14. A Connection: close header will now automatically be sent even for HTTP/1.0 requests. See commit 4b1dff6.

I had a hard time figuring out the cause of the slowness of file_get_contents scripts.

By analyzing it with Wireshark, the issue (in my case and probably yours too) was that the remote web server DIDN'T CLOSE THE TCP CONNECTION until 15 seconds had passed (i.e. "keep-alive").

Indeed, file_get_contents doesn't send a "Connection" HTTP header, so the remote web server considers it a keep-alive connection by default and doesn't close the TCP stream until 15 seconds have passed (this may not be a standard value - it depends on the server configuration).

A normal browser would consider the page fully loaded once the HTTP payload reaches the length specified in the response's Content-Length header. file_get_contents doesn't do this, and that's a shame.

Solution

SO, if you want to know the solution, here it is:

// Note: the header must be in double quotes so that \r\n is interpreted
// as a line break rather than sent literally.
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
$content = file_get_contents("http://www.something.com/somepage.html", false, $context);

The thing is just to tell the remote web server to close the connection when the download is complete, as file_get_contents isn't intelligent enough to do it by itself using the response Content-Length HTTP header.
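
A defensive variant of the same context adds a read timeout, so even a server that ignores the header cannot stall the script indefinitely (a sketch; the 10-second value is an assumption, not part of the original answer):

$context = stream_context_create(array(
    'http' => array(
        'header'  => "Connection: close\r\n", // ask the server to close the TCP stream when done
        'timeout' => 10,                      // stream read timeout in seconds (assumed value)
    ),
));
$content = file_get_contents('http://www.something.com/somepage.html', false, $context);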
