PHP - `get_headers` returns "400 Bad Request" and "403 Forbidden" for valid URLs?
Question

Working solution at the bottom of the question!

I am running PHP 5.4, and trying to get the headers of a list of URLs.

For the most part, everything is working fine, but there are three URLs that are causing issues (and likely more, with more extensive testing):
'http://www.alealimay.com'
'http://www.thelovelist.net'
'http://www.bleedingcool.com'
All three sites work fine in a browser, and produce the following header responses:

(From Safari)

Note that all three header responses are Code = 200
But retrieving the headers via PHP, using get_headers ...
stream_context_set_default(array('http' => array('method' => "HEAD")));
$headers = get_headers($url, 1);
stream_context_set_default(array('http' => array('method' => "GET")));
... returns the following:
url ...... "http://www.alealimay.com"
headers
| 0 ............................ "HTTP/1.0 400 Bad Request"
| content-length ............... "378"
| X-Synthetic .................. "true"
| expires ...................... "Thu, 01 Jan 1970 00:00:00 UTC"
| pragma ....................... "no-cache"
| cache-control ................ "no-cache, must-revalidate"
| content-type ................. "text/html; charset=UTF-8"
| connection ................... "close"
| date ......................... "Wed, 24 Aug 2016 01:26:21 UTC"
| X-ContextId .................. "QIFB0I8V/xsTFMREg"
| X-Via ........................ "1.0 echo109"
url ...... "http://www.thelovelist.net"
headers
| 0 ............................ "HTTP/1.0 400 Bad Request"
| content-length ............... "378"
| X-Synthetic .................. "true"
| expires ...................... "Thu, 01 Jan 1970 00:00:00 UTC"
| pragma ....................... "no-cache"
| cache-control ................ "no-cache, must-revalidate"
| content-type ................. "text/html; charset=UTF-8"
| connection ................... "close"
| date ......................... "Wed, 24 Aug 2016 01:26:22 UTC"
| X-ContextId .................. "aNKvf2RB/bIMjWyjW"
| X-Via ........................ "1.0 echo103"
url ...... "http://www.bleedingcool.com"
headers
| 0 ............................ "HTTP/1.1 403 Forbidden"
| Server ....................... "Sucuri/Cloudproxy"
| Date ......................... "Wed, 24 Aug 2016 01:26:22 GMT"
| Content-Type ................. "text/html"
| Content-Length ............... "5311"
| Connection ................... "close"
| Vary ......................... "Accept-Encoding"
| ETag ......................... "\"57b7f28e-14bf\""
| X-XSS-Protection ............. "1; mode=block"
| X-Frame-Options .............. "SAMEORIGIN"
| X-Content-Type-Options ....... "nosniff"
| X-Sucuri-ID .................. "11005"
This is the case regardless of changing the stream_context:
//stream_context_set_default(array('http' => array('method' => "HEAD")));
$headers = get_headers($url, 1);
//stream_context_set_default(array('http' => array('method' => "GET")));
produces identical results.

No warnings or errors are thrown for any of these (I normally have errors suppressed with @get_headers, but there is no difference either way).
I have checked my php.ini, and allow_url_fopen is set to On.
I am headed towards stream_get_meta_data, and am not interested in cURL solutions. stream_get_meta_data (and its accompanying fopen) fails in the same spot as get_headers, so fixing one will fix both in this case.
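For reference, a minimal sketch of that stream-based alternative (the helper name and the User-Agent value are my own, not from the original post): for http:// streams, stream_get_meta_data exposes the raw response header lines in wrapper_data.

```php
<?php
// Hypothetical helper: fetch response headers via fopen + stream_get_meta_data
// instead of get_headers. It sends a HEAD request with an explicit User-Agent,
// so servers that reject an agent-less request will still answer.
function head_via_stream($url)
{
    $context = stream_context_create(array(
        'http' => array(
            'method' => 'HEAD',
            'header' => "User-Agent: Mozilla/5.0\r\n",
        ),
    ));
    $stream = @fopen($url, 'r', false, $context);
    if ($stream === false) {
        return false; // connection refused, DNS failure, etc.
    }
    $meta = stream_get_meta_data($stream);
    fclose($stream);
    // For the http:// wrapper, wrapper_data is an array of raw header lines,
    // e.g. array("HTTP/1.1 200 OK", "Content-Type: text/html", ...).
    return $meta['wrapper_data'];
}
```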
Usually, if there are redirects, the output looks like:
url ...... "http://www.startingURL.com/"
headers
| 0 ............................ "HTTP/1.1 301 Moved Permanently"
| 1 ............................ "HTTP/1.1 200 OK"
| Date
| | "Wed, 24 Aug 2016 02:02:29 GMT"
| | "Wed, 24 Aug 2016 02:02:32 GMT"
|
| Server
| | "Apache"
| | "Apache"
|
| Location ..................... "http://finishingURL.com/"
| Connection
| | "close"
| | "close"
|
| Content-Type
| | "text/html; charset=UTF-8"
| | "text/html; charset=UTF-8"
|
| Link ......................... "; rel=\"https://api.w.org/\", ; rel=shortlink"
How come the sites work in browsers, but fail when using get_headers?

There are various SO posts discussing the same thing, but the solutions in all of them don't pertain to this case:
POST requires Content-Length (I'm sending a HEAD request, no content is returned)

URL contains UTF-8 data (the only chars in these URLs are all from the Latin alphabet)

Cannot send a URL with spaces in it (these URLs are all space-free, and very ordinary in every way)
(Thanks to Max's answer below for pointing me in the right direction.)

The issue is that there is no pre-defined user_agent unless one is either set in php.ini or declared in code. So I change the user_agent to mimic a browser, do the deed, and then revert it back to its starting value (likely blank):
$OriginalUserAgent = ini_get('user_agent');
ini_set('user_agent', 'Mozilla/5.0');
$headers = @get_headers($url, 1);
ini_set('user_agent', $OriginalUserAgent);
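The save/set/restore dance above can be wrapped in a small helper so it can't be forgotten at any call site (the function name is mine, not from the post):

```php
<?php
// Hypothetical wrapper: temporarily sets a browser-like user_agent for a
// single get_headers call, then restores whatever was configured before.
function get_headers_with_ua($url, $ua = 'Mozilla/5.0')
{
    $original = ini_get('user_agent');
    ini_set('user_agent', $ua);
    $headers = @get_headers($url, 1); // 1 = associative array of headers
    ini_set('user_agent', $original); // restore even if the request failed
    return $headers;                  // false on failure, array on success
}
```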
Found the user-agent change here.
Answer
It happens because all three of these sites check the User-Agent header of the request, and respond with an error when it cannot be matched. The get_headers function does not send this header. You may try cURL and this code snippet for getting the content of the sites:
$url = 'http://www.alealimay.com';
$c = curl_init($url);
curl_setopt($c, CURLOPT_USERAGENT, 'curl/7.48.0');
curl_exec($c);
var_dump(curl_getinfo($c));
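Since the question only needs headers, a variant of the snippet above can ask cURL for a HEAD request and read back just the status code instead of downloading the body (a sketch using standard cURL options):

```php
<?php
// Sketch: HEAD request with cURL, capturing the HTTP status code.
// CURLOPT_NOBODY switches the request to HEAD; CURLOPT_RETURNTRANSFER
// keeps curl_exec from echoing the response.
$url = 'http://www.alealimay.com';
$c = curl_init($url);
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0');
curl_setopt($c, CURLOPT_NOBODY, true);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // follow redirects like a browser
curl_exec($c);
$status = curl_getinfo($c, CURLINFO_HTTP_CODE); // e.g. 200, or 0 on failure
curl_close($c);
```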
UPD:

It's not necessary to use cURL for setting the user-agent header. It can also be done with ini_set('user_agent', 'Mozilla/5.0');, and then the get_headers function will use the configured value.