get_headers Inconsistency
Question
Running the following code
var_dump(get_headers("http://www.domainnnnnnnnnnnnnnnnnnnnnnnnnnnn.com/CraxyFile.jpg"));
returns HTTP 200 instead of 404 for any domain or URL that does not exist:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Server: nginx/1.1.15
[2] => Date: Mon, 08 Oct 2012 12:29:13 GMT
[3] => Content-Type: text/html; charset=utf-8
[4] => Connection: close
[5] => Set-Cookie: PHPSESSID=3iucojet7bt2peub72rgo0iu21; path=/; HttpOnly
[6] => Expires: Thu, 19 Nov 1981 08:52:00 GMT
[7] => Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
[8] => Pragma: no-cache
[9] => Set-Cookie: bypassStaticCache=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly
[10] => Set-Cookie: bypassStaticCache=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly
[11] => Vary: Accept
)
If you run
var_dump(get_headers("http://www.domain.com/CraxyFile.jpg"));
you get
Array
(
[0] => HTTP/1.1 404 Not Found
[1] => Date: Mon, 08 Oct 2012 12:32:18 GMT
[2] => Content-Type: text/html
[3] => Content-Length: 8727
[4] => Connection: close
[5] => Server: Apache
[6] => Vary: Accept-Encoding
)
There are many instances where get_headers() has been recommended as a way to check that a URL exists:
- What is the best way to check if a URL exists in PHP?
- How can I check if a URL exists via PHP?
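For context, checks of this kind usually read the status line out of the array that get_headers() returns. The following is a minimal sketch of that step; the helper name `finalStatusCode` is hypothetical and not taken from the linked answers. Behind a DNS service like the one described here, it would report 200 for the non-existent domain.

```php
<?php
/**
 * Hypothetical helper: extracts the final HTTP status code from the array
 * returned by get_headers(), or returns null if no status line is found.
 */
function finalStatusCode($headers): ?int
{
    // get_headers() returns false when the request fails outright.
    if (!is_array($headers)) {
        return null;
    }

    $code = null;
    foreach ($headers as $line) {
        // When redirects are followed, every response contributes its own
        // status line, so keep overwriting and end up with the last one.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }

    return $code;
}
```

A caller would pass the result of get_headers($url) straight in and compare the returned code against 200 or 404.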
I also found out that cURL has the same issue:
$curl = curl_init();
curl_setopt_array($curl, array(CURLOPT_RETURNTRANSFER => true,CURLOPT_URL => 'idontexist.tld'));
curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
var_dump($info);
Recommended answer
The problem has nothing to do with the length of the domain name; it depends only on whether the domain exists.
You are using a DNS service that resolves non-existent domains to a server that serves a "friendly" error page, and that page is returned with a 200 response code. This means the problem is not specific to get_headers() either: it affects any procedure that relies on sensible DNS lookups.
A way to handle this without hardcoding a workaround for every environment you work in might look something like this:
// A domain that definitely does not exist. The easiest way to guarantee that
// this continues to work is to use an illegal top-level domain (TLD) suffix
$testDomain = 'idontexist.tld';
// If this resolves to an IP, we know that we are behind a service such as this
// We can simply compare the actual domain we test with the result of this
$badIP = gethostbyname($testDomain);
// Then when you want to get_headers()
$url = 'http://www.domainnnnnnnnnnnnnnnnnnnnnnnnnnnn.com/CraxyFile.jpg';
$host = parse_url($url, PHP_URL_HOST);
if (gethostbyname($host) === $badIP) {
    // The domain does not exist - probably handle this as if it were a 404
} else {
    // do the actual get_headers() stuff here
}
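One detail worth keeping in mind: gethostbyname() returns the hostname unchanged when a lookup fails, so on a network without such a DNS service the test lookup does not produce an IP at all. The sketch below folds both cases into one check. The function name `hostResolves`, the injectable `$resolver` parameter, and the choice of the reserved `.invalid` TLD for the sentinel are assumptions made for illustration, not part of the snippet above.

```php
<?php
/**
 * Hypothetical helper: returns true only if $host resolves to an address
 * that is not the catch-all address handed out for non-existent names.
 *
 * $resolver defaults to gethostbyname(); it is injectable so the logic
 * can be exercised without touching the network.
 */
function hostResolves(string $host, ?callable $resolver = null, string $sentinel = 'idontexist.invalid'): bool
{
    $resolver = $resolver ?? 'gethostbyname';

    // A literal IP address needs no DNS lookup.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }

    // gethostbyname() returns its argument unchanged when the lookup fails.
    $ip = $resolver($host);
    if ($ip === $host) {
        return false;
    }

    // If the sentinel resolves to anything other than itself, a catch-all
    // DNS service is in play; a host with that same IP does not exist.
    $badIP = $resolver($sentinel);
    if ($badIP !== $sentinel && $ip === $badIP) {
        return false;
    }

    return true;
}
```

With something like this in place, get_headers() would only be called when hostResolves(parse_url($url, PHP_URL_HOST)) returns true, and a false result could be treated like a 404.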
You may want to cache the return value of the first call to gethostbyname() in some way, since you know you are looking up a name that does not exist, and such a lookup can often take a few seconds.
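Within a single script run, one simple way to do that caching is a static variable. This is only a sketch: the function name `catchAllIp` and the injectable `$resolver` parameter are hypothetical, and caching across requests would need a persistent store instead.

```php
<?php
/**
 * Hypothetical helper: looks up the sentinel domain once per script run
 * and reuses the answer. Returns the catch-all IP, or null when the lookup
 * fails normally (gethostbyname() then returns its argument unchanged).
 */
function catchAllIp(string $sentinel = 'idontexist.tld', ?callable $resolver = null): ?string
{
    static $cache = [];

    // array_key_exists() rather than isset(), so a cached null is not
    // mistaken for "not looked up yet".
    if (!array_key_exists($sentinel, $cache)) {
        $resolver = $resolver ?? 'gethostbyname';
        $result = $resolver($sentinel);
        $cache[$sentinel] = ($result === $sentinel) ? null : $result;
    }

    return $cache[$sentinel];
}
```

The slow lookup then happens at most once per sentinel name, however many URLs are checked afterwards.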