How do I check for valid (not dead) links programmatically using PHP?

Problem description

Given a list of urls, I would like to check that each url:

  • Returns a 200 OK status code
  • Returns a response within X amount of time

The end goal is a system that is capable of flagging urls as potentially broken so that an administrator can review them.

The script will be written in PHP and will most likely run on a daily basis via cron.

The script will process approximately 1000 URLs per run.

Question has two parts:

  • Are there any big gotchas with an operation like this? What issues have you run into?
  • What is the best method for checking the status of a url in PHP considering both accuracy and performance?

Thanks very much for taking the time.

Solution

Use the PHP cURL extension. Unlike a plain fopen() call, it can make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth because you don't have to download the entire body of the page.

As a starting point you could use some function like this:

function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
                  CURLOPT_URL => $url,            // set URL
                  CURLOPT_NOBODY => true,         // do a HEAD request only
                  CURLOPT_TIMEOUT => $timeout);   // set timeout
    curl_setopt_array($ch, $opts); 

    curl_exec($ch); // do it!

    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK

    curl_close($ch); // close handle

    return $retval;
}

However, there are plenty of possible optimizations: you might want to re-use the cURL handle and, if you check more than one URL per host, even re-use the connection.
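
As a rough illustration of the handle re-use mentioned above, here is a minimal sketch. The function name check_urls() and its return shape are my own assumptions, not part of the original answer; it keeps one cURL handle for the whole batch and only swaps the URL per request, which also lets libcurl keep connections to the same host open where the server allows it.

```php
<?php
// Sketch only: check_urls() is a hypothetical helper, not from the original answer.
// It reuses a single cURL handle for every URL in the batch.
function check_urls(array $urls, int $timeout = 30): array {
    $ch = curl_init(); // one handle for the whole batch

    // Options that stay the same for every request
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,     // do not output to browser
        CURLOPT_NOBODY         => true,     // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout, // per-request timeout in seconds
    ));

    $results = array();
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes
        curl_exec($ch);
        // HTTP code is 0 when no response was received at all
        $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
    }

    // The handle is released when $ch goes out of scope
    return $results;
}
```

Called as check_urls($listOfUrls, 10), this returns an array mapping each URL to true or false, which a cron job could then write to a report for the administrator.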

Oh, and this code checks strictly for HTTP response code 200. It does not follow redirects (such as 302), but there is a cURL option for that as well: CURLOPT_FOLLOWLOCATION.
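
To cover both redirects and the "within X amount of time" requirement from the question, something along these lines might work. The names probe_url() and is_flagged(), the redirect cap of 5, and the result array layout are all assumptions made for illustration. CURLOPT_FOLLOWLOCATION, CURLOPT_MAXREDIRS and CURLINFO_TOTAL_TIME are standard cURL constants.

```php
<?php
// Sketch only: probe_url() and is_flagged() are hypothetical helpers.
function probe_url(string $url, int $timeout = 30): array {
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,     // do not output to browser
        CURLOPT_URL            => $url,     // set URL
        CURLOPT_NOBODY         => true,     // do a HEAD request only
        CURLOPT_FOLLOWLOCATION => true,     // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,        // assumed cap to avoid redirect loops
        CURLOPT_TIMEOUT        => $timeout, // hard upper bound in seconds
    ));
    curl_exec($ch);

    return array(
        'code'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),  // final status, 0 if none
        'seconds' => curl_getinfo($ch, CURLINFO_TOTAL_TIME), // total elapsed time
        'error'   => curl_error($ch),                        // empty string on success
    );
}

// A URL is flagged for review if it did not end in 200 or answered too slowly.
function is_flagged(array $result, float $maxSeconds): bool {
    return $result['code'] !== 200 || $result['seconds'] > $maxSeconds;
}
```

Keeping the measurement (probe_url) separate from the decision (is_flagged) means the threshold can be tuned later without touching the network code.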
