PHP Curl following redirects
Problem description
I'm trying to be a bit sneaky and, as part of a learning process, improve my page-scraping skills.
One thing I've come across that I have yet to solve is that certain sites use an internal link which then redirects to an external link.
What I want to do is modify some curl code to follow the redirects until they stop and then obtain the final URL.
Can anyone recommend some code for me?
I have this at the moment, but it's not following the redirects properly.
$opts = array(CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true);
$curl = curl_init();
curl_setopt_array($curl, $opts);
$str = curl_exec($curl);
curl_close($curl);
Recommended answer
http://php.net/manual/en/ref.curl.php
function get_final_url( $url, $timeout = 5 )
{
$url = str_replace( "&amp;", "&", urldecode(trim($url)) );
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_ENCODING, "" );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
$content = curl_exec( $ch );
$response = curl_getinfo( $ch );
curl_close ( $ch );
if ($response['http_code'] == 301 || $response['http_code'] == 302)
{
ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
$headers = get_headers($response['url']);
$location = "";
foreach( $headers as $value )
{
if ( substr( strtolower($value), 0, 9 ) == "location:" )
return get_final_url( trim( substr( $value, 9 ) ), $timeout );
}
}
if ( preg_match("/window\.location\.replace\('(.*)'\)/i", $content, $value) ||
preg_match("/window\.location\=\"(.*)\"/i", $content, $value)
)
{
return get_final_url( $value[1], $timeout );
}
else
{
return $response['url'];
}
}
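When CURLOPT_FOLLOWLOCATION does work in your environment, a shorter route may be enough: let cURL follow the chain and ask it where it ended up via CURLINFO_EFFECTIVE_URL. The sketch below shows that, plus a small pure helper that picks the last Location header out of a raw header block (what you get back when CURLOPT_HEADER is true, as in the question). Both function names, effective_url and last_location_header, are made up for this example and are not part of the answer above. It assumes absolute URLs in Location headers and does not handle JavaScript redirects.

```php
<?php
// Hypothetical helper: return the value of the last "Location:" header in a
// raw header block (possibly covering several redirect hops), or null if none.
function last_location_header(string $rawHeaders): ?string
{
    $location = null;
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'location:') === 0) {
            $location = trim(substr($line, strlen('location:')));
        }
    }
    return $location;
}

// Hypothetical wrapper: follow redirects with cURL and return the final URL,
// or null if the request failed. Requires the curl extension.
function effective_url(string $url, int $timeout = 5): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $ok    = curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $ok === false ? null : $final;
}
```

Something like effective_url($url) covers plain HTTP redirects; last_location_header($str) could be applied to the header portion of the $str returned by the question's code to see how far the chain got.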