PHP Curl following redirects


Question

I'm trying to be a bit sneaky and, as part of a learning process, improve my page scraping skills.

One thing I've come across that I have yet to solve is that certain sites use an internal link which then redirects to an external link.

What I want to do is modify some curl code to follow the redirects until they stop, and then obtain the URL of the final destination.

Can anyone recommend some code for me?

I have this at the moment, but it's not following the redirects properly.

        $opts = array(CURLOPT_URL => $url,
                      CURLOPT_RETURNTRANSFER => true,
                      CURLOPT_HEADER => true,
                      CURLOPT_FOLLOWLOCATION => true);      

        $curl = curl_init(); 
        curl_setopt_array($curl, $opts);  
        $str = curl_exec($curl);  
        curl_close($curl);  


Recommended answer

http://php.net/manual/en/ref.curl.php

    function get_final_url( $url, $timeout = 5 )
    {
        // Decode the URL and undo HTML-entity encoding of ampersands.
        $url = str_replace( "&amp;", "&", urldecode( trim( $url ) ) );
        $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1";

        // Temporary cookie jar, so redirects that depend on cookies still resolve.
        $cookie = tempnam( "/tmp", "CURLCOOKIE" );
        $ch = curl_init();
        curl_setopt( $ch, CURLOPT_USERAGENT, $agent );
        curl_setopt( $ch, CURLOPT_URL, $url );
        curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
        curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );   // follow HTTP redirects
        curl_setopt( $ch, CURLOPT_ENCODING, "" );           // accept any supported encoding
        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );   // return the body instead of printing it
        curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
        curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
        curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
        curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
        $content = curl_exec( $ch );
        $response = curl_getinfo( $ch );
        curl_close( $ch );

        // Still a 301/302: curl stopped before the last hop (for example because
        // the redirect limit was reached), so read the Location header and recurse.
        if ( $response['http_code'] == 301 || $response['http_code'] == 302 )
        {
            ini_set( "user_agent", $agent );
            $headers = get_headers( $response['url'] ) ?: array();

            foreach ( $headers as $value )
            {
                if ( substr( strtolower( $value ), 0, 9 ) == "location:" )
                    return get_final_url( trim( substr( $value, 9 ) ), $timeout );
            }
        }

        // curl does not execute JavaScript, so look for two common script redirects.
        if ( preg_match( "/window\.location\.replace\('(.*)'\)/i", $content, $value ) ||
             preg_match( "/window\.location\=\"(.*)\"/i", $content, $value ) )
        {
            return get_final_url( $value[1], $timeout );
        }

        return $response['url'];
    }
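
If only plain HTTP redirects matter (no JavaScript redirects), a shorter variant may be enough: let curl follow the redirects and read CURLINFO_EFFECTIVE_URL afterwards. The sketch below is not part of the original answer; the function names effective_url and redirect_chain_from_headers are hypothetical. The second helper assumes it is given header text only, such as the header portion of the response that the question's code receives with CURLOPT_HEADER enabled.

```php
<?php
// Hypothetical helper: let libcurl follow HTTP redirects and report where it ended up.
// Returns null if the transfer failed (timeout, too many redirects, DNS error, ...).
function effective_url(string $url, int $timeout = 5, int $maxRedirects = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,          // do not print the response
        CURLOPT_NOBODY         => true,          // HEAD request; drop this if a server mishandles HEAD
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ));
    $ok = curl_exec($ch);

    return ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

// Hypothetical helper: list every Location value found in a block of raw
// response headers. With CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both on,
// curl returns the headers of each hop, so this yields the redirect chain.
// Assumes $rawHeaders holds headers only (the first CURLINFO_HEADER_SIZE
// bytes of the response), not the page body.
function redirect_chain_from_headers(string $rawHeaders): array
{
    $locations = array();
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        if (stripos($line, 'location:') === 0) {
            $locations[] = trim(substr($line, strlen('location:')));
        }
    }
    return $locations;
}
```

A call such as effective_url($url) would then return the last URL curl reached, or null on failure. Script-based redirects like window.location still need the regex fallback shown in get_final_url above.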
