Extract original URL from Google Alerts link

Problem description

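I hope someone can help me with this small problem.

I am using Google Alerts to pull in breaking news stories to list on a website. Unfortunately, when I try to find the original URL (the one from before Google Alerts), all I get is a Google URL like this: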

http://www.google.com/url?sa=X&q=

http://www.source.com/2013/04/02/title.html

&ct=ga&cad=CAcQARgAIAAoATAAOABArOXtigVIAlAAWABiBWVuLVVT&cd=ZQHHhnCXS8w&usg=AFQjCNGGGZgSyC3KvMJUW0ICYsCtRZ2uJA

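I've broken this URL into its relevant sections to make it easier to follow. The 1st part is always exactly the same, while the 2nd and 3rd parts change. The 3rd part, however, always starts with &ct=, which I assume is part of a query.

In the script I am using, this entire URL is assigned to $link. What I would like to do, if possible, is extract the original source URL from the Google Alerts URL, so that attribution goes where it is meant to go and not to the guy in the middle!

My PHP knowledge is very basic, so any help on this would be greatly appreciated.

Thanks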

Solution

You can use this function, which takes the starting URL, follows all the redirects, and returns the last effective URL.

/**
 * Get target url from a redirect
 *
 * @param string $url Source url
 * @return string|null Last effective url, or null if the request failed
 */
function getLastEffectiveUrl($url) {

    // initialize cURL
    $curl = curl_init($url);
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER  => true,  // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION  => true,  // follow HTTP redirects
    ));

    // execute the request
    $result = curl_exec($curl);

    // fail if the request was not successful
    if ($result === false) {
        curl_close($curl);
        return null;
    }

    // extract the target url
    $redirectUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
    curl_close($curl);

    return $redirectUrl;
}

The usage is straightforward. If we wanted to fetch the last effective URL for Mark Zuckerberg's profile image, we would call the function like this:

    $lastEffectiveUrl = getLastEffectiveUrl('http://graph.facebook.com/4/picture');

After the call, $lastEffectiveUrl would hold the expected value:

    'http://profile.ak.fbcdn.net/hprofile-ak-snc4/157340_4_3955636_q.jpg';

All the credit goes to the guy who wrote this post, I just did a little digging: Get the last effective URL from a series of redirects for the given URL
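
Since the question points out that the source address sits between the fixed Google prefix and the &ct= part, it may also be possible to read it straight out of the query string without making any request. The following is only a minimal sketch under that assumption: it supposes the destination is always carried in the q parameter, and the helper name extractAlertTarget is hypothetical, not something from the answer above.

```php
<?php
/**
 * Read the destination URL from the "q" query parameter of a link.
 * Hypothetical helper; assumes the target is always carried in "q".
 *
 * @param string $link Google Alerts link
 * @return string|null Destination url, or null if no usable "q" is present
 */
function extractAlertTarget($link) {
    // isolate the query string (the part after the "?")
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // decode the query string into an associative array
    parse_str($query, $params);

    // assumption: the destination is stored under "q"
    if (!isset($params['q']) || !is_string($params['q']) || $params['q'] === '') {
        return null;
    }

    return $params['q'];
}

// sample link taken from the question
$link = 'http://www.google.com/url?sa=X&q=http://www.source.com/2013/04/02/title.html'
      . '&ct=ga&cad=CAcQARgAIAAoATAAOABArOXtigVIAlAAWABiBWVuLVVT'
      . '&cd=ZQHHhnCXS8w&usg=AFQjCNGGGZgSyC3KvMJUW0ICYsCtRZ2uJA';

echo extractAlertTarget($link), "\n";
```

If the link has no q parameter the helper returns null, so a script could fall back to getLastEffectiveUrl($link) in that case.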
