Getting final urls of shortened urls (like bit.ly) using php

Question

[Updates at bottom]
Hi everybody.

Start with short URLs:
Imagine you have a collection of 5 short urls (like http://bit.ly) in a php array, like this:

$shortUrlArray = array("http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123");

End with Final, Redirected URLs:
How can I get the final url of these short urls with php? Like this:

http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html

I have one method (found online) that works well with a single url, but when looping over multiple urls, it only works with the final url in the array. For your reference, the method is this:

function get_web_page( $url ) 
{ 
    $options = array( 
        CURLOPT_RETURNTRANSFER => true,     // return web page 
        CURLOPT_HEADER         => true,    // return headers 
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects 
        CURLOPT_ENCODING       => "",       // handle all encodings 
        CURLOPT_USERAGENT      => "spider", // who am i 
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect 
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect 
        CURLOPT_TIMEOUT        => 120,      // timeout on response 
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects 
    ); 

    $ch      = curl_init( $url ); 
    curl_setopt_array( $ch, $options ); 
    $content = curl_exec( $ch ); 
    $err     = curl_errno( $ch ); 
    $errmsg  = curl_error( $ch ); 
    $header  = curl_getinfo( $ch ); 
    curl_close( $ch ); 

    //$header['errno']   = $err; 
    //$header['errmsg']  = $errmsg; 
    //$header['content'] = $content; 
    print($header[0]); 
    return $header; 
}  


//Using the above method in a for loop

$finalURLs = array();

$lineCount = count($shortUrlArray);

for($i = 0; $i <= $lineCount; $i++){

    $singleShortURL = $shortUrlArray[$i];

    $myUrlInfo = get_web_page( $singleShortURL ); 

    $rawURL = $myUrlInfo["url"];

    array_push($finalURLs, $rawURL);

}

Close, but not enough
This method works, but only with a single url. I can't use it in a for loop, which is what I want to do. When used in the above example in a for loop, the first four elements come back unchanged, and only the final element is converted into its final url. This happens whether the array is 5 elements or 500 elements long.

Solution Sought:
Please give me a hint as to how you'd modify this method to work when used inside of a for loop with collection of urls (Rather than just one).

-OR-

If you know of code that is better suited for this task, please include it in your answer.

Thanks in advance.

Update:
After some further prodding I've found that the problem lies not in the above method (which, after all, seems to work fine in for loops) but possibly in the encoding. When I hard-code an array of short urls, the loop works fine. But when I pass in a block of newline-separated urls from an html form using GET or POST, the above-mentioned problem ensues. Are the urls somehow being changed into a format not compatible with the method when I submit the form?

New Update:
You guys, I've found that my problem was due to something unrelated to the above method. The URL encoding of my short urls converted what I thought were just newline characters (separating the urls) into %0D%0A, which is a carriage return followed by a line feed. As a result, every short url except the final one in the collection had a "ghost" character appended to its tail, which made it impossible to retrieve the final urls for those. I identified the ghost character, corrected my php explode, and all works fine now. Sorry and thanks.
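
For anyone who runs into the same thing, here is a minimal sketch of one way to split the submitted text so that no stray carriage return is left on any url. The helper name split_urls and the sample input are made up for illustration; this is not the exact fix from the update above. It splits on "\r\n", "\r", or "\n", trims each piece, and drops empty lines.

```php
<?php
// Hypothetical helper (not from the original post): split a block of text
// into URLs, tolerating "\r\n", "\r", or "\n" line endings.
function split_urls(string $block): array
{
    // "\r\n" is listed first so a CRLF pair is consumed as one separator.
    $lines = preg_split('/\r\n|\r|\n/', $block);

    // trim() strips surrounding whitespace, including any leftover "\r".
    $lines = array_map('trim', $lines);

    // Drop empty lines and reindex the result from 0.
    $nonEmpty = array_filter($lines, static fn(string $line): bool => $line !== '');
    return array_values($nonEmpty);
}

// Sample input as a browser would typically submit it from a textarea (CRLF).
$raw = "http://bit.ly/123\r\nhttp://bit.ly/123\r\nhttp://bit.ly/123\r\n";
$shortUrlArray = split_urls($raw);
```

In a real form handler, $raw would be something like $_POST['short_urls'] (the field name used in the answer below). The resulting array can be looped over in the same way as the hard-coded one.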

Recommended answer

This may be of some help: How to put string in array, split by new line?

You would probably do something like this, assuming you're getting the URLs returned in POST:

$final_urls = array();

// You can replace chr(10) with "\n" or "\r\n", depending on how you get your urls.
// And of course, change $_POST['short_urls'] to the source of your string.
$short_urls = explode( chr(10), $_POST['short_urls'] );

foreach ( $short_urls as $short ) {
    $final_urls[] = get_web_page( $short );
}

I get the following output, using var_dump($final_urls); and your bit.ly url:

http://codepad.org/8YhqlCo1

And my source: $_POST['short_urls'] = "http://bit.ly/123 http://bit.ly/123 http://bit.ly/123 http://bit.ly/123";

I also got an error, using your function: Notice: Undefined offset: 0 in /var/www/test.php on line 27 Line 27: print($header[0]); I'm not sure what you wanted there...
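
On that notice: curl_getinfo() called without a second argument returns an associative array keyed by names such as 'url' and 'http_code', so there is no element 0 to print. If all you want is the list of final urls, something along these lines might do. It is only a sketch: collect_final_urls and the stand-in fetcher are hypothetical names of mine, and the fetcher is faked so the example runs without network access.

```php
<?php
// Hypothetical helper: $fetchInfo is any callable that returns an array
// shaped like curl_getinfo() output, e.g. 'get_web_page' from the question.
function collect_final_urls(array $shortUrls, callable $fetchInfo): array
{
    $finalUrls = [];
    foreach ($shortUrls as $short) {
        $info = $fetchInfo($short);
        // 'url' holds the last effective URL once redirects were followed.
        // Store null when the key is missing so failures stay visible.
        $finalUrls[] = $info['url'] ?? null;
    }
    return $finalUrls;
}

// Stand-in for get_web_page(): returns a fixed info array, no HTTP request.
$fakeFetch = function (string $url): array {
    return [
        'url'       => 'http://www.example.com/some-directory/some-page.html',
        'http_code' => 200,
    ];
};

$final_urls = collect_final_urls(['http://bit.ly/123', 'http://bit.ly/123'], $fakeFetch);
```

With the function from the question, the call would presumably be collect_final_urls($short_urls, 'get_web_page'), after removing the print($header[0]) line that triggers the notice.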

Here's my test.php, if it will help: http://codepad.org/zI2wAOWL
