Getting final urls of shortened urls (like bit.ly) using php
Question
[Updates at bottom]
Hi everyone.
Start with Short URLs:
Imagine you have a collection of 5 short urls (like http://bit.ly) in a php array, like this:
$shortUrlArray = array("http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123");
End with Final, Redirected URLs:
How can I get the final url of these short urls with php? Like this:
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
I have one method (found online) that works well with a single url, but when looping over multiple urls, it only works with the final url in the array. For your reference, the method is this:
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => true,     // return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    //$header['errno']   = $err;
    //$header['errmsg']  = $errmsg;
    //$header['content'] = $content;
    print($header[0]);
    return $header;
}
//Using the above method in a for loop
$finalURLs = array();
$lineCount = count($shortUrlArray);
// Use < rather than <= so the loop does not read one element past the end of the array
for($i = 0; $i < $lineCount; $i++){
    $singleShortURL = $shortUrlArray[$i];
    $myUrlInfo = get_web_page( $singleShortURL );
    $rawURL = $myUrlInfo["url"];
    array_push($finalURLs, $rawURL);
}
Close, but not enough
This method works, but only with a single url. I can't use it in a for loop, which is what I want to do. When used in a for loop as in the above example, the first four elements come back unchanged, and only the final element is converted into its final url. This happens whether your array is 5 elements or 500 elements long.
Solution Sought:
Please give me a hint as to how you'd modify this method to work when used inside of a for loop with a collection of urls (rather than just one).
-OR-
If you know of code that is better suited for this task, please include it in your answer.
Thanks in advance.
Update:
After some further prodding I've found that the problem lies not in the above method (which, after all, seems to work fine in for loops) but possibly in encoding. When I hard-code an array of short urls, the loop works fine. But when I pass in a block of newline-separated urls from an html form using GET or POST, the above-mentioned problem ensues. Are the urls somehow being changed into a format not compatible with the method when I submit the form?
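One way to check a suspicion like this is to make invisible characters visible before the strings reach cURL. The sketch below is only illustrative: the helper name show_hidden_chars and the sample string are assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: render control characters (ASCII 0-31) as visible
// escape sequences so that a stray "\r" or "\n" stands out.
function show_hidden_chars(string $value): string
{
    return addcslashes($value, "\0..\37");
}

// Assumed sample: a url that picked up a trailing carriage return.
$suspect = "http://bit.ly/123\r";

echo show_hidden_chars($suspect), PHP_EOL; // the trailing carriage return shows up as \r
echo strlen($suspect), PHP_EOL;            // one more than the visible character count
```

Comparing strlen() against the number of characters you can actually see is often the quickest hint that something extra is attached to a string.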
New Update:
You guys, I've found that my problem was due to something unrelated to the above method. The URL encoding of my short urls converted what I thought were just newline characters (separating the urls) into %0D%0A, which is a carriage return followed by a line feed. As a result, every short url except the final one in the collection had a "ghost" character appended to its tail, which made it impossible to retrieve the final urls for those. I identified the ghost character, corrected my php explode, and all works fine now. Sorry and thanks.
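The kind of fix described here can be sketched as follows. This is a minimal illustration rather than the asker's actual code: the helper name split_url_block and the sample input are assumptions. It splits on any line-break sequence and trims each piece, so no stray carriage return stays attached to a url.

```php
<?php
// Hypothetical helper: turn a newline-separated block of urls into a clean array.
function split_url_block(string $block): array
{
    // \R matches "\r\n", "\n" and "\r" (among other line breaks), so it does
    // not matter which line ending the form submitted.
    $lines = preg_split('/\R/', $block);

    // Trim surrounding whitespace and drop empty lines.
    $urls = array_filter(array_map('trim', $lines), 'strlen');

    // Re-index so the result is a plain 0-based list.
    return array_values($urls);
}

// Assumed sample: a block with CRLF line endings, as a textarea typically submits.
$block = "http://bit.ly/123\r\nhttp://bit.ly/123\r\nhttp://bit.ly/123";
print_r(split_url_block($block));
```

Trimming each piece also covers the case where the input is split on "\n" alone and a "\r" is left behind on every line except the last.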
Recommended answer
This may be of some help: How to put string in array, split by new line?
You would probably do something like this, assuming you're getting the URLs returned in POST:
$final_urls = array();
// You can replace chr(10) with "\n" or "\r\n", depending on how you get your urls.
// And of course, change $_POST['short_urls'] to the source of your string.
$short_urls = explode( chr(10), $_POST['short_urls'] );
foreach ( $short_urls as $short ) {
    $final_urls[] = get_web_page( $short );
}
I get the following output, using var_dump($final_urls); and your bit.ly url:
And my source: $_POST['short_urls'] = "http://bit.ly/123
http://bit.ly/123
http://bit.ly/123
http://bit.ly/123";
I also got an error, using your function: Notice: Undefined offset: 0 in /var/www/test.php on line 27
Line 27: print($header[0]);
I'm not sure what you wanted there...
Here's my test.php, if it will help: http://codepad.org/zI2wAOWL
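If the intent of print($header[0]) was to show the resolved address, note that curl_getinfo() called without an option returns an associative array, so there is no index 0. The last url cURL ended up at is available under the "url" key, or directly via CURLINFO_EFFECTIVE_URL. A minimal sketch under that assumption follows; the function name get_final_url, the timeout values, and the choice to skip the response body are illustrative, not part of the original code.

```php
<?php
// Hypothetical helper: follow redirects and return the last url cURL reached,
// or null if the input is empty or the request fails.
function get_final_url(string $url): ?string
{
    $url = trim($url); // guard against stray whitespace such as a trailing "\r"
    if ($url === '') {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // do not echo the response
        CURLOPT_NOBODY         => true, // skip the body; headers are enough to follow redirects
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 10,   // stop after 10 redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // assumed value, in seconds
        CURLOPT_TIMEOUT        => 10,   // assumed value, in seconds
    ));

    $ok    = curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    // The handle is released when $ch goes out of scope.

    return $ok === false ? null : $final;
}
```

Inside the question's loop this would stand in for the get_web_page() call, for example $finalURLs[] = get_final_url($singleShortURL);. Some servers answer body-less requests differently from normal ones, so dropping CURLOPT_NOBODY may be necessary for certain shorteners.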