How to get the complete URL address most efficiently?
Question
I'm using a Java program to get expanded URLs from short URLs. Given a Java URLConnection, which of these two approaches is better for getting the desired result?
Connection.getHeaderField("Location");
vs
Connection.getURL();
I assumed both would give the same output. The first approach did not give good results: only 1 out of 7 URLs was resolved. Would the second approach do better?
Or is there a better approach altogether?
Answer
I would use the following:
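import static org.junit.Assert.assertEquals;
import java.net.HttpURLConnection;
import java.net.URL;
import org.junit.Test;
public class ShortUrlTest {
    @Test
    public void testLocation() throws Exception {
        final String link = "http://bit.ly/4Agih5";
        final URL url = new URL(link);
        final HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        // Stop at bit.ly's 301 instead of following it to the destination
        urlConnection.setInstanceFollowRedirects(false);
        final String location = urlConnection.getHeaderField("location"); // header lookup ignores case
        assertEquals("http://stackoverflow.com/", location);
        // Redirects were not followed, so getURL() is still the short link
        assertEquals(link, urlConnection.getURL().toString());
    }
}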
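With setInstanceFollowRedirects(false), HttpURLConnection does not follow redirects, so the destination page (stackoverflow.com in the example above) is never downloaded; only bit.ly's short redirect response is fetched. With the default setting, the connection follows the redirect instead, and the final response normally carries no Location header, so getHeaderField("Location") returns null.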
One drawback is that if the resolved bit.ly URL points to another short URL, for example on tinyurl.com, you get the tinyurl.com link rather than the address tinyurl.com redirects to.
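One way around this drawback is to repeat the setInstanceFollowRedirects(false) step hop by hop until a response is no longer a redirect. A sketch under a few assumptions: the class RedirectChain, the method resolve() and the MAX_HOPS limit of 10 are hypothetical; any 3xx response carrying a Location header is treated as a redirect; and a relative Location is resolved against the current URL with the URL(URL, String) constructor. Because the loop resolves each hop itself, it also crosses http/https switches that HttpURLConnection's built-in following refuses.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

public class RedirectChain {

    // Hypothetical cap on hops so a redirect loop cannot run forever.
    static final int MAX_HOPS = 10;

    /**
     * Expands a short URL by following redirects one hop at a time with
     * setInstanceFollowRedirects(false). Returns every URL visited; the last
     * element is the fully expanded URL.
     */
    public static List<String> resolve(String link) throws IOException {
        List<String> chain = new ArrayList<>();
        URL current = new URL(link);
        chain.add(current.toString());
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            URLConnection raw = current.openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return chain; // not HTTP(S): nothing more to follow
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setInstanceFollowRedirects(false); // inspect each 3xx ourselves
            try {
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                if (status < 300 || status > 399 || location == null) {
                    return chain; // no further redirect: current is the final URL
                }
                // Location may be relative, so resolve it against the current URL.
                current = new URL(current, location);
                chain.add(current.toString());
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("More than " + MAX_HOPS + " redirects from " + link);
    }
}
```

For example, RedirectChain.resolve("http://bit.ly/4Agih5") returns the list of visited URLs, and its last element is the expanded address, even when bit.ly points at another shortener.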
Edit:
To see bit.ly's raw response, use curl:
$ curl --dump-header /tmp/headers http://bit.ly/4Agih5
<html>
<head>
<title>bit.ly</title>
</head>
<body>
<a href="http://stackoverflow.com/">moved here</a>
</body>
</html>
As you can see, bit.ly sends only a short redirect page. Now check the HTTP headers:
$ cat /tmp/headers
HTTP/1.0 301 Moved Permanently
Server: nginx
Date: Wed, 06 Nov 2013 08:48:59 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: private; max-age=90
Location: http://stackoverflow.com/
Mime-Version: 1.0
Content-Length: 117
X-Cache: MISS from cam
X-Cache-Lookup: MISS from cam:3128
Via: 1.1 cam:3128 (squid/2.7.STABLE7)
Connection: close
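It sends a 301 Moved Permanently response with a Location header pointing to http://stackoverflow.com/. Modern browsers never show you the HTML page above; they automatically redirect you to the URL in the Location header. HttpURLConnection does the same by default, which is why setInstanceFollowRedirects(false) is needed to read the Location header yourself.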
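The same inspection can be done from Java, which shows what a shortener actually returns to HttpURLConnection rather than to curl. A minimal sketch, assuming the hypothetical names DumpHeaders, headers() and format(): with redirects disabled, getHeaderFields() returns the 301 response's headers, and the JDK stores the status line under a null key. The map does not preserve header order, so the output may be ordered differently from curl's.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class DumpHeaders {

    /** Fetches the raw response headers without following any redirect. */
    public static Map<String, List<String>> headers(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        conn.setInstanceFollowRedirects(false); // keep the 301 instead of following it
        try {
            conn.getResponseCode(); // sends the request
            return conn.getHeaderFields();
        } finally {
            conn.disconnect();
        }
    }

    /** Renders the headers roughly like `curl --dump-header`: status line first. */
    public static String format(Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder();
        List<String> status = headers.get(null); // status line lives under the null key
        if (status != null && !status.isEmpty()) {
            sb.append(status.get(0)).append('\n');
        }
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey() == null) {
                continue; // already printed above
            }
            for (String value : e.getValue()) {
                sb.append(e.getKey()).append(": ").append(value).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Usage: System.out.print(DumpHeaders.format(DumpHeaders.headers("http://bit.ly/4Agih5"))); prints the status line followed by the headers, including Location.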