Getting the Redirected URL from the Original URL

Question

I have a table in my database which contains the URLs of some websites. I have to open those URLs and verify some links on those pages. The problem is that some URLs get redirected to other URLs. My logic is failing for such URLs.

Is there some way through which I can pass my original URL string and get the redirected URL back?

Example: I am trying with this URL: http://individual.troweprice.com/public/Retail/xStaticFiles/FormsAndLiterature/CollegeSavings/trp529Disclosure.pdf

It gets redirected to this one: http://individual.troweprice.com/staticFiles/Retail/Shared/PDFs/trp529Disclosure.pdf

I tried to use the following code:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Uris);
req.Proxy = proxy;
req.Method = "HEAD";
req.AllowAutoRedirect = false;

HttpWebResponse myResp = (HttpWebResponse)req.GetResponse();
if (myResp.StatusCode == HttpStatusCode.Redirect)
{
  MessageBox.Show("redirected to:" + myResp.GetResponseHeader("Location"));
}

When I execute the code above, it gives me HttpStatusCode.OK. I am surprised that it is not treated as a redirect. If I open the link in Internet Explorer, it redirects to another URL and opens the PDF file.

Can someone help me understand why it is not working properly for the example URL?

By the way, I checked with Hotmail's URL (http://www.hotmail.com) and it correctly returns the redirected URL.

Thanks.

Recommended answer

The URL you mentioned uses a JavaScript redirect, which will only redirect a browser. So there's no easy way to detect the redirect.

For proper (HTTP Status Code and Location:) redirects, you might want to remove

req.AllowAutoRedirect = false;

and get the final URL using

myResp.ResponseUri

as there can be more than one redirect.
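
A minimal sketch of that suggestion, assuming the same HttpWebRequest API the question uses. The class and method names (RedirectResolver, GetFinalUri) are made up for illustration, and the proxy setting from the question is left out; set req.Proxy if your network needs it. Note that this only covers real HTTP (3xx) redirects, so it would still report the original URL for the JavaScript redirect in the example.

```csharp
using System;
using System.Net;

static class RedirectResolver
{
    // Hypothetical helper: returns the URI that finally answered the request
    // after the framework has followed any HTTP (3xx) redirects.
    public static Uri GetFinalUri(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "HEAD";                    // headers only; some servers may reject HEAD
        req.AllowAutoRedirect = true;           // this is the default, shown for clarity
        req.MaximumAutomaticRedirections = 10;  // arbitrary cap chosen for this sketch

        using (var resp = (HttpWebResponse)req.GetResponse())
        {
            // ResponseUri is the last URL in the chain, however many hops there were.
            return resp.ResponseUri;
        }
    }
}
```

Calling RedirectResolver.GetFinalUri("http://www.hotmail.com") should then give the URL at the end of the chain rather than only the first hop.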

Update: some more clarification about redirects:

There's more than one way to redirect a browser to another URL.

The first way is to use a 3xx HTTP status code, and the Location: header. This is the way the gods intended HTTP redirects to work, and is also known as "the one true way." This method will work on all browsers and crawlers.
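
If you want to see each hop rather than only the final URL, something like the following might work. It is a sketch, not a tested crawler: the names RedirectChain, IsRedirect, ResolveLocation and Follow are invented here, and the hop limit of 10 is arbitrary. It differs from the question's code in two ways: it accepts any of the common 3xx codes instead of only HttpStatusCode.Redirect (302), and it resolves a relative Location: value against the URL that returned it.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class RedirectChain
{
    // True for the 3xx codes that normally carry a Location header.
    public static bool IsRedirect(HttpStatusCode code)
    {
        int n = (int)code;
        return n == 301 || n == 302 || n == 303 || n == 307 || n == 308;
    }

    // Location may be relative, so resolve it against the URL that sent it.
    public static Uri ResolveLocation(Uri current, string location)
    {
        return new Uri(current, location);
    }

    // Follows HTTP redirects one hop at a time and returns every URL visited,
    // starting with the original one.
    public static List<Uri> Follow(string url, int maxHops = 10)
    {
        var hops = new List<Uri> { new Uri(url) };
        for (int i = 0; i < maxHops; i++)
        {
            var req = (HttpWebRequest)WebRequest.Create(hops[hops.Count - 1]);
            req.Method = "HEAD";
            req.AllowAutoRedirect = false;   // inspect each response ourselves

            using (var resp = (HttpWebResponse)req.GetResponse())
            {
                string location = resp.Headers["Location"];
                if (!IsRedirect(resp.StatusCode) || string.IsNullOrEmpty(location))
                    break;                   // reached a non-redirect response
                hops.Add(ResolveLocation(resp.ResponseUri, location));
            }
        }
        return hops;
    }
}
```

The last element of the list returned by Follow is the final URL; a list with a single element means the server did not issue an HTTP redirect at all, which is what happens with the PDF link in the question.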

And then there are the devil's ways. These include meta refresh, the Refresh: header, and JavaScript. Although these methods work in most browsers, they are definitely not guaranteed to work, and occasionally result in strange behavior (such as breaking the back button).

Most web crawlers, including the Googlebot, ignore these redirection methods, and so should you. If you absolutely have to detect all redirects, then you would have to parse the HTML for META tags, look for Refresh: headers in the response, and evaluate JavaScript. Good luck with the last one.
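
For the first two of those, a rough sketch could look like this. The names NonHttpRedirects, ParseRefreshValue and FindMetaRefresh are hypothetical, and the regular expressions are only an approximation that assumes reasonably well-formed markup; a real HTML parser would be more robust. JavaScript redirects are not handled at all.

```csharp
using System;
using System.Text.RegularExpressions;

static class NonHttpRedirects
{
    // Extracts the target from a refresh value such as "0; url=http://example.com/".
    // Works for both the Refresh: header and a META tag's content attribute.
    // Returns null when the value carries only a delay and no URL.
    public static string ParseRefreshValue(string value)
    {
        if (string.IsNullOrEmpty(value)) return null;
        var m = Regex.Match(
            value,
            @"^\s*\d+\s*[;,]\s*(?:url\s*=\s*)?['""]?(?<url>[^'""]+?)['""]?\s*$",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["url"].Value : null;
    }

    // Scans HTML for <meta http-equiv="refresh" content="..."> and returns
    // the first redirect target found, or null if there is none.
    public static string FindMetaRefresh(string html)
    {
        foreach (Match tag in Regex.Matches(html, @"<meta\b[^>]*>", RegexOptions.IgnoreCase))
        {
            if (!Regex.IsMatch(tag.Value, @"http-equiv\s*=\s*['""]?refresh", RegexOptions.IgnoreCase))
                continue;   // some other kind of META tag

            var content = Regex.Match(
                tag.Value,
                @"content\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)')",
                RegexOptions.IgnoreCase);
            if (content.Success)
            {
                string url = ParseRefreshValue(content.Groups["v"].Value);
                if (url != null) return url;
            }
        }
        return null;
    }
}
```

With HttpWebResponse, the header case would be ParseRefreshValue(myResp.Headers["Refresh"]); the META case needs a GET request rather than HEAD, since the tag lives in the response body.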
