How can I un-shorten a URL using Python?
Problem description
I have already seen this thread: How can I unshorten a URL?
My issue with the accepted answer (which uses the unshort.me API) is that I am focusing on unshortening YouTube links. Because unshort.me is so heavily used, almost 90% of my requests come back with a captcha, which I cannot solve programmatically.
So far I am stuck with using:
import urllib2

def unshorten_url(url):
    # urlopen follows redirects, so .url holds the final address
    resolvedURL = urllib2.urlopen(url)
    print resolvedURL.url
    #t = Test()
    #c = pycurl.Curl()
    #c.setopt(c.URL, 'http://api.unshort.me/?r=%s&t=xml' % (url))
    #c.setopt(c.WRITEFUNCTION, t.body_callback)
    #c.perform()
    #c.close()
    #dom = xml.dom.minidom.parseString(t.contents)
    #resolvedURL = dom.getElementsByTagName("resolvedURL")[0].firstChild.nodeValue
    return resolvedURL.url
Note: everything in the commented-out lines is what I tried when using the unshort.me service, which was returning captcha links.
Does anyone know of a more efficient way to do this without using urlopen (it downloads the whole page, which is a waste of bandwidth)?
Recommended answer
Use the best-rated answer (not the accepted answer) in that question:
# This is for Py2k. For Py3k, use http.client and urllib.parse instead.
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    # An empty path is not a valid request target, so fall back to "/"
    resource = parsed.path or "/"
    if parsed.query != "":
        resource += "?" + parsed.query
    # HEAD returns only the headers, so no page body is downloaded
    h.request('HEAD', resource)
    response = h.getresponse()
    # Any 3xx status with a Location header is a redirect
    if response.status // 100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # changed to process chains of short urls
    else:
        return url
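
For Python 3, the same HEAD-based idea might look roughly like the sketch below. It is not part of the original answer: the helper name request_target, the max_hops and timeout parameters, and the handling of https links and relative Location headers are all assumptions added for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def request_target(url):
    """Return the path (plus query, if any) to send in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path is not a valid target
    if parts.query:
        target += "?" + parts.query
    return target


def unshorten_url(url, max_hops=10, timeout=10):
    """Follow redirects with HEAD requests and return the last URL reached.

    max_hops and timeout are illustrative defaults (assumptions), meant to
    stop redirect loops and hung connections.
    """
    for _ in range(max_hops):
        parts = urlsplit(url)
        # Assumption: use TLS for https links, plain HTTP otherwise.
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            conn.request("HEAD", request_target(url))
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        if status // 100 == 3 and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return url
```

Calling unshorten_url with a short link returns the last address in the redirect chain, or the input unchanged if the server does not redirect. Some servers reject HEAD requests, in which case falling back to a GET request may be necessary.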