How can I un-shorten a URL using python?

Published: 2017/3/5 22:14:52. Tags: python, curl, youtube, hyperlink, urllib


Question

I have already seen this thread: How can I unshorten a URL using python?

My issue with the accepted answer (which uses the unshort.me API) is that I am focusing on unshortening YouTube links. Because unshort.me is so heavily used, almost 90% of my requests come back with a captcha, which I have no way to solve.

So far I am stuck with using:

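import urllib2  # Python 2 only; the Python 3 equivalent is urllib.request

def unshorten_url(url):
    # urlopen follows redirects, so .url holds the final (unshortened) address
    resolvedURL = urllib2.urlopen(url)
    print resolvedURL.url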

    #t = Test()
    #c = pycurl.Curl()
    #c.setopt(c.URL, 'http://api.unshort.me/?r=%s&t=xml' % (url))
    #c.setopt(c.WRITEFUNCTION, t.body_callback)
    #c.perform()
    #c.close()
    #dom = xml.dom.minidom.parseString(t.contents)
    #resolvedURL = dom.getElementsByTagName("resolvedURL")[0].firstChild.nodeValue
    return resolvedURL.url

Note: the commented-out code is what I tried with the unshort.me service, which kept returning captcha links.

Does anyone know of a more efficient way to do this without calling urlopen, which issues a full GET request for the target page and so wastes bandwidth?

Solution

Use the highest-rated answer (not the accepted one) from that question. It sends HEAD requests, which return only headers, and follows the Location header itself:

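# This is for Py2k.  For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    # Match the connection type to the URL scheme (many shorteners use https)
    if parsed.scheme == 'https':
        h = httplib.HTTPSConnection(parsed.netloc)
    else:
        h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path or '/'  # an empty path is not a valid request target
    if parsed.query != "":
        resource += "?" + parsed.query
    # HEAD returns only the status line and headers, so no page body is downloaded
    h.request('HEAD', resource)
    response = h.getresponse()
    location = response.getheader('Location')
    if response.status/100 == 3 and location:
        # Location may be relative, so resolve it against the current URL
        return unshorten_url(urlparse.urljoin(url, location)) # changed to process chains of short urls
    else:
        return url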
