How to get a favicon using Beautiful Soup and Python


Question

I wrote some silly code just to learn, but it doesn't work for most sites. Here is the code:

import urllib2, re
from BeautifulSoup import BeautifulSoup as Soup

class Founder:
    def Find_all_links(self, url):
        page_source = urllib2.urlopen(url)
        a = page_source.read()
        soup = Soup(a)

        a = soup.findAll(href=re.compile(r'/.a\w+'))
        return a
    def Find_shortcut_icon (self, url):
        a = self.Find_all_links(url)
        b = ''
        for i in a:
            strre=re.compile('shortcut icon', re.IGNORECASE)
            m=strre.search(str(i))
            if m:
                b = i["href"]
        return b
    def Save_icon(self, url):
        url = self.Find_shortcut_icon(url)
        print url
        host = re.search(r'[0-9a-zA-Z]{1,20}\.[a-zA-Z]{2,4}', url).group()
        opener = urllib2.build_opener()
        icon = opener.open(url).read()
        file = open(host+'.ico', "wb")
        file.write(icon)
        file.close()
        print '%s icon successfully saved' % host
c = Founder()
print c.Save_icon('http://lala.ru')

The strangest thing is that it works for these sites: http://habrahabr.ru http://5pd.ru

But it doesn't work for most of the others that I've checked.

Recommended answer

You're making it far more complicated than it needs to be. Here's a simple way to do it:

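import urllib
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, as in the question

page = urllib.urlopen("http://5pd.ru/")
soup = BeautifulSoup(page)
# Select the <link> tag by its rel attribute instead of pattern-matching hrefs
icon_link = soup.find("link", rel="shortcut icon")
# Assumes the tag exists and that its href is an absolute URL
icon = urllib.urlopen(icon_link['href'])
with open("test.ico", "wb") as f:
    f.write(icon.read())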
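
The snippet above relies on the page declaring a `<link rel="shortcut icon">` with an absolute `href`. The question's code has a similar weakness: it filters links by how the `href` is spelled, then opens the `href` directly, which fails for relative paths. Below is a minimal sketch of a more defensive lookup, assuming Python 3 and the `beautifulsoup4` package (`bs4`) rather than the Python 2 setup used in the thread. The function names `pick_icon_url` and `find_favicon_url` are hypothetical, and the fallback to `/favicon.ico` is a common convention, not something the original answer states.

```python
from urllib.parse import urljoin


def pick_icon_url(links, page_url):
    """Return an absolute icon URL from (rel_tokens, href) pairs.

    Hypothetical helper: `links` is any iterable of
    (list_of_rel_tokens, href) tuples taken from <link> tags.
    Falls back to the conventional /favicon.ico when none qualifies.
    """
    for rel_tokens, href in links:
        tokens = [t.lower() for t in rel_tokens]
        # Matches rel="icon" as well as rel="shortcut icon", in any case
        if "icon" in tokens and href:
            # urljoin resolves relative hrefs against the page URL
            return urljoin(page_url, href)
    return urljoin(page_url, "/favicon.ico")


def find_favicon_url(html, page_url):
    """Parse `html` with Beautiful Soup 4 and return a favicon URL."""
    # Imported here so pick_icon_url stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.find_all("link", href=True):
        rel = tag.get("rel") or []
        if isinstance(rel, str):  # some parsers return a plain string
            rel = rel.split()
        links.append((rel, tag["href"]))
    return pick_icon_url(links, page_url)
```

The returned URL could then be fetched with `urllib.request.urlopen` and written to a file opened in `"wb"` mode, as in the answer above.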
