BeautifulSoup: Extract img alt data


Problem description

I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I am able to extract the images successfully.

HTML (what I currently parse):

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />

I construct the image file name from what I parse:

Current code:

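# Assumed imports (Python 2, BeautifulSoup 3); the question does not show them.
import os
import urlparse
from urllib import urlretrieve
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup as bs

def main(url, output_folder="~/images"):
    """Download the images at url"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))
    # Expand "~" explicitly; os.path.join and urlretrieve do not do it.
    output_folder = os.path.expanduser(output_folder)
    count = 0
    for image in soup.findAll("img"):
        print image
        count += 1
        print count
        print "Image: %(src)s" % image
        # Resolve a possibly relative src against the page URL.
        image_url = urlparse.urljoin(url, image['src'])
        # Derive a file name from the last path segment of src.
        filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
        parsed[2] = image["src"]
        outpath = os.path.join(output_folder, filename)
        urlretrieve(image_url, outpath)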

What I would like to extract is:

alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"

I also want to use the alt data as the file name when I extract the image.

Recommended answer

Inside your for loop, you can get the alt text with:

image.get('alt', '')
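
As a minimal sketch of that call, assuming Python 3 and the bs4 package (BeautifulSoup 4), and using the tag from the question with a placeholder src URL, `get` returns the alt text when the attribute is present and the supplied default when it is missing:

```python
from bs4 import BeautifulSoup

# Sample markup: the first tag mirrors the one in the question (with a
# placeholder src); the second has no alt attribute.
html = """
<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://example.com/a.jpg" itemprop="image" />
<img src="http://example.com/b.jpg" />
"""

soup = BeautifulSoup(html, "html.parser")

# find_all is the bs4 spelling of findAll; both names work in bs4.
alts = [image.get("alt", "") for image in soup.find_all("img")]
print(alts)
```

Passing a default such as `''` avoids the `KeyError` that `image['alt']` would raise for a tag without an alt attribute.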

Attribute access on tags, including `get`, is explained in BeautifulSoup's documentation ("The attributes of Tags").
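
To use the alt text as the file name, as the question asks, the value usually needs cleaning first, because alt text can contain characters that are awkward or invalid in file names. The helper below is one possible sketch and not part of the original answer: `alt_to_filename` is a hypothetical name, the replacement rules are assumptions, and falling back to the running image count when alt is empty is an illustrative choice.

```python
import re


def alt_to_filename(alt, count, extension=".jpg"):
    """Build a file name from alt text (hypothetical helper).

    Runs of characters other than letters, digits, dots, and hyphens
    are replaced with a single underscore. If nothing usable remains,
    the running image count is used instead.
    """
    cleaned = re.sub(r"[^A-Za-z0-9.\-]+", "_", alt).strip("_")
    if not cleaned:
        cleaned = "image_%d" % count
    return cleaned + extension


print(alt_to_filename("Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver", 1))
print(alt_to_filename("", 2))
```

Inside the loop from the question, the `filename = ...` line could then become something like `filename = alt_to_filename(image.get('alt', ''), count)`.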
