BeautifulSoup: Extract img alt data


Problem description

I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I can extract the images successfully.

HTML (what I currently parse):

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />

I construct the image file name from what I parse:

Current code:

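# Python 2 code as posted. The import lines are assumed; the original post omitted them.
import os
import urlparse
from urllib import urlretrieve
from urllib2 import urlopen
from bs4 import BeautifulSoup as bs

def main(url, output_folder="~/images"):
    """Download the images at url"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))
    count = 0
    for image in soup.findAll("img"):
        print image
        count += 1
        print count
        print "Image: %(src)s" % image
        image_url = urlparse.urljoin(url, image['src'])
        # Build a file name from the last path segment of src, dropping the query string
        filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
        parsed[2] = image["src"]
        # expanduser turns "~" into the home directory; os.path.join alone does not
        outpath = os.path.join(os.path.expanduser(output_folder), filename)
        urlretrieve(image_url, outpath)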

What I want to do is extract:

alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"

I also want to use the alt data as the file name when I save the image.

Recommended answer

Inside your for loop, you can get the alt text with:

image.get('alt', '')

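This is explained in BeautifulSoup's documentation ("The attributes of Tags"). The second argument is the default returned when the tag has no alt attribute, so the call does not raise KeyError the way image['alt'] would.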
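
The answer covers reading the alt text but not the second part of the question, using it as the file name. One possible approach is sketched below in Python 3, under stated assumptions: the helper name filename_from_alt, the character whitelist, the ".jpg" fallback extension, and the "image_N" fallback name are all hypothetical choices, not part of the original answer. The helper only relies on .get(), so it accepts a BeautifulSoup Tag or a plain dict.

```python
import os
import re
from urllib.parse import urlparse


def filename_from_alt(image, index):
    """Build a file name from an img tag's alt text (hypothetical helper).

    `image` is anything with a dict-style .get(), such as a BeautifulSoup
    Tag or a plain dict. `index` is only used when alt is missing or empty.
    """
    alt = image.get("alt", "")
    src = image.get("src", "")

    # Take the extension from the URL path, ignoring any query string.
    # Assumption: fall back to ".jpg" when the path has no extension.
    ext = os.path.splitext(urlparse(src).path)[1].lower() or ".jpg"

    # Replace each run of characters other than letters, digits,
    # underscore, dot, or hyphen with a single underscore.
    stem = re.sub(r"[^\w.-]+", "_", alt).strip("_.")

    # Assumption: use a numbered placeholder when no usable alt text exists.
    if not stem:
        stem = "image_%d" % index

    return stem + ext


example = {
    "alt": "Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver",
    "src": "http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1",
}
print(filename_from_alt(example, 1))
```

In the loop from the question, the long chain of split and replace calls could then become filename = filename_from_alt(image, count). Two images with the same alt text would produce the same name, so a real script may want to append the counter as well.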
