Extracting src from a BeautifulSoup Tag

Problem description

I am trying to scrape Newegg for product names, descriptions, prices and images using BeautifulSoup. I have the following bs4.element.Tag and want to extract the "src" link from it. Here is my tag:

df = <a class="itemImage" href="http://www.newegg.com/Product/Product.aspx?Item=N82E16875169194&amp;cm_re=Samsung_edge-_-75-169-194-_-Product" id="img_75-169-194" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'>\n<img alt='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty' src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'/>\n</a>

How can I extract

src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg"

from this tag? I tried

df.attrs['src']

but I received a KeyError.

Recommended answer

The src attribute belongs to the img tag nested inside the a tag, not to the a tag itself:

from bs4 import BeautifulSoup
tag = """<a class="itemImage" href="http://www.newegg.com/Product/Product.aspx?Item=N82E16875169194&amp;cm_re=Samsung_edge-_-75-169-194-_-Product" id="img_75-169-194" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'>\n<img alt='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty' src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg" title='Samsung Galaxy S7 Edge Dual SIM Unlocked Smart Phone, Dual Edge 5.5" AMOLED Display, black Color, 32GB Storage 4GB RAM International Version - No US Warranty'/>\n</a>"""

soup = BeautifulSoup(tag,"lxml")

src = soup.img["src"]

which gives you:

http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg
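
If you already hold the a tag as a bs4.element.Tag (as in the question) rather than the raw HTML string, you can step down to the child img from that tag directly. The following is a minimal sketch, not the only way to do it. It assumes a shortened stand-in for the question's markup and uses the built-in "html.parser" so that lxml is not required. The names html, anchor_attrs and img are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (assumption: only the
# attributes needed for the illustration are kept).
html = """
<a class="itemImage" href="http://www.newegg.com/Product/Product.aspx?Item=N82E16875169194" id="img_75-169-194">
<img alt="Samsung Galaxy S7 Edge" src="http://images10.newegg.com/ProductImageCompressAll200/75-169-194-04.jpg"/>
</a>
"""

# "html.parser" ships with Python, so this sketch needs no extra parser.
soup = BeautifulSoup(html, "html.parser")

# Recreate the situation from the question: df is the <a> Tag.
df = soup.find("a", class_="itemImage")

# The <a> tag only carries its own attributes, so df.attrs["src"] raises KeyError.
anchor_attrs = sorted(df.attrs)

# Step down to the child <img> before reading src.
img = df.find("img")
# .get returns None for a missing attribute instead of raising KeyError.
src = img.get("src") if img is not None else None

print(anchor_attrs)
print(src)
```

Using find plus .get is a defensive choice: on a scraped page some items may have no img or no src, and this form yields None in those cases where df.img["src"] would raise an exception.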
