Downloading Images with BeautifulSoup without HTML 'img' tag


Question

I'm using BeautifulSoup to find and download images from a given website. However, the website contains images that aren't in the usual <img src="icon.gif"/> format:

The ones causing me problems look like this, for example:

<form action="example.jpg">

<!-- <img src="big.jpg" /> -->

background-image:url("xine.png");

My code for finding the images is:

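import os
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup

webpage = "https://example.com/images/"
soup = BeautifulSoup(urlopen(webpage), "html.parser")

# Only matches real <img> tags, so the three cases above are missed.
for img in soup.find_all('img'):
    img_url = urljoin(webpage, img['src'])
    file_name = img['src'].split('/')[-1]
    file_path = os.path.join("C:\\users\\images", file_name)
    urlretrieve(img_url, file_path)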

I think I might have to use a regex, but hopefully I don't have to.

Thanks in advance.

Recommended answer

Modify the path you pass to urlretrieve to specify exactly where you want the file to be copied to:

# Raw string, so that \f in the Windows path is not read as a form-feed escape.
file_path = os.path.join(r'c:\files\cw\downloads', file_name)
urlretrieve(img_url, file_path)
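
As a minimal sketch of building that destination path, the helper below creates the download directory if it is missing and derives the file name from the URL's path component, so a query string such as ?v=2 does not end up in the file name. The name build_file_path is hypothetical and is not part of the original answer.

```python
import os
from urllib.parse import urlparse


def build_file_path(download_dir, img_src):
    """Return the local path for img_src inside download_dir.

    Hypothetical helper: creates download_dir if needed and uses only the
    last segment of the URL path as the file name.
    """
    os.makedirs(download_dir, exist_ok=True)
    file_name = os.path.basename(urlparse(img_src).path)
    return os.path.join(download_dir, file_name)
```

The result can be passed to urlretrieve as file_path, for example build_file_path(r'c:\files\cw\downloads', img_url).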

It looks like you are also trying to find img tags inside comments. Building off of "Find specific comments in HTML code using python":

import bs4
...
imgs = soup.find_all('img')
comments = soup.find_all(string=lambda text: isinstance(text, bs4.Comment))
for comment in comments:
    # Parse each comment's text as HTML and collect any <img> tags inside it.
    comment_soup = bs4.BeautifulSoup(comment, 'html.parser')
    imgs.extend(comment_soup.find_all('img'))

for img in imgs:
    ...
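
The answer above covers img tags inside comments. The other two cases in the question (a form action that points at an image, and a CSS background-image url(...)) could be handled along the same lines. The following is only a sketch under some assumptions: collect_image_urls, IMAGE_EXTENSIONS, and CSS_URL_RE are hypothetical names, the extension list is a guess at what counts as an image, and the CSS case does use a small regex because BeautifulSoup does not parse CSS values.

```python
import re
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup, Comment

# Assumption: a URL counts as an image if its path ends in one of these.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")
# Matches url("x.png"), url('x.png') and url(x.png) inside CSS text.
CSS_URL_RE = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")


def collect_image_urls(html, base_url):
    """Return absolute image URLs found in tags, comments, forms, and CSS."""
    soup = BeautifulSoup(html, "html.parser")

    # 1. Ordinary <img src="..."> tags.
    candidates = [img["src"] for img in soup.find_all("img", src=True)]

    # 2. <img> tags hidden inside HTML comments.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        candidates += [img["src"] for img in inner.find_all("img", src=True)]

    # 3. <form action="..."> values (non-image actions are filtered below).
    candidates += [form["action"] for form in soup.find_all("form", action=True)]

    # 4. url(...) references in style attributes and <style> blocks.
    for tag in soup.find_all(style=True):
        candidates += CSS_URL_RE.findall(tag["style"])
    for style in soup.find_all("style"):
        candidates += CSS_URL_RE.findall(style.string or "")

    urls = []
    for candidate in candidates:
        is_image = urlparse(candidate).path.lower().endswith(IMAGE_EXTENSIONS)
        absolute = urljoin(base_url, candidate)
        if is_image and absolute not in urls:
            urls.append(absolute)
    return urls
```

Each returned URL can then be downloaded with urlretrieve in the same way as in the loop from the question.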
