如何使用beautifulSoup提取和下载网站上的所有图像? [英] How to extract and download all images from a website using beautifulSoup?
本文介绍了如何使用beautifulSoup提取和下载网站上的所有图像?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在尝试从URL中提取并下载所有图像. 我写了一个脚本
I am trying to extract and download all images from a url. I wrote a script
import urllib2
import re
from os.path import basename
from urlparse import urlsplit
url = "http://filmygyan.in/katrina-kaifs-top-10-cutest-pics-gallery/"
urlContent = urllib2.urlopen(url).read()
# HTML image tag: <img src="url" alt="some_text"/>
imgUrls = re.findall('img .*?src="(.*?)"', urlContent)
# download all images
for imgUrl in imgUrls:
try:
imgData = urllib2.urlopen(imgUrl).read()
fileName = basename(urlsplit(imgUrl)[2])
output = open(fileName,'wb')
output.write(imgData)
output.close()
except:
pass
我不想提取此页面的图像,请参见此图像 http://i. share.pho.to/1c9884b1_l.jpeg 我只想获取所有图像而无需单击下一步"按钮 我不知道如何获取下一个"课程中的所有图片.在findall中应该做哪些更改?
I don't want to extract image of this page see this image http://i.share.pho.to/1c9884b1_l.jpeg I just want to get all the images without clicking on "Next" button I am not getting how can I get the all pics within "Next" class.?What changes I should do in findall?
推荐答案
以下内容应从给定页面提取所有图像并将其写入运行脚本的目录.
The following should extract all images from a given page and write it to the directory where the script is being run.
import re
import requests
from bs4 import BeautifulSoup
site = 'http://pixabay.com'
response = requests.get(site)
soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')
urls = [img['src'] for img in img_tags]
for url in urls:
filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
if not filename:
print("Regex didn't match with the url: {}".format(url))
continue
with open(filename.group(1), 'wb') as f:
if 'http' not in url:
# sometimes an image source can be relative
# if it is provide the base url which also happens
# to be the site variable atm.
url = '{}{}'.format(site, url)
response = requests.get(url)
f.write(response.content)
这篇关于如何使用beautifulSoup提取和下载网站上的所有图像?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文