使用python从HTML页面源下载图像文件? [英] Download image file from the HTML page source using python?

查看:54
本文介绍了使用python从HTML页面源下载图像文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个从 HTML 页面下载所有图像文件并将它们保存到特定文件夹的抓取工具.所有图片都是 HTML 页面的一部分.

I am writing a scraper that downloads all the image files from a HTML page and saves them to a specific folder. all the images are the part of the HTML page.

推荐答案

这里是一些代码,用于从提供的 URL 下载所有图像,并将它们保存在指定的输出文件夹中.您可以根据自己的需要对其进行修改.

Here is some code to download all the images from the supplied URL, and save them in the specified output folder. You can modify it to your own needs.

"""
dumpimages.py
    Downloads all the images on the supplied URL, and saves them to the
    specified output file ("/test/" by default)

Usage:
    python dumpimages.py http://example.com/ [output]
"""
from bs4 import BeautifulSoup as bs
from urllib.request import (
    urlopen, urlparse, urlunparse, urlretrieve)
import os
import sys

def main(url, out_folder="/test/"):
    """Downloads all the images at 'url' to /test/"""
    soup = bs(urlopen(url))
    parsed = list(urlparse(url))

    for image in soup.findAll("img"):
        print("Image: %(src)s" % image)
        filename = image["src"].split("/")[-1]
        parsed[2] = image["src"]
        outpath = os.path.join(out_folder, filename)
        if image["src"].lower().startswith("http"):
            urlretrieve(image["src"], outpath)
        else:
            urlretrieve(urlunparse(parsed), outpath)

def _usage():
    print("usage: python dumpimages.py http://example.com [outpath]")

if __name__ == "__main__":
    url = sys.argv[-1]
    out_folder = "/test/"
    if not url.lower().startswith("http"):
        out_folder = sys.argv[-1]
        url = sys.argv[-2]
        if not url.lower().startswith("http"):
            _usage()
            sys.exit(-1)
    main(url, out_folder)

您现在可以指定输出文件夹.

You can specify the output folder now.

这篇关于使用python从HTML页面源下载图像文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆