How do I download the highest resolution image from a JavaScript rendered responsive page?

Question

Suppose this is the website page: "https://www.dior.com/en_us/products/couture-943C105A4655_C679-technical-fabric-cargo-pants-covered-in-tulle". I want to download all the images of the product shown on it (4 images in this case).

I am using Selenium to extract the image links. The problem is that when I click the images they open at up to 2000x3000 pixels, but the links I extract only give me versions around 480 pixels. Where are the larger images stored, and how do I extract them? (Basically, I want to download the largest available size of each image.)

Recommended answer

Within the source code of the page you provided, there is JSON data that supplies the links and content for the page. Once that data is extracted from the script tag in the source, it is easy to retrieve the high-resolution links and download the images. If you have not already, run pip install requests and pip install bs4. The code below also passes 'lxml' as the parser to BeautifulSoup, so pip install lxml is needed as well.

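import json
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.dior.com/en_us/products/couture-943C105A4655_C679-technical-fabric-cargo-pants-covered-in-tulle'

r = requests.get(url)
r.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(r.text, 'lxml')

# Find the inline script that assigns the page state to window.initialState.
script = next(s.text for s in soup.find_all('script') if 'window.initialState' in s.text)

# Greedy match from the first '{' to the last '}' on the same line, then parse it as JSON.
json_data_s = re.search(r'\{.+\}', script).group(0)
json_data = json.loads(json_data_s)

for holder in json_data['CONTENT']['cmsContent']['elements']:
    # Only the PRODUCTMEDIAS element holds the product gallery.
    if holder.get('type') == 'PRODUCTMEDIAS':
        for image in holder['items']:
            # imageZoom carries the high-resolution version of each gallery image.
            zoom = image['galleryImages']['imageZoom']
            image_page = requests.get(zoom['uri'])
            image_page.raise_for_status()
            # Name each file after the image's view code.
            with open(zoom['viewCode'] + '.jpg', 'wb') as img:
                img.write(image_page.content)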

*The images you were downloading before were the thumbnails.
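
The extraction step can also be tried offline. The greedy pattern `{.+}` works when the state object is the only braced text on its line, but it would capture too much if another object followed it. The sketch below is one possible alternative, not part of the original answer: it locates the `window.initialState =` assignment and lets `json.JSONDecoder().raw_decode` parse exactly one JSON value from that position. The helper names `extract_assigned_json` and `zoom_image_uris`, the `SAMPLE_SCRIPT` text, the `PRODUCTTITLE` element, and the example.com URLs are all made up for illustration. Only the key path (`CONTENT`, `cmsContent`, `elements`, `PRODUCTMEDIAS`, `galleryImages`, `imageZoom`) comes from the answer above, and the live page's structure may have changed since.

```python
import json
import re


def extract_assigned_json(script_text, marker="window.initialState"):
    """Return the JSON value assigned to `marker` inside a script body."""
    match = re.search(re.escape(marker) + r"\s*=\s*", script_text)
    if match is None:
        raise ValueError(f"{marker!r} assignment not found")
    # raw_decode parses a single JSON value starting at the given index
    # and ignores whatever text follows it.
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


def zoom_image_uris(state):
    """Collect (viewCode, uri) pairs using the key path from the answer."""
    pairs = []
    for holder in state["CONTENT"]["cmsContent"]["elements"]:
        if holder.get("type") != "PRODUCTMEDIAS":
            continue
        for item in holder["items"]:
            zoom = item["galleryImages"]["imageZoom"]
            pairs.append((zoom["viewCode"], zoom["uri"]))
    return pairs


# Made-up script body that mimics the assumed shape of the page state.
SAMPLE_SCRIPT = """
window.initialState = {"CONTENT": {"cmsContent": {"elements": [
  {"type": "PRODUCTTITLE", "items": []},
  {"type": "PRODUCTMEDIAS", "items": [
    {"galleryImages": {"imageZoom": {"viewCode": "E01", "uri": "https://example.com/zoom/E01.jpg"}}},
    {"galleryImages": {"imageZoom": {"viewCode": "E02", "uri": "https://example.com/zoom/E02.jpg"}}}
  ]}
]}}};
window.other = {"unrelated": true};
"""

state = extract_assigned_json(SAMPLE_SCRIPT)
pairs = zoom_image_uris(state)
for view_code, uri in pairs:
    print(view_code, uri)
```

In the sample, the second assignment (`window.other`) is left alone because `raw_decode` stops at the end of the first object. The same two helpers could be pointed at the script text taken from the real page, provided the page still uses this layout.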
