Unable to get URL inside existing object using Puppeteer
Problem description
The following is my HTML:
<div class="product-thumbShim"></div><a target="_blank" href="tshirts/herenow/herenow-men-black-printed-round-neck-t-shirt/4318138/buy" style="display: block;"><div class="product-imageSliderContainer"><div class="product-sliderContainer" style="display: block;"><div style="background: rgb(244, 255, 249);"><div style="height: 280px; width: 100%;"><picture class="img-responsive" style="width: 100%; height: 100%; display: block;"><source srcset="
https://assets.myntassets.com/f_webp,dpr_1.0,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg ,
https://assets.myntassets.com/f_webp,dpr_1.5,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 1.5x,
https://assets.myntassets.com/f_webp,dpr_1.8,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 1.8x,
https://assets.myntassets.com/f_webp,dpr_2.0,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.0x,
https://assets.myntassets.com/f_webp,dpr_2.2,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.2x,
https://assets.myntassets.com/f_webp,dpr_2.4,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.4x,
https://assets.myntassets.com/f_webp,dpr_2.6,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.6x,
https://assets.myntassets.com/f_webp,dpr_2.8,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.8x" type="image/webp"><img src="https://assets.myntassets.com/dpr_2,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg" class="img-responsive" alt="HERE&NOW Men Black Printed Round Neck T-shirt" title="HERE&NOW Men Black Printed Round Neck T-shirt" style="width: 100%; display: block;"></picture></div></div></div></div><div class="product-productMetaInfo"><h3 class="product-brand">HERE&NOW</h3><h4 class="product-product">Men Printed Round Neck T-shirt</h4><h4 class="product-sizes"><!-- react-text: 396 -->Sizes: <!-- /react-text --><span class="product-sizeInventoryPresent">S, </span><span class="product-sizeInventoryPresent">M, </span><span class="product-sizeInventoryPresent">L, </span><span class="product-sizeInventoryPresent">XL, </span><span class="product-sizeInventoryPresent">XXL</span></h4><div class="product-price"><span><span class="product-discountedPrice"><!-- react-text: 405 -->Rs. <!-- /react-text --><!-- react-text: 406 -->374<!-- /react-text --></span><span class="product-strike"><!-- react-text: 408 -->Rs. 
<!-- /react-text --><!-- react-text: 409 -->749<!-- /react-text --></span></span><span class="product-discountPercentage">(50% OFF)</span></div></div></a><div class="image-grid-similarColorsCta product-similarItemCta"><span class="myntraweb-sprite image-grid-similarColorsIcon sprites-similarProductsIcon"></span><span class="image-grid-iconText">VIEW SIMILAR</span></div><div class="product-actions "><span class="product-actionsButton product-wishlist " style="width: 100%; text-align: center;"><!-- react-text: 416 -->wishlist<!-- /react-text --></span></div><div class="product-sizeDisplayDiv"><div class="product-sizeDisplayHeader"><span>Select a size</span><span class="myntraweb-sprite product-sizeDisplayRemoveMark sprites-remove"></span></div><div class="product-sizeButtonsContaier"><button class="product-sizeButton">S</button><button class="product-sizeButton">M</button><button class="product-sizeButton">L</button><button class="product-sizeButton">XL</button><button class="product-sizeButton">XXL</button></div></div>"
Current code:
const res = await page.evaluate(() => {
  const productArry = [...document.querySelectorAll(".product-base")];
  return productArry.map((product) => {
    let productSizeText = product.querySelector(".product-sizes").innerText;
    let productSizeArr = productSizeText
      .replace("Sizes:", "")
      .trim()
      .split(",");
    return {
      imageurl: product.querySelector("div > picture > .img-responsive")
        .src,
      brandName: product.querySelector(".product-brand").innerText,
      productName: product.querySelector(".product-product").innerText,
      productSizes: productSizeArr,
    };
  });
});
Am I getting a null error when reading src from the tag above because of lazy loading?
Solution
Please try the code snippets below:
// To get the srcset string of a <source> tag
const imageSrcset = await page.evaluate(() => {
  // This takes the first <source> element in the DOM. Change the index 0 if the page you are
  // scraping has more <source> tags and the one you need is not the first.
  const sourceTag = document.getElementsByTagName('source')[0];
  // Check that the element exists; return null when it does not
  if (!sourceTag) return null;
  // srcset is a single string that holds all the image URLs of the <source> tag
  return sourceTag.getAttribute('srcset');
});
console.log(imageSrcset);
// To get the product brand name you can do this
await page.waitForSelector('.product-brand');
// querySelector returns one element; getElementsByClassName returns a collection,
// and a collection has no textContent, so the original call produced undefined
const brandName = await page.evaluate(() => document.querySelector('.product-brand').textContent);
console.log('Brand Name = ' + brandName);
// To get the product sizes you can do this
const productSizes = await page.$$eval('.product-sizeInventoryPresent', (elements) =>
  // Each span's text looks like "S, ", so drop the comma and surrounding whitespace
  elements.map((element) => element.textContent.replace(',', '').trim())
);
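The srcset attribute comes back as one long string, and the URLs in the HTML above contain commas themselves (f_webp,dpr_1.0,...), so splitting on "," would cut every URL into pieces. The following is a minimal sketch of one way to turn that string into a list of candidates. The helper name parseSrcset and the example.com URLs are hypothetical, and the sketch assumes candidates are separated by a comma next to whitespace, as they are in the markup above; it is not a full implementation of the browser's srcset parsing rules.

```javascript
// Hypothetical helper: split a srcset string into { url, descriptor } candidates.
// Assumes each candidate is separated from the next by a comma adjacent to whitespace.
function parseSrcset(srcset) {
  if (!srcset) return [];
  const candidates = [];
  for (const rawToken of srcset.trim().split(/\s+/)) {
    // Separator commas sit at the edge of a token; commas inside a URL are kept.
    const token = rawToken.replace(/^,+|,+$/g, '');
    if (token === '') continue;
    if (/^\d+(\.\d+)?[xw]$/.test(token) && candidates.length > 0) {
      // A density ("1.5x") or width ("210w") descriptor belongs to the previous URL.
      candidates[candidates.length - 1].descriptor = token;
    } else {
      candidates.push({ url: token, descriptor: null });
    }
  }
  return candidates;
}

// Made-up sample shaped like the srcset in the question.
const sample =
  'https://example.com/f_webp,dpr_1.0,w_210/shirt.jpg ,\n' +
  'https://example.com/f_webp,dpr_1.5,w_210/shirt.jpg 1.5x,\n' +
  'https://example.com/f_webp,dpr_2.0,w_210/shirt.jpg 2.0x';

for (const candidate of parseSrcset(sample)) {
  console.log(candidate.descriptor, candidate.url);
}
```

This runs in plain Node.js, so it can be applied to the string returned from page.evaluate without another round trip to the browser.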