How do I create a CSV file from all of these elements?


Problem description

I am trying to get the text from both of these sections and turn it into a CSV list using Puppeteer:

Item number: (Item 1055688)

Price: ($16.59)

Here's what I tried, but it doesn't seem to find the SKU, for example:

let elements = await self.page.$$('div[class="row item-row"]');
for (let element of elements) {
    let sku = await element.$eval('div[class="body-copy custom-body-copy"]', node => node.innerText.trim());
}

Here is the code I am trying to extract the data from:

<div class="col-xl-3 col-lg-3 col-md-6 col-sm-8 col-xs-6">
<div class="product_desc_txt">

    <a href=" https://www.costcobusinessdelivery.com/.product.1055688.html" class="body-copy-link">
        Pringles Snack Pack Potato Crisps, Original, 0.67 oz, 60 ct
    </a>
    <div class="body-copy custom-body-copy">
       Item&nbsp;1055688
    </div>

    <div class="margin_tp_10"></div>

    <div class="body-copy hidden visible-md visible-sm visible-xs visible-lg">

        <span  data-wishlist-linkfee="false" > $16.59</span>

    </div>

</div>
</div>
<div class="col-xl-2 col-lg-2 body-copy text-right hidden visible-xl ">

<span  data-wishlist-linkfee="false" > $16.59</span>


</div>

Here is my code so far:

const puppeteer = require("puppeteer-extra")

const pluginStealth = require("puppeteer-extra-plugin-stealth")
puppeteer.use(pluginStealth())

puppeteer.launch({ headless: false }).then(async browser => {
    const page = await browser.newPage()
    await page.setViewport({ width: 1920, height: 1080 })
    await page.goto("https://www.costcobusinessdelivery.com")
    await page.waitFor(5000);
    await page.waitForSelector("#header_sign_in");
    await page.click("#header_sign_in");
    await page.waitForSelector("#logonId");

    await page.type('#logonId', 'username', {delay: 20});
    await page.type('#logonPassword_id', 'password', {delay: 20});
    await page.type('#deliveryZipCode', 'zipcode', {delay: 20});
    await page.click('#sign_in_button');

    await page.waitForSelector('body > div.bd-specific > div > div > div > div > div > ul > li.set-zip-code.left-lg.colo-md-5.zipped > ul > li:nth-child(1) > a');
    await page.click('body > div.bd-specific > div > div > div > div > div > ul > li.set-zip-code.left-lg.colo-md-5.zipped > ul > li:nth-child(1) > a');
    await page.waitForSelector('#tiles-body-attribute > div:nth-child(2) > div.myaccount-lists > div > div:nth-child(2) > div > span > h5 > a');
    await page.click('#tiles-body-attribute > div:nth-child(2) > div.myaccount-lists > div > div:nth-child(2) > div > span > h5 > a');
})

I am new to Puppeteer, so I am not sure if I am doing this right at all. Any help or guidance would be appreciated. Thank you!

Solution

I suppose the structure of your page is similar to this one

In this case you could use the following code:

// Find product descriptions
const csv = await page.$$eval('.product_desc_txt', function(products){

    // Iterate over product descriptions
    let csvLines = products.map(function(product){

        // Inside of each product find product SKU and its price
        let productId = product.querySelector(".custom-body-copy").innerText.trim();
        let productPrice = product.querySelector("span[data-wishlist-linkfee]").innerText.trim();

        // Format them as a CSV line
        return `${productId};${productPrice}`
    })

    // Join all lines into one file
    return csvLines.join("\n");

});

This code with the linked HTML structure produces this:

Item 1055688;$16.59
Item 1055688;$16.59
Item 1055688;$16.59
Item 1055688;$16.59
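
Note that the lines above use `;` as the separator and are not quoted, and the SKU text may still contain the non-breaking space from `Item&nbsp;1055688`. That is fine for these two fields, but it would break if a field containing `;` or a quote were added later. A minimal sketch of one way to normalize and escape the values in Node after scraping; the helper names `csvField` and `toCsv`, the header names, and the sample data are assumptions made for illustration, not part of the original answer:

```javascript
// Quote a field when it contains the separator, a double quote, or a line break
// (the usual CSV convention: wrap in quotes and double any embedded quotes).
function csvField(value, sep = ";") {
  const text = String(value);
  if (text.includes(sep) || /["\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

// Hypothetical helper: turn an array of rows (arrays of fields) into CSV text.
function toCsv(rows, sep = ";") {
  return rows.map(row => row.map(field => csvField(field, sep)).join(sep)).join("\n");
}

// Sample values shaped like the scraped text; "\u00a0" is the &nbsp; from the HTML.
const scraped = [["Item\u00a01055688", " $16.59"]];

// Keep only the digits of the SKU and drop the currency sign from the price.
const rows = scraped.map(([sku, price]) => [
  sku.replace(/\D/g, ""),
  price.replace("$", "").trim(),
]);

// Hypothetical header row followed by the cleaned data rows.
console.log(toCsv([["sku", "price"], ...rows]));
```

To use this with Puppeteer, the `$$eval` callback would have to return an array of `[sku, price]` pairs instead of a joined string, so that the cleanup and escaping can run on the Node side.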


A more compact way to write the same `page.$$eval` call with arrow functions would be the following (although I don't think it's very readable):

const csv = await page.$$eval('.product_desc_txt', products => products.map(product => product.querySelector(".custom-body-copy").innerText.trim() + ";" + product.querySelector("span[data-wishlist-linkfee]").innerText.trim()).join("\n"));
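
Both versions leave the CSV text in the `csv` variable; to end up with an actual file, the string still has to be written to disk. A minimal sketch using Node's built-in `fs` module, assuming a hypothetical output file `products.csv` in the system temp directory and hypothetical header names `sku;price`. The `csv` constant here is only a stand-in for the value returned by `page.$$eval`:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Stand-in for the string returned by page.$$eval(...) in the answer.
const csv = "Item 1055688;$16.59\nItem 1055688;$16.59";

// Hypothetical output location; any writable path would do.
const outFile = path.join(os.tmpdir(), "products.csv");

// Prepend an (assumed) header row, end with a newline, and write as UTF-8.
fs.writeFileSync(outFile, "sku;price\n" + csv + "\n", "utf8");

console.log(`Wrote ${csv.split("\n").length} rows to ${outFile}`);
```

Inside the Puppeteer script, the `writeFileSync` call would go right after the `await page.$$eval(...)` line, inside the same `async` function.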
