Want to scrape a table using Puppeteer. How can I get all rows, iterate through the rows, and then get the "td"s for each row?


Question

I have Puppeteer set up, and I was able to get all of the rows using:

let rows = await page.$$eval('#myTable tr', row => row);

Now, for each row, I want to get its "td"s and then read the innerText of each one.

Basically, I want to do this:

var tds = myRow.querySelectorAll("td");

where myRow is a table row obtained through Puppeteer.

Recommended answer

One way to achieve this is to use page.evaluate, which first builds an array of all the td elements in the table and then returns the innerText of each one:

const puppeteer = require('puppeteer');

const html = `
<html>
    <body>
      <table>
      <tr><td>One</td><td>Two</td></tr>
      <tr><td>Three</td><td>Four</td></tr>
      </table>
    </body>
</html>`;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);

  // Runs inside the page: collect every td in the table and return its text
  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('table tr td'));
    return tds.map(td => td.innerText);
  });

  //You will now have an array of strings
  //[ 'One', 'Two', 'Three', 'Four' ]
  console.log(data);
  //One
  console.log(data[0]);
  await browser.close();
})();

You could also use page.$$eval, for example:

const data = await page.$$eval('table tr td', tds => tds.map((td) => {
  return td.innerText;
}));

//[ 'One', 'Two', 'Three', 'Four' ]
console.log(data);
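
Both snippets above return one flat list of cell texts, so the row boundaries are lost. If the goal is to iterate row by row, as the question asks, one option is to select the tr elements and map each row to its own array of td texts. The following is only a sketch: the helper names rowsToCells and scrapeTable are hypothetical and not part of Puppeteer, and the '#myTable tr' selector in the usage note is taken from the question.

```javascript
// Hypothetical helper (not a Puppeteer API): turns an array of <tr> elements
// into an array of rows, each row being an array of its <td> texts.
// It must not reference outer variables, because Puppeteer serializes it
// and runs it inside the page.
function rowsToCells(trs) {
  return Array.from(trs, tr =>
    Array.from(tr.querySelectorAll('td'), td => td.innerText)
  );
}

// Hypothetical wrapper: `page` is assumed to be a Puppeteer Page and
// `rowSelector` a CSS selector matching the table rows.
async function scrapeTable(page, rowSelector) {
  return page.$$eval(rowSelector, rowsToCells);
}
```

With the page from the answer, `const rows = await scrapeTable(page, '#myTable tr');` should yield one inner array per row, which can then be looped over with `for (const cells of rows) { ... }`. Rows that contain only th cells come back as empty arrays, so they may need to be filtered out. The mapping has to happen inside the page because values returned from $$eval must be serializable, which DOM elements are not.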
