Scraping a table with puppeteer.js: how do I get all rows, iterate through them, and get the td cells of each row?

Question

I have Puppeteer set up and was able to get all rows using:

let rows = await page.$$eval('#myTable tr', row => row);

Now, for each row, I want to get its td cells and then read the inner text of each one.

Basically, I want to do this:

var tds = myRow.querySelectorAll("td");

where myRow is a table row, but using Puppeteer.

Recommended answer

One way to achieve this is to use page.evaluate with a callback that first collects all the td elements into an array and then returns the textContent of each one:

const puppeteer = require('puppeteer');

const html = `
<html>
    <body>
      <table>
      <tr><td>One</td><td>Two</td></tr>
      <tr><td>Three</td><td>Four</td></tr>
      </table>
    </body>
</html>`;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // setContent loads the HTML directly, so it does not need URL-encoding
  // the way a data: URL does.
  await page.setContent(html);

  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('table tr td'));
    return tds.map(td => td.textContent);
  });

  // You now have a flat array of strings, one per cell:
  // [ 'One', 'Two', 'Three', 'Four' ]
  console.log(data);
  // One
  console.log(data[0]);
  await browser.close();
})();
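
The snippet above returns one flat list of cells, so the row boundaries are lost. If you need the cells grouped per row, a minimal sketch is to map over the tr elements inside page.evaluate and query the td cells of each one. It assumes page is a Puppeteer Page whose content is already loaded; the helper name cellsByRow is hypothetical.

```javascript
// Hypothetical helper: returns one array of cell texts per table row.
// Assumes `page` is a Puppeteer Page whose content is already loaded.
async function cellsByRow(page, rowSelector) {
  // page.evaluate runs the callback in the browser and passes
  // rowSelector through to it as its argument.
  return page.evaluate((selector) => {
    const rows = Array.from(document.querySelectorAll(selector));
    // For each <tr>, query only its own <td> cells, like
    // myRow.querySelectorAll("td") in the question.
    return rows.map((row) =>
      Array.from(row.querySelectorAll('td'), (td) => td.textContent.trim())
    );
  }, rowSelector);
}
```

With the sample page above, cellsByRow(page, 'table tr') should resolve to [ [ 'One', 'Two' ], [ 'Three', 'Four' ] ]. A header row made only of th cells comes back as an empty array, which you can filter out afterwards.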

You could also use page.$$eval, which passes every element matching the selector to the callback as an array:

const data = await page.$$eval('table tr td', tds => tds.map(td => td.textContent));

//[ 'One', 'Two', 'Three', 'Four' ]
console.log(data);
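
To literally iterate over the rows from Node.js, as the question asks, you can work with element handles instead. Returning the row elements from $$eval (the question's row => row) does not give you live DOM rows, because the callback's return value is serialized back to Node; page.$$ returns ElementHandle objects instead, and each handle's own $$eval queries only inside that row. A minimal sketch, again assuming an already-loaded page; rowsViaHandles is a hypothetical name:

```javascript
// Hypothetical helper: walks the rows one by one from Node.js.
// Assumes `page` is a Puppeteer Page whose content is already loaded.
async function rowsViaHandles(page, rowSelector) {
  const rowHandles = await page.$$(rowSelector); // one ElementHandle per <tr>
  const result = [];
  for (const row of rowHandles) {
    // ElementHandle.$$eval runs querySelectorAll('td') scoped to this row
    // and passes the matching cells to the callback as an array.
    const cells = await row.$$eval('td', (tds) =>
      tds.map((td) => td.textContent.trim())
    );
    result.push(cells);
    await row.dispose(); // release the browser-side reference to the row
  }
  return result;
}
```

This makes one browser round trip per row, so for large tables the single-call versions are usually faster; the handle loop is mainly useful when you also need to act on each row, for example clicking it.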
