Node.js Puppeteer metadata


Question

I am new to Puppeteer, and I am trying to extract metadata from a website using Node.js and Puppeteer. I just can't seem to get the syntax right. The code below works perfectly: it extracts the title tag using two different methods, as well as the text from a paragraph tag. How would I extract the content text of the meta tag named "description", for example?

<meta name="description" content="Stack Overflow is the largest, etc">

I would be seriously grateful for any suggestions! I can't seem to find an example of this anywhere, even after 5 hours of searching and code hacking. My sample code:

const puppeteer = require('puppeteer');

async function main() {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://stackoverflow.com/', {waitUntil: 'networkidle2'});

  // Two ways to read the <title> text, plus the text of the first <p>.
  const pageTitle1 = await page.evaluate(() => document.querySelector('title').textContent);
  const pageTitle2 = await page.title();
  const innerText = await page.evaluate(() => document.querySelector('p').innerText);
  console.log(pageTitle1);
  console.log(pageTitle2);
  console.log(innerText);

  // Close the browser so the Node.js process can exit.
  await browser.close();
}

main().catch(console.error);

Recommended answer

You need an in-depth tutorial on CSS selectors, such as the MDN guide to CSS selectors.

Something I highly recommend is testing your selectors directly in the browser console, on the page you are going to automate. This saves hours of starting and stopping your script. Try this:

document.querySelectorAll("head > meta[name='description']")[0].content;

Now, for Puppeteer, copy that selector and pass it to a Puppeteer function. I prefer this notation:

await page.$eval("head > meta[name='description']", element => element.content);
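
One thing to keep in mind: page.$eval throws if no element matches the selector. The sketch below is one possible way to handle pages that lack the tag, returning null instead. The helper name getMetaContent is hypothetical and not part of Puppeteer; it relies only on page.$ (which resolves to null when nothing matches) and ElementHandle.evaluate, and it assumes the meta name contains no double quotes.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the content of
// <meta name="..."> in <head>, or null when the page has no such tag.
// Assumes `name` is a simple string without double quotes.
async function getMetaContent(page, name) {
  const selector = `head > meta[name="${name}"]`;

  // page.$ resolves to null when no element matches the selector.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }

  // Runs in the browser context and reads the element's content attribute.
  return handle.evaluate(element => element.content);
}
```

Inside the question's main() function, this could be called as `console.log(await getMetaContent(page, 'description'));` once page.goto has finished.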

If you have any other questions or problems, just comment.
