How to scrape JavaScript hash link content?

Views: 39 Published: 2021/6/23 18:59:55 Tags: javascript node.js web-scraping puppeteer

Question

Hi, I'm fairly new to web scraping with Puppeteer, and I'm currently facing the following problem:

On the site I'm trying to extract information from, there is a Bootstrap table with typical JS pagination, like the examples at: https://getbootstrap.com/docs/4.1/components/pagination/

When I inspect the page HTML with the Chrome inspector, all I can see is 2, and when I check the link location I see

https://webpage.com/works#

How can I find out how many pages there are in total, and how can I click them? I don't understand how to visit every page with this type of pagination.

Thanks!

Recommended answer

Use the footerTemplate option together with displayHeaderFooter to show page numbers natively through the Puppeteer API:

await page.pdf({
  path: 'hacks.pdf',
  format: 'A4',
  displayHeaderFooter: true,
  // Double quotes inside the single-quoted string keep the literal valid
  footerTemplate: '<div><div class="pageNumber"></div> <div>/</div><div class="totalPages"></div></div>'
});

https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagepdfoptions

footerTemplate: HTML template for the print footer.

// Should be valid HTML markup with the following CSS classes used to inject printing values into them:
// - date: formatted print date
// - title: document title
// - url: document location
// - pageNumber: current page number
// - totalPages: total pages in the document
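
The classes listed above can be combined into a reusable footer. The following is only a minimal sketch: buildFooterTemplate and savePdfWithPageNumbers are hypothetical helper names that are not part of Puppeteer, and `page` is assumed to be a Puppeteer Page object obtained elsewhere. The explicit font size and bottom margin are assumptions added here because footers can otherwise end up too small or clipped; adjust them to your document.

```javascript
// Hypothetical helper (not a Puppeteer API): builds the footer HTML string.
// Puppeteer fills elements with class "pageNumber" and "totalPages" when printing.
function buildFooterTemplate(fontSizePx = 10) {
  return (
    `<div style="font-size: ${fontSizePx}px; width: 100%; text-align: center;">` +
    '<span class="pageNumber"></span> / <span class="totalPages"></span>' +
    '</div>'
  );
}

// Hypothetical wrapper: `page` is assumed to be a Puppeteer Page object.
async function savePdfWithPageNumbers(page, path) {
  await page.pdf({
    path,
    format: 'A4',
    displayHeaderFooter: true,
    footerTemplate: buildFooterTemplate(),
    // Assumed margin so the footer has room at the bottom of each sheet.
    margin: { bottom: '20mm' },
  });
}
```

A call such as `await savePdfWithPageNumbers(page, 'hacks.pdf')` would then produce the same file as the snippet above, with a "current / total" footer on each printed page. Note that these numbers refer to pages of the generated PDF, not to the pagination links on the site.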
