Using Apps Script to scrape a JavaScript-rendered web page


Question

I am struggling to put together a script that scrapes a JavaScript-rendered web page through Apps Script. I found the question "How to scrape Javascript rendered websites using Javascript?", but I don't know how to put the pieces together, such as how to load Puppeteer. Any help would be appreciated.

Recommended answer

You can try scraping the initial HTML instead. Scraping the rendered HTML is much harder, because it requires a headless browser.
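As a rough illustration of scraping the initial HTML, the sketch below uses the built-in `UrlFetchApp` service, which returns the markup as the server sends it and does not run the page's scripts. The function names, the `<title>` helper, and the `https://example.com/` URL are assumptions made for this example, not part of the original answer.

```javascript
// Apps Script sketch: fetch the HTML exactly as the server returns it.
// UrlFetchApp does not execute the page's JavaScript, so content that is
// rendered client-side will be missing from the result.
function fetchInitialHtml(url) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) {
    throw new Error('Request failed with status ' + response.getResponseCode());
  }
  return response.getContentText();
}

// Hypothetical helper: pull the <title> text out of raw HTML with a regex.
// Returns null when no <title> element is found.
function extractTitle(html) {
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Example entry point; the URL is a placeholder.
function logPageTitle() {
  var html = fetchInitialHtml('https://example.com/');
  Logger.log(extractTitle(html));
}
```

If the data you need shows up in the string returned by `fetchInitialHtml`, no headless browser is needed. If it does not, the page is probably building that content with JavaScript after load.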

There is a library, https://github.com/tautologistics/node-htmlparser, that you can use to parse HTML from JavaScript. It is written for Node, but because it has no dependencies, you can copy and paste the functions you need.
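For very simple extractions, a plain regular expression may be enough before pulling in a full parser. The sketch below is not node-htmlparser and does not use its API; `extractLinks` is a hypothetical helper that only handles quoted `href` attributes on `<a>` tags and will miss or misread unusual markup.

```javascript
// Hypothetical helper: collect href values from <a> tags in raw HTML.
// This is a regex shortcut, not a real HTML parser: it ignores unquoted
// attributes, comments, and anything inside <script> blocks.
function extractLinks(html) {
  var links = [];
  var pattern = /<a\b[^>]*\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Group 1 holds double-quoted values, group 2 single-quoted ones.
    links.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return links;
}
```

Anything more structured than this, such as nested elements or tables, is where a proper parser becomes worth the effort.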

I'm afraid parsing it is not an easy task.
