Using Apps Script to scrape a JavaScript-rendered web page

Problem description

I am struggling to put together a script that scrapes a JavaScript-rendered web page through Apps Script. I found "How to scrape Javascript rendered websites using Javascript?" here, but I don't know how to put it together, for example how to load puppeteer. Any help would be appreciated.

Solution

You can try scraping the initial HTML instead. Scraping the rendered HTML is extremely hard to do, because you would have to use a headless browser.
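As a rough illustration of fetching the initial HTML, here is a minimal sketch that uses the Apps Script `UrlFetchApp` service. The function name `fetchInitialHtml` and the choice to throw on any non-200 status are assumptions made for this example, not something from the question. The result is the HTML as the server sends it, before any page script has run.

```javascript
/**
 * Fetches the initial (un-rendered) HTML of a page.
 * Meant to run inside Apps Script, where UrlFetchApp is available.
 * The name fetchInitialHtml is hypothetical, chosen for this sketch.
 */
function fetchInitialHtml(url) {
  // muteHttpExceptions lets us inspect the status code ourselves
  // instead of having fetch() throw on 4xx/5xx responses.
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  var status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Request failed with HTTP ' + status);
  }
  // Returns the body as a string; content that the page builds
  // later with JavaScript will not be present here.
  return response.getContentText();
}
```

If the text you need shows up in the returned string, the headless browser can probably be avoided.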

There is this library: https://github.com/tautologistics/node-htmlparser, which you can use to parse HTML from JavaScript. It is written for Node, but because it doesn't use any dependencies, you can just copy and paste the functions you need.

I'm afraid parsing it is not a very easy task.
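In some cases a full parser may not be needed. Pages that render with JavaScript sometimes ship their data inside the initial HTML as JSON, for instance in `<script type="application/ld+json">` blocks. The sketch below assumes that is true for the target page, which you would have to verify yourself. The function name `extractJsonLd` is hypothetical, and the regular expression is a narrow shortcut for this one tag, not a general HTML parser.

```javascript
/**
 * Collects the parsed contents of every JSON-LD script block in an HTML string.
 * Assumption: the page embeds its data as <script type="application/ld+json">.
 * The name extractJsonLd is hypothetical, chosen for this sketch.
 */
function extractJsonLd(html) {
  // Matches the opening tag with its type attribute, then lazily
  // captures everything up to the closing </script>.
  var pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  var results = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      results.push(JSON.parse(match[1]));
    } catch (e) {
      // Skip blocks whose contents are not valid JSON.
    }
  }
  return results;
}
```

The function takes a plain string, so it can be given HTML obtained in Apps Script or tried out on a saved copy of the page. If the page does not embed its data this way, a real parser such as the library mentioned above is the safer route.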