Using Google Apps Script to Scrape Dynamic Web Pages


Question

I would like to read some data from other websites for a project using Google Apps Script. The pages in question are dynamic: they contain content that is loaded after the initial page load, via JavaScript calls to the server. With static content this usually works fine, but I am new to JavaScript and to Google Apps Script, so I do not know how to get content that is loaded asynchronously via JavaScript (e.g. via AJAX).

An example can be found here showing the last tracks played at a radio station. However, these tracks are loaded using JavaScript, and instead of the table containing the strings I get

<td class="row2"><span id="track_2">&nbsp;</span></td>

when I use:

UrlFetchApp.fetch(url).getContentText();

If I save the HTML in my browser, though, the right data strings are there:

<td class="row2" id="track_2">15:12 Will Smith - Men In Black</td>
                     ^^^^^^^  ^^^^^ ^^^^^^^^^^   ^^^^^^^^^^^^

Is there any way to do this with Google Apps Script?

Recommended Answer

Not generally, no. If you can reverse engineer what the page is doing, you might be able to make the same JavaScript calls yourself, but the odds are against it if it requires any server coordination. In theory, one could run a JavaScript browser implementation (like env-js) inside Google Apps Script to do this, but in practice I think it would be very difficult, if not impossible, to make it work.
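As a rough illustration of the reverse-engineering route, here is a minimal sketch. It assumes the page's JavaScript fills the table from a plain JSON endpoint that needs no session or token. The URL, the `TRACKS_ENDPOINT` name, the helper functions, and the `{time, artist, title}` response shape are all hypothetical. The real request would have to be found in the browser's developer tools (Network tab) while the page loads, and it may well look different or not be reachable this way at all.

```javascript
// Hypothetical endpoint: a placeholder, not a real API. Replace it with the
// request the page actually makes, as seen in the browser's Network tab.
var TRACKS_ENDPOINT = 'https://example.com/api/last-tracks';

/**
 * Converts the endpoint's response text into "HH:MM Artist - Title" strings.
 * Assumes (hypothetically) a JSON array of {time, artist, title} objects.
 */
function parseTracks(responseText) {
  var tracks = JSON.parse(responseText);
  return tracks.map(function (t) {
    return t.time + ' ' + t.artist + ' - ' + t.title;
  });
}

/**
 * Fetches the data the page would load via AJAX.
 * Works only inside Google Apps Script, where UrlFetchApp is available.
 */
function fetchLastTracks() {
  var response = UrlFetchApp.fetch(TRACKS_ENDPOINT, {
    muteHttpExceptions: true // return error responses instead of throwing
  });
  if (response.getResponseCode() !== 200) {
    throw new Error('Request failed with HTTP ' + response.getResponseCode());
  }
  return parseTracks(response.getContentText());
}
```

Keeping the parsing in its own function means it can be checked against a saved response without making a network call. If the endpoint expects cookies, special headers, or a token issued by the server, this simple approach is likely to fail, which is the "server coordination" caveat above.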
