Using Google Apps Script to Scrape Dynamic Web Pages

Problem description

I would like to read some data from other websites for a project using Google Apps Script. The pages in question are dynamic: they contain content that is loaded after the initial page load, via JavaScript calls to the server. With mostly static content this works fine, but I am new to JavaScript and to Google Apps Script, so I do not know how to get content that is loaded asynchronously via JavaScript (e.g. via AJAX).

An example can be found here, showing the last tracks played at a radio station. However, these tracks are loaded using JavaScript, and instead of the table containing the strings, I get

<td class="row2"><span id="track_2">&nbsp;</span></td>

when I use:

UrlFetchApp.fetch(url).getContentText();

If I save the HTML from my browser, though, the correct data strings are there:

<td class="row2" id="track_2">15:12 Will Smith - Men In Black</td>
                     ^^^^^^^  ^^^^^ ^^^^^^^^^^   ^^^^^^^^^^^^
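One way to confirm that a page fills in its content later is to check whether the raw HTML returned by `UrlFetchApp` contains only placeholders where the browser shows data. The following is a minimal sketch, assuming the `id="track_N"` markup shown above; `extractTrackCells` and `logTracks` are hypothetical helper names, and a regular expression is only adequate for a small, fixed fragment like this one, not for general HTML parsing.

```javascript
/**
 * Hypothetical helper: collects the text of every element whose id is "track_N".
 * Assumes the markup shape from the question: id attribute, then text up to the next "<".
 */
function extractTrackCells(html) {
  const pattern = /id="track_(\d+)"[^>]*>([^<]*)</g;
  const tracks = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Decode the one entity seen in the question and drop surrounding whitespace.
    const text = match[2].replace(/&nbsp;/g, ' ').trim();
    tracks.push({ index: Number(match[1]), text: text });
  }
  return tracks;
}

/** Hypothetical Apps Script entry point: fetch the server's raw HTML and log each track cell. */
function logTracks(url) {
  const html = UrlFetchApp.fetch(url).getContentText();
  const tracks = extractTrackCells(html);
  tracks.forEach(function (t) {
    // An empty cell means the browser fills it in later via JavaScript.
    Logger.log(t.index + ': ' + (t.text || '(empty)'));
  });
  return tracks;
}
```

If the page behaves as described in the question, the cells parsed from the `UrlFetchApp` response come back empty while the same parser finds the track text in the HTML saved from the browser, which shows the data is not part of the server's initial HTML.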

Is there any way to do this with Google Apps Script?

Recommended answer

Not generally, no. If you can reverse engineer what the page is doing, you might be able to make the same JavaScript calls yourself, but the odds are against it if those calls require any server coordination. In theory, one could run a JavaScript browser implementation (such as env-js) inside Google Apps Script to do this, but in practice I think it would be very difficult, if not impossible, to make it work.
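One way to attempt the reverse-engineering route is to open the browser's developer tools, watch the Network tab (XHR/Fetch) while the page loads, and then call the request that returns the track data directly from Apps Script. The sketch below assumes such a request exists, needs no session state, and returns JSON. The URL `https://radio.example.com/api/last-tracks`, the `X-Requested-With` header, the function name `fetchLastTracks`, and the `{tracks: [{time, artist, title}]}` response shape are all hypothetical placeholders to replace with what the browser actually shows.

```javascript
// HYPOTHETICAL endpoint: replace with the request URL observed in the
// browser's developer tools (Network tab, XHR/Fetch filter).
const TRACKS_ENDPOINT = 'https://radio.example.com/api/last-tracks';

function fetchLastTracks() {
  const response = UrlFetchApp.fetch(TRACKS_ENDPOINT, {
    muteHttpExceptions: true, // return error responses instead of throwing
    // Assumption: some servers only answer AJAX-style requests; copy real headers if needed.
    headers: { 'X-Requested-With': 'XMLHttpRequest' }
  });
  if (response.getResponseCode() !== 200) {
    throw new Error('Endpoint returned HTTP ' + response.getResponseCode());
  }
  // ASSUMED response shape: {"tracks": [{"time": "15:12", "artist": "...", "title": "..."}]}
  const data = JSON.parse(response.getContentText());
  return (data.tracks || []).map(function (t) {
    return t.time + ' ' + t.artist + ' - ' + t.title;
  });
}
```

If the browser's request carries cookies or a token obtained from an earlier response, a plain request like this will likely be rejected; that is the "server coordination" case the answer warns about.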
