Scraping a JavaScript website in R

Views: 1085 | Posted: 2019/2/19 18:39:13 | Tags: javascript, r, screen-scraping, rvest


Question

I want to scrape the match time and date from this URL:

http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary

Using the Chrome dev tools, I can see that this appears in the page as the following element:

<td colspan="3" id="utime" class="mstat-date">01:20 AM, October 29, 2014</td>

But this is not in the source HTML.

I think this is because it's Java (correct me if I'm wrong). How can I scrape this information using R?
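A quick way to check the "not in the source HTML" observation from R is to run the same selector against the markup the server sends and against the markup the browser shows after its scripts have run. The sketch below only illustrates the idea: both strings are hypothetical stand-ins written for this example (the second reuses the element shown above), and `count_matches` is a helper name made up here, not part of rvest.

```r
library(rvest)

# Hypothetical stand-in for the raw source: the date cell is absent
# because a script is expected to add it later.
served_html <- '<html><body><div id="summary"></div><script src="app.js"></script></body></html>'

# Hypothetical stand-in for the rendered page, as seen in the dev tools.
rendered_html <- '<html><body><table><tr><td colspan="3" id="utime" class="mstat-date">01:20 AM, October 29, 2014</td></tr></table></body></html>'

# Illustrative helper: number of nodes in an HTML string matching a CSS selector.
count_matches <- function(html_string, css) {
  length(html_nodes(read_html(html_string), css))
}

count_matches(served_html, "#utime")    # 0: nothing to scrape in the raw source
count_matches(rendered_html, "#utime")  # 1: present only after rendering
```

If the selector matches nothing in the raw source but does match in the dev tools, the element is most likely being added by a script in the browser, so the page has to be rendered before rvest can see it.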

Recommended answer

So, RSelenium is not the only answer (anymore). If you can install the PhantomJS binary (grab the binaries from http://phantomjs.org/), then you can use it to render the HTML and scrape the result with rvest (similar to the RSelenium approach, but it doesn't require Java):

library(rvest)

# render HTML from the site with phantomjs

url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary"

writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
    console.log(page.content); //page source
    phantom.exit();
});", url), con="scrape.js")

# run PhantomJS and save the rendered page to scrape.html
system("phantomjs scrape.js > scrape.html", intern = TRUE)

# extract the content you need
pg <- read_html("scrape.html")  # html() is deprecated in current rvest; read_html() replaces it
pg %>% html_nodes("#utime") %>% html_text()

## [1] "10:20 AM, October 28, 2014"
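Since the goal is the match time and date, the scraped string can then be turned into a proper date-time. This is a minimal sketch in base R, assuming the text always follows the format shown above. The `"UTC"` time zone is a placeholder assumption: the page does not say which zone it uses, and the value here differs from the one in the question, which may well be a time zone difference.

```r
# Text as returned by html_text() in the example above
raw_time <- "10:20 AM, October 28, 2014"

# %B (month name) and %p (AM/PM) depend on the locale, so switch
# LC_TIME to "C" for parsing and restore it afterwards.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

# Assumption: "UTC" is a placeholder; replace it with the site's real zone if known.
match_time <- as.POSIXct(raw_time, format = "%I:%M %p, %B %d, %Y", tz = "UTC")

Sys.setlocale("LC_TIME", old_locale)

format(match_time, "%Y-%m-%d %H:%M")
## [1] "2014-10-28 10:20"
```

If the string does not match the format, `as.POSIXct` returns `NA` rather than raising an error, so it is worth checking for `NA` before using the result.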
