Using R to connect to Selenium-Server-Standalone

Question

Following the post "Accessing the Selenium API in R" on this site, I can create a webdriver session. However, I am unable to get the element details the way I can in Python. How can I do this?

I would like to scrape the soccer match table for every single round...

# using R
library(RCurl)
library(RJSONIO)
library(XML)

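# running selenium in the background; without wait=FALSE, system() blocks until the server exits
system("java -jar selenium-server-standalone-2.35.0.jar", wait=FALSE)
Sys.sleep(5)  # give the server a few seconds to start listening (adjust as needed)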
baseURL<-"http://localhost:4444/wd/hub/"
server<-list(desiredCapabilities=list(browserName='firefox',javascriptEnabled=TRUE))

getURL(paste0(baseURL,"session"),
       customrequest="POST",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       postfields=toJSON(server))

serverDetails<-fromJSON(rawToChar(getURLContent('http://localhost:4444/wd/hub/sessions',binary=TRUE)))
serverId<-serverDetails$value[[1]]$id

# navigate to 7m.cn
URL = "http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/index.shtml"
getURL(paste0(baseURL,"session/",serverId,"/url"),
       customrequest="POST",
       httpheader=c('Content-Type'='application/json;charset=UTF-8'),
       postfields=toJSON(list(url=URL)))

Below is the Python code I use to get the HTML element details of 7m.cn. Is there a better approach you could suggest? Thanks.

# using Python
import codecs
import lxml.html as lh
from selenium import webdriver

URL = 'http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/index.shtml'
browser = webdriver.Firefox()
browser.get(URL)
content = browser.page_source
browser.quit()
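
Once `content` holds the page source, the match table still has to be pulled out of the HTML. The following is a minimal sketch of that step, not the asker's code: it uses the standard library's `html.parser` instead of the `lxml` import above, and the sample markup and the names `TableRows` and `extract_rows` are hypothetical, since the real 7m.cn table layout is not shown in the post.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup standing in for browser.page_source.
sample = (
    "<table>"
    "<tr><th>Home</th><th>Score</th><th>Away</th></tr>"
    "<tr><td>Team A</td><td>2-1</td><td>Team B</td></tr>"
    "</table>"
)
rows = extract_rows(sample)
print(rows)
```

With the real page, `extract_rows(content)` would return every row of every table, so some filtering on row length or header text would likely be needed.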

Recommended answer

You can use the package relenium (Selenium for R). Disclaimer: I'm one of the developers.

require(relenium)

firefox <- firefoxClass$new()
firefox$get('http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/index.shtml')
content <- firefox$getPageSource()
firefox$close()
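
For readers who stay with the raw Selenium server from the question, the JSON wire protocol also exposes the page source directly at `GET /session/:sessionId/source`. The sketch below shows that call in Python, using only the standard library; the helper names `source_url` and `get_page_source` and the session id `abc123` are made up for illustration, and the second helper assumes a server is already running at the given address. In R, the same endpoint could presumably be requested with `getURL` on `paste0(baseURL, "session/", serverId, "/source")`.

```python
import json
import urllib.request


def source_url(base_url, session_id):
    """Build the wire protocol endpoint that returns the page source."""
    return "{}/session/{}/source".format(base_url.rstrip("/"), session_id)


def get_page_source(base_url, session_id, timeout=30):
    """GET the page source of an existing session (needs a running server)."""
    url = source_url(base_url, session_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # The wire protocol wraps each result in a "value" field.
    return payload["value"]


# Only the URL is built here; "abc123" is a placeholder session id.
endpoint = source_url("http://localhost:4444/wd/hub/", "abc123")
print(endpoint)
```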
