Scraping a table with R using RSelenium

Question

I want to scrape a table like this (click Search and you get a table of partners). Specifically, I want to scrape the partner names. The problem is that I don't know what kind of table it is or how to scrape it.

I am using the RSelenium package. If it can be done with rvest, that would be very helpful.

So what kind of table is this, is it possible to scrape it with RSelenium or rvest, and if so, how?

# remDr is assumed to be an already-open remoteDriver session
ul <- "http://partnerlocator.symantec.com"
remDr$navigate(ul)
webElem <- remDr$findElement(using = "class", value = "button")
webElem$clickElement()
Sys.sleep(10)
webElem <- remDr$findElement(using = "class", value = "results")
unlist(webElem$getElementText())

But I get a very complex text output like this:

CDW\nCDW\n200 North Milwaukee Avenue\nVernon Hills ,Illinois ,60061\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCore Security - Platinum\nThreat Protection - Platinum\nCyber Security Services - Platinum\nInformation Protection - Platinum\nDLT Solutions\nDLT Solutions\n2411 Dulles Corner Park Suite 800\nHerndon ,Virginia ,20171\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nInformation Protection - Platinum\nThreat Protection - Platinum\nCore Security - Platinum\nCyber Security Services - Platinum\nInsight Direct USA\nInsight Direct USA\n3480 Lotus Drive\nPlano ,Texas ,75075\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCyber Security Services - Platinum\nCore Security - Platinum\nThreat Prot.........

Recommended answer

This looks like a pretty basic HTML table whose text has been collapsed into a single newline-separated string. It can be split back into lines like this:

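library(RSelenium)

# Note: checkForServer() and startServer() are defunct in recent RSelenium
# releases; rsDriver() is the usual replacement there.
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()

ul <- "http://partnerlocator.symantec.com"
remDr$navigate(ul)

# Click the search button and wait for the results to load
webElem <- remDr$findElement(using = "class", value = "button")
webElem$clickElement()
Sys.sleep(10)

# getElementText() returns a list; split its single string on newlines
webElem <- remDr$findElement(using = "class", value = "results")
results <- webElem$getElementText()
results_chr <- unlist(strsplit(results[[1]], "\n"))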

head(results_chr)
[1] "CDW"                           "CDW"                           "200 North Milwaukee Avenue"   
[4] "Vernon Hills ,Illinois ,60061" "United States"                 "Distance: 0 mi" 
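
Since the goal was the partner names, here is one possible way to pull them out of the split lines. It is only a sketch and rests on an assumption drawn from the sample output in the question: each partner's name appears twice in a row at the start of its record. The vector `sample_lines` is a hand-copied excerpt of that output, not a fresh scrape, and the heuristic would need adjusting if the page lists names only once or if some other line happens to repeat.

```r
# Excerpt of the split output, copied by hand from the question's text
sample_lines <- c(
  "CDW", "CDW", "200 North Milwaukee Avenue",
  "Vernon Hills ,Illinois ,60061", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner", "Core Security - Platinum",
  "DLT Solutions", "DLT Solutions", "2411 Dulles Corner Park Suite 800",
  "Herndon ,Virginia ,20171", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner"
)

# Assumption: a line is a partner name when the very next line repeats it.
# Compare each line with its successor; the last line has no successor.
is_name <- c(head(sample_lines, -1) == tail(sample_lines, -1), FALSE)
partner_names <- sample_lines[is_name]

print(partner_names)
```

With a live session, the same two lines could be applied to `results_chr` in place of `sample_lines`.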

You might be able to get a cleaner result from the HTML table on that results page with rvest, but I was unable to do so.
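
For readers who want to try the rvest route, a rough sketch follows. The general idea is to hand the rendered page source from RSelenium (`remDr$getPageSource()[[1]]`) to `rvest::read_html()` and then select nodes with CSS selectors. The real page's markup is not shown in the question, so the HTML string and the `.partner h3` selector below are purely hypothetical stand-ins; only the `results` class comes from the original code.

```r
library(rvest)

# Hypothetical stand-in for remDr$getPageSource()[[1]].
# The structure inside .results is an assumption, not the real page markup.
page_source <- '
<div class="results">
  <div class="partner"><h3>CDW</h3><p>200 North Milwaukee Avenue</p></div>
  <div class="partner"><h3>DLT Solutions</h3><p>2411 Dulles Corner Park Suite 800</p></div>
</div>'

# Parse the HTML string and select the (hypothetical) name nodes
page <- read_html(page_source)
name_nodes <- html_nodes(page, ".results .partner h3")
partner_names_html <- html_text(name_nodes, trim = TRUE)

print(partner_names_html)
```

In practice, inspecting the results element in the browser's developer tools would show which selector, if any, isolates the partner names.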
