Web scraping in R?

library(rvest)
url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'

webpage_nba <- read_html(url_nba)

# Use a CSS selector to scrape the rankings section
data_nba <- html_nodes(webpage_nba, '#standings-table')

# Convert the ranking data to text
data_nba <- html_text(data_nba)
write.csv(data_nba,"web scraping test.csv")

As I understand it, the numbers I want (e.g. for the Warriors: 94%, 79%, 66%, 59%) are "coded" in a different way. In other words, what gets written to web scraping test.csv is not readable.

Is there any way to transform the "coded numbers" into "regular numbers"?

Solution

Thanks to @Alexey's answer and this, the following code worked for me:

library(RSelenium)
library(rvest)
library(wdman)

url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'


# Initiate RSelenium. If it doesn't work, try other browser engines
# rD <- rsDriver()
# remDr <- rD$client

pDrv <- phantomjs(port = 4567L)
remDr <- remoteDriver(browserName = "phantomjs", port = 4567L)
remDr$open()
# Navigate to the main page
remDr$navigate(url_nba)

# Find the selector box and click option 10 (April 14, before the playoffs)
webElem <- remDr$findElement(using = 'xpath', value = "//*[@id='forecast-selector']/div[2]/select/option[10]")
webElem$clickElement()

# Save html
webpage <- remDr$getPageSource()[[1]]
# Close RSelenium
remDr$close()
pDrv$stop()

# rD[["server"]]$stop() 


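# Parse every table on the page, then pick the third one as a data frame
webpage_nba <- read_html(webpage) %>% html_table(fill = TRUE)
df <- webpage_nba[[3]]

# Clean the data frame: row 3 holds the real column names
header <- as.character(unlist(df[3, ]))
# Drop the 3 leading header rows and the 4 trailing footer rows
df <- tail(df, -3)
df <- head(df, -4)
# Drop columns with a missing header. A logical mask is used because
# -which(...) selects zero columns when nothing matches.
keep <- !is.na(header) & header != "NA"
df <- df[, keep, drop = FALSE]
names(df) <- header[keep]
df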
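
The cleanup step at the end (promote one row to column names, then drop the columns that have no label) can be tried offline without Selenium. The following is only a sketch under stated assumptions: the HTML string, the team names, the percentages and the header_row position are all made up for illustration and are not taken from the FiveThirtyEight page.

```r
library(rvest)

# Hypothetical stand-in for the rendered page source; all values are made up.
html <- "<table>
  <tr><td>note</td><td></td><td>note</td></tr>
  <tr><td>Team</td><td></td><td>Finals</td></tr>
  <tr><td>Team A</td><td></td><td>50%</td></tr>
  <tr><td>Team B</td><td></td><td>25%</td></tr>
</table>"

# html_table() returns one table per <table> element; take the first
raw <- as.data.frame(html_table(read_html(html))[[1]],
                     stringsAsFactors = FALSE)

# Assumption: the real column names sit in row 2 of this toy table
header_row <- 2
header <- as.character(unlist(raw[header_row, ]))

# Keep only columns whose header is present and non-empty
keep <- !is.na(header) & header != "NA" & nzchar(header)

# Drop the rows up to and including the header row, and the unlabeled columns
clean <- raw[-seq_len(header_row), keep, drop = FALSE]
names(clean) <- header[keep]
rownames(clean) <- NULL
clean
```

Building the mask from the header vector before assigning names avoids ever giving the data frame an NA column name, and it still behaves sensibly when every column has a label.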
