Web scraping in R?
Problem description
I would like to scrape this web site (the URL is in the code below). In particular, I would like to extract the information in the table on that page. Please note that I chose a specific date in the upper-right corner.
By following this guide, I wrote the first block of code below. From my understanding, the numbers that I want to get (e.g. for the Warriors: 94%, 79%, 66%, 59%) are "coded" in a different way. In other words, what is written in web scraping test.csv is not readable.
Is there any way to transform the "coded numbers" into "regular numbers"?
Thanks to @Alexey's answer and this, the second block of code below (using RSelenium) worked for me.
library(rvest)
url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
webpage_nba <- read_html(url_nba)
# Use a CSS selector to pick out the standings table
data_nba <- html_nodes(webpage_nba,'#standings-table')
# Convert the selected node to plain text (this flattens the table into one string)
data_nba <- html_text(data_nba)
write.csv(data_nba,"web scraping test.csv")
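One likely reason the CSV is unreadable is that html_text() collapses everything inside the selected node into a single string, so the row and column structure is lost. The sketch below contrasts html_text() with html_table() on a small inline table. The HTML, the column names P1 to P4, and the second row are made-up placeholders, not the real page's markup.

```r
library(rvest)

# Hypothetical stand-in for the real page: a tiny table with the same id.
# Column names and the second row are placeholders for illustration only.
html <- '<html><body><table id="standings-table">
  <tr><th>Team</th><th>P1</th><th>P2</th><th>P3</th><th>P4</th></tr>
  <tr><td>Warriors</td><td>94%</td><td>79%</td><td>66%</td><td>59%</td></tr>
  <tr><td>Example Team</td><td>50%</td><td>25%</td><td>10%</td><td>5%</td></tr>
</table></body></html>'

page <- read_html(html)
node <- html_node(page, "#standings-table")

# html_text() flattens the whole node into one character string
flat <- html_text(node)

# html_table() keeps the row/column structure and returns a data frame
tbl <- html_table(node)
print(tbl)
```

On the real page the table also depends on the date chosen in the drop-down, which needs a browser session, so this only illustrates the parsing step.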
# Working version: load the page in a browser session driven by RSelenium
library(RSelenium)
library(rvest)
library(wdman)
url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
#initiate RSelenium. If it doesn't work, try other browser engines
# rD <- rsDriver()
# remDr <- rD$client
pDrv <- phantomjs(port = 4567L)
remDr <- remoteDriver(browserName = "phantomjs", port = 4567L)
remDr$open()
#navigate to main page
remDr$navigate(url_nba)
#find the box and click option 10 (April 14 before playoffs)
webElem <- remDr$findElement(using = 'xpath', value = "//*[@id='forecast-selector']/div[2]/select/option[10]")
webElem$clickElement()
# Save html
webpage <- remDr$getPageSource()[[1]]
# Close RSelenium
remDr$close()
pDrv$stop()
# rD[["server"]]$stop()
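# Parse every table in the saved HTML and keep the third one as a data frame
# (fill = TRUE pads ragged rows; it is deprecated in rvest 1.0.0 and later, where missing cells are always filled)
webpage_nba <- read_html(webpage) %>% html_table(fill = TRUE)
df <- webpage_nba[[3]]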
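# Clean up the data frame: row 3 holds the real column headers
headers <- as.character(unlist(df[3, ]))
df <- tail(df, -3)  # drop the three header rows
df <- head(df, -4)  # drop the four trailing rows
# Keep only columns with a real header; a logical index is safe even when nothing matches
keep <- !is.na(headers) & headers != "NA"
df <- df[, keep]
names(df) <- headers[keep]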
df
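The cells in df are still character strings such as "94%". If numeric values are needed, a conversion along these lines may help. This is a minimal base-R sketch: the data frame df_example, its column names, and the helper pct_to_num are hypothetical, since the real column names are not shown above.

```r
# Hypothetical data frame standing in for `df`; column names are placeholders
df_example <- data.frame(
  team = c("Warriors", "Example Team"),
  p1   = c("94%", "50%"),
  p2   = c("79%", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "94%" into 0.94; anything non-numeric becomes NA
pct_to_num <- function(x) {
  suppressWarnings(as.numeric(sub("%", "", x, fixed = TRUE))) / 100
}

# Convert every column except the team name
pct_cols <- setdiff(names(df_example), "team")
df_example[pct_cols] <- lapply(df_example[pct_cols], pct_to_num)
print(df_example)
```

Values that are not plain percentages (for example a "<1%" style cell, if the page uses one) come out as NA here and would need their own rule.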