Scraping table of NBA stats with rvest


Problem description

I'd like to scrape a table of NBA team stats with rvest. I've tried using:


  • the table element:

library(rvest)

url_nba <- "http://stats.nba.com/teams/advanced/#!?sort=TEAM_NAME&dir=-1"

team_stats <- url_nba %>% read_html %>% html_nodes('table') %>% html_table


  • the xpath (via Google Chrome inspect):

    team_stats <- url_nba %>% 
          read_html %>%
          html_nodes(xpath="/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div[1]/table") %>%
          html_table
    


  • the CSS selector (via Mozilla inspect):

    team_stats <- url_nba %>% 
          read_html %>%
          html_nodes(".nba-stat-table__overflow > table:nth-child(1)") %>%
          html_table
    


but with no luck. Any help would be greatly appreciated.

    Recommended answer

    This question is very similar to this one: How to select a particular section of JSON Data in R?

    The data you are requesting is not stored in the page's HTML source, which is why the rvest attempts fail. The page fetches it separately through an XHR request, and that request can be made directly:

    library(httr)
    library(jsonlite)
    
    nba<-GET('http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=' )
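
The long query string above can be hard to read and edit. As a minimal sketch, the same kind of URL can be assembled from a named list with httr's `modify_url()`. The `params` list here is only an illustrative subset of the parameters in the original request; the endpoint may well require the remaining ones (including the empty ones), so treat this as a pattern rather than a tested replacement.

```r
library(httr)

base_url <- "http://stats.nba.com/stats/leaguedashteamstats"

# Illustrative subset of the query parameters from the request above.
# Assumption: the real endpoint may reject a request that omits the others.
params <- list(
  LeagueID    = "00",
  MeasureType = "Advanced",
  PerMode     = "PerGame",
  Season      = "2016-17",
  SeasonType  = "Regular Season"
)

# Build the URL without sending a request; httr URL-encodes the values.
request_url <- modify_url(base_url, query = params)
print(request_url)

# The same list can be passed straight to GET():
# nba <- GET(base_url, query = params)
```

Keeping the parameters in a list makes it easier to change a single value, such as `Season`, without editing the whole string.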
    

    Once the response is stored in the nba variable, use httr and jsonlite to clean up the data:

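    # parse the JSON body of the response
    # (nested call instead of %>%, which httr and jsonlite do not provide)
    out <- fromJSON(content(nba, as = "text"), flatten = FALSE)

    # convert into a data frame
    # run str(out) to inspect the structure
    df <- data.frame(out$resultSets$rowSet[[1]], stringsAsFactors = FALSE)
    names(df) <- out$resultSets$headers[[1]]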
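
To make the reshaping step easier to follow without calling the API, here is a small offline sketch. The `sample_json` string is made up for illustration; it only imitates the `resultSets` / `headers` / `rowSet` layout used in the code above, and the team names and values are invented. Because each row mixes numbers and strings, jsonlite returns the rows as character data, so the sketch adds a `type.convert()` step to restore numeric columns.

```r
library(jsonlite)

# Invented sample that mimics the layout of the response (not real data).
sample_json <- '{
  "resultSets": [
    {
      "name": "LeagueDashTeamStats",
      "headers": ["TEAM_ID", "TEAM_NAME", "W"],
      "rowSet": [
        [1, "Team A", 10],
        [2, "Team B", 7]
      ]
    }
  ]
}'

out <- fromJSON(sample_json, flatten = FALSE)

# rowSet holds the rows of the first result set; headers holds the column names.
rows <- out$resultSets$rowSet[[1]]
df <- as.data.frame(rows, stringsAsFactors = FALSE)
names(df) <- out$resultSets$headers[[1]]

# Mixed-type rows arrive as character; convert columns back where possible.
df <- type.convert(df, as.is = TRUE)
str(df)
```

The same three steps (take `rowSet[[1]]`, attach `headers[[1]]`, convert types) should carry over to the real response, assuming it has a single result set as in the answer.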
    

    I highly recommend reading the answer to the question I linked above.
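
As a quick check of the diagnosis, the sketch below shows what rvest sees when a page contains only a placeholder element and no table. The `static_html` string is a hypothetical stand-in for the server's response, not the actual markup of stats.nba.com; the second snippet is an invented table included only for contrast.

```r
library(rvest)

# Hypothetical stand-in for HTML whose table is filled in later by JavaScript.
static_html <- "<html><body><main><nba-stat-table></nba-stat-table></main></body></html>"

page <- read_html(static_html)
tables <- html_nodes(page, "table")
length(tables)  # 0: there is no <table> node to extract

# For contrast, a table that is present in the HTML is parsed normally.
filled_html <- "<table><tr><th>TEAM</th><th>W</th></tr><tr><td>Team A</td><td>10</td></tr></table>"
filled_tables <- html_table(html_nodes(read_html(filled_html), "table"))
print(filled_tables[[1]])
```

If `html_nodes()` returns an empty set for a selector that works in the browser's inspector, the content is probably being loaded after the initial page request, and the browser's network tab is the place to look for the underlying data request.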
