Scraping with rvest: how to fill blank numbers in a row to transform in a data frame?

Problem description

I'm trying to build a data frame from two pieces of data I've scraped from IMDB: the first has 50 values and the second only 29. Is there an easy way to have R automatically fill the remaining 21 values it didn't find with NA?

My code:

library(rvest)
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
metascore <- html_nodes(imdb, '.ratings-metascore')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)
Error in data.frame(Title = title, Metascore = metascore) : 
  arguments imply differing number of rows: 50, 29

Thanks!

Recommended answer

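You need to change your fourth line. You want metascore to have as many elements as title, with NA for the titles that don't have a metascore listed. The way to do this is to first extract the lister-item-content nodes (one per search result) and then, from each of these, select the ratings-metascore node if it exists, or NA if it doesn't. This is the difference between html_node and html_nodes described in ?html_nodes: html_nodes returns every match, so results without a metascore simply drop out, while html_node returns exactly one result per input node (a missing node when nothing matches, which html_text turns into NA). I've also added span to the selector so that only the number is returned, without the word 'metascore' that follows it.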
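To see why this keeps the rows aligned, here is a small offline sketch. The HTML is a hypothetical stand-in for three search results: only the class names are taken from the answer, while the titles and the score are made up, so the example runs without network access.

```r
library(rvest)

# Hypothetical mini page: three results, only "Film B" has a metascore
page <- read_html('
<div class="lister-item-content"><h3 class="lister-item-header"><a>Film A</a></h3></div>
<div class="lister-item-content"><h3 class="lister-item-header"><a>Film B</a></h3>
  <div class="ratings-metascore"><span>62</span> Metascore</div></div>
<div class="lister-item-content"><h3 class="lister-item-header"><a>Film C</a></h3></div>
')

# html_nodes() on the whole page returns only the matches that exist,
# so this vector is shorter than the number of results (1 vs 3)
all_scores <- html_text(html_nodes(page, ".ratings-metascore span"))

# Select one container per result, then html_node() returns exactly one
# node per container; a container without a match yields NA text
items  <- html_nodes(page, ".lister-item-content")
titles <- html_text(html_node(items, ".lister-item-header a"))
scores <- html_text(html_node(items, ".ratings-metascore span"))

df <- data.frame(Title = titles, Metascore = scores)
print(df)
```

Because titles and scores are both computed from the same items node set, they always have the same length, and each NA sits next to the title it belongs to.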

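library(rvest)
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
# One title per search result
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
# One .lister-item-content node per result; html_node() returns the first
# matching metascore span in each, or a missing node (NA text) if there is none
metascore <- html_node(html_nodes(imdb, '.lister-item-content'), '.ratings-metascore span')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)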

head(df,10)
                 Title  Metascore
1              Mother!       <NA>
2  Annabelle: Creation 62        
3      Stranger Things       <NA>
4         Supernatural       <NA>
5                   It       <NA>
6  The Vampire Diaries       <NA>
7              Get Out 84        
8        The Originals       <NA>
9            Annabelle 37        
10               Grimm       <NA>
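In the output above, Metascore is a character column and the scores carry trailing whitespace (for example "62        "). If numeric scores are wanted, one option is to trim and convert after scraping; alternatively, html_text(metascore, trim = TRUE) trims at scraping time. A minimal sketch, using the first seven Metascore values as printed above:

```r
# First seven Metascore values as printed above (trailing spaces included)
metascore <- c(NA, "62        ", NA, NA, NA, NA, "84        ")

# trimws() strips the padding; as.numeric() converts, and NA stays NA
metascore_num <- as.numeric(trimws(metascore))
print(metascore_num)
```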
