Scraping with rvest: how to fill in missing values with NA to build a data frame?
Question
I'm trying to build a data frame from two vectors I've scraped from IMDb: the first has 50 values and the second only 29. Is there an easy way to make R automatically fill the other 21 values, the ones it didn't find, with NA?
My code:
library(rvest)
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
metascore <- html_nodes(imdb, '.ratings-metascore')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)
Error in data.frame(Title = title, Metascore = metascore) :
arguments imply differing number of rows: 50, 29
Thanks!
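Padding the shorter vector with NA, as literally asked, is easy in base R, but it only appends the NAs at the end, so the scores end up next to the wrong titles. A minimal sketch with hypothetical toy vectors (the names and values are made up for illustration, not scraped from IMDb):

```r
# Hypothetical toy data: five titles, but only "B" and "D" actually have a score
titles <- c("A", "B", "C", "D", "E")
scores <- c("62", "84")  # like html_nodes(): only the matches, positions lost

# Setting a longer length pads the vector with NA at the end
padded <- scores
length(padded) <- length(titles)

df <- data.frame(Title = titles, Metascore = padded)
# "62" now sits next to "A" and "84" next to "B": the wrong rows
print(df)
```

The lengths now match, but the information about which title each score belonged to was already lost when the scores were scraped on their own.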
Answer
You need to change your fourth line. You want metascore to have as many elements as title, with NA for those titles that don't have a metascore listed. The way to do this is to extract the lister-item-content nodes (one per title) and then, from each of these, select the ratings-metascore node if it exists, or NA if it doesn't. See ?html_nodes for the difference between html_node and html_nodes: html_node() returns exactly one result for each input node (a missing value when nothing matches), whereas html_nodes() returns every match and silently skips nodes that have none. I've also added span to the selector to ensure that just the number is returned, without the following word 'Metascore'.
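The difference can be checked offline. The sketch below assumes rvest is installed and parses a small hypothetical HTML snippet that only mimics the class names used above (it is not IMDb's real markup); the second item has no metascore:

```r
library(rvest)

# Hypothetical mini page: three items, the second one has no metascore
html <- '
<div class="lister-item-content">
  <h3 class="lister-item-header"><a>Movie A</a></h3>
  <div class="ratings-metascore"><span>62</span> Metascore</div>
</div>
<div class="lister-item-content">
  <h3 class="lister-item-header"><a>Movie B</a></h3>
</div>
<div class="lister-item-content">
  <h3 class="lister-item-header"><a>Movie C</a></h3>
  <div class="ratings-metascore"><span>84</span> Metascore</div>
</div>'
page <- read_html(html)  # a string containing markup is parsed as literal HTML

titles <- html_text(html_nodes(page, ".lister-item-header a"))

# Searching the whole page returns only the matches that exist: 2 values
flat <- html_text(html_nodes(page, ".ratings-metascore span"))

# Selecting the per-item containers first, then html_node() on each,
# gives one result per item, with NA where the item has no metascore
items <- html_nodes(page, ".lister-item-content")
per_item <- html_text(html_node(items, ".ratings-metascore span"))

data.frame(Title = titles, Metascore = per_item)
```

Here flat has two elements and cannot be lined up with the three titles, while per_item has three, with NA in the second position.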
library(rvest)
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
# One .lister-item-content node per title; html_node() then returns exactly
# one metascore node per item, or a missing node where the item has none
metascore <- html_node(html_nodes(imdb, '.lister-item-content'), '.ratings-metascore span')
metascore <- html_text(metascore)  # missing nodes become NA
df <- data.frame(Title = title, Metascore = metascore)
head(df, 10)
Title Metascore
1 Mother! <NA>
2 Annabelle: Creation 62
3 Stranger Things <NA>
4 Supernatural <NA>
5 It <NA>
6 The Vampire Diaries <NA>
7 Get Out 84
8 The Originals <NA>
9 Annabelle 37
10 Grimm <NA>
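html_text() returns character strings, so the Metascore column holds text rather than numbers. Before R 4.0.0, data.frame() also converted such columns to factors by default, and as.numeric() on a factor returns the level codes rather than the scores, so it is safest to convert before building the data frame. A minimal sketch, using a hypothetical stand-in vector that copies the first ten values shown above:

```r
# Hypothetical stand-in for html_text(metascore): the first ten values shown
# above, as character strings with NA where an item had no metascore
metascore_text <- c(NA, "62", NA, NA, NA, NA, "84", NA, "37", NA)

# as.numeric() parses the digit strings and leaves NA as NA
metascore_num <- as.numeric(metascore_text)

# Summaries then need na.rm = TRUE to skip the missing scores
mean_score <- mean(metascore_num, na.rm = TRUE)
mean_score
```

In the answer's code this amounts to metascore <- as.numeric(html_text(metascore)). With rvest 1.0.0 or later, html_element() and html_elements() are the preferred names for html_node() and html_nodes(); the older names still work.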