Submit URLs from a data frame column using rvest


Question

I have a data frame called dogs that looks like this:

url 
https://en.wikipedia.org/wiki/Dog
https://en.wikipedia.org/wiki/Dingo
https://en.wikipedia.org/wiki/Canis_lupus_dingo

I would like to submit all of the URLs to rvest, but I am not sure how to do it.

I tried this:

dogstext <-html(dogs$url) %>%
    html_nodes("p:nth-child(4)") %>%
    html_text() 

But I got this error:

Error in UseMethod("parse") : 
  no applicable method for 'parse' applied to an object of class "factor"

Recommended answer

As the error says, you need to convert the factor column to character before parsing:

dogs$url <- as.character(dogs$url)

After that, the rest of your code can run. Because html() parses one page per call, apply it to each URL in turn, as in the update below.
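
To see what the conversion changes, here is a minimal base R sketch. The data frame is a stand-in rebuilt from the URLs in the question, and stringsAsFactors = TRUE is set explicitly as an assumption, to reproduce the factor column on R versions where it is no longer the default.

```r
# Stand-in for the dogs data frame from the question (assumed construction).
dogs <- data.frame(
  url = c("https://en.wikipedia.org/wiki/Dog",
          "https://en.wikipedia.org/wiki/Dingo",
          "https://en.wikipedia.org/wiki/Canis_lupus_dingo"),
  stringsAsFactors = TRUE  # force a factor column, as in the question
)

# A factor stores integer level codes, not the URL strings themselves.
codes <- as.integer(dogs$url)

# as.character() recovers the URL strings, which is what a parser needs.
dogs$url <- as.character(dogs$url)
class(dogs$url)
```

Passing the factor directly fails because the parser has no method for that class, while the character vector holds the plain strings it can work with.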

Update:

> dog <- data.frame(url = c("https://en.wikipedia.org/wiki/Dog",
+                           "https://en.wikipedia.org/wiki/Dingo",
+                           "https://en.wikipedia.org/wiki/Canis_lupus_dingo"))
> str(dog)
'data.frame':   3 obs. of  1 variable:
 $ url: Factor w/ 3 levels "https://en.wikipedia.org/wiki/Canis_lupus_dingo",..: 3 2 1
> # html() was replaced by read_html() in later rvest releases
> lapply(as.character(dog$url), function(i) {
+   html(i) %>%
+     html_nodes("p:nth-child(4)") %>%
+     html_text()
+ })
[[1]]
[1] "The domestic dog (Canis lupus familiaris or Canis familiaris) is a domesticated canid which has been selectively bred for millennia for various behaviors, sensory capabilities, and physical attributes.[2] The global dog population is estimated to between 700 million[3] to over one billion, thus making the dog the most abundant member of order Carnivora.[4]"

[[2]]
[1] "The dingo's habitat ranges from deserts to grasslands and the edges of forests. Dingoes will normally make their dens in deserted rabbit holes and hollow logs close to an essential supply of water."

[[3]]
character(0)
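
The third element is character(0), which is what html_text() returns when the selector matches no nodes on a page. If one value per URL is needed, the list can be flattened as in the sketch below. The results list is a hypothetical stand-in shaped like the output above, with the texts shortened, and first_match is a name chosen only for illustration.

```r
# Hypothetical stand-in for the lapply() output above (texts shortened).
results <- list(
  "The domestic dog ...",
  "The dingo's habitat ...",
  character(0)
)

# Keep the first match per page, or NA when the selector matched nothing.
first_match <- vapply(
  results,
  function(x) if (length(x) == 0) NA_character_ else x[[1]],
  character(1)
)
```

The resulting vector has the same length as the URL column, so it could be stored next to it in the data frame.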
