R: LinkedIn scraping using rvest


Problem description


Using rvest package, I am trying to scrape data from my LinkedIn profile.

These attempts:

library(rvest)
url = "https://www.linkedin.com/profile/view?id=AAIAAAFqgUsBB2262LNIUKpTcr0cF_ekoX9ZJh0&trk=nav_responsive_tab_profile"
li = read_html(url)
html_nodes(li, "#experience-316254584-view span.field-text")
html_nodes(li, xpath='//*[@id="experience-610617015-view"]/p/span/text()')

don't find any nodes:

#> {xml_nodeset (0)}

Q: How do I return just the text? Expected output:

#> "Quantitative hedge fund manager selection for $650m portfolio of alternative investments"
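The empty nodeset means the selectors matched nothing in the HTML that `read_html` actually received — most likely because LinkedIn serves a login/authwall page to unauthenticated requests, so the profile markup (and those `experience-*` ids) never reaches rvest. A minimal offline sketch (the authwall markup below is hypothetical) showing that rvest can only find nodes present in the document it was given:

```r
library(rvest)

# A stand-in for what an unauthenticated request may receive: no profile markup
page <- read_html('<div id="authwall"><p>Sign in to view this profile</p></div>')

# The profile selector matches nothing in this document...
html_nodes(page, "#experience-316254584-view span.field-text")
#> {xml_nodeset (0)}

# ...while a selector for markup that is actually present does
html_text(html_nodes(page, "#authwall p"))
#> "Sign in to view this profile"
```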

EDIT:

LinkedIn has an API; however, for some reason the code below returns only the first two experience positions and no other items (such as education or projects). Hence the scraping approach.

library("Rlinkedin")
auth = inOAuth(application_name, consumer_key, consumer_secret)
getProfile(auth, connections = FALSE, id = NULL) # returns very limited data

Solution

You are making things unnecessarily difficult... All you need to do is issue a GET request to https://api.linkedin.com/v1/people/~?format=json after obtaining an OAuth 2.0 token from LinkedIn. In R, you can do this using jsonlite:

library(httr)
library(jsonlite)
# fromJSON() alone cannot attach the Authorization header, so fetch with httr
resp <- GET("https://api.linkedin.com/v1/people/~?format=json",
            add_headers(Authorization = paste("Bearer", token)))  # token from the OAuth 2.0 flow
linkedin <- fromJSON(content(resp, as = "text"))
position <- linkedin$headline

Your OAuth token must have the 'r_basicprofile' member permission.
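The response is parsed by jsonlite into a plain R list, so fields come out by name. A minimal offline sketch of that step, using a hypothetical sample of the JSON shape the v1 people endpoint returns:

```r
library(jsonlite)

# Hypothetical sample of the profile JSON (field names follow the v1 people API)
sample_json <- '{"firstName":"Jane","lastName":"Doe","headline":"Quantitative analyst"}'
profile <- fromJSON(sample_json)

# Fields of the parsed list are accessed by name
profile$headline
#> "Quantitative analyst"
```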
