Using rvest to scrape a website with a login page
Question
Here's my code:
library(rvest)
#login
url <- "https://secure.usnews.com/member/login?ref=https%3A%2F%2Fpremium.usnews.com%2Fbest-graduate-schools%2Ftop-medical-schools%2Fresearch-rankings"
session <- html_session(url)
form <- html_form(read_html(url))[[1]]
filled_form <- set_values(form,
                          username = "notmyrealemail",
                          password = "notmyrealpassword")
submit_form(session, filled_form)
Here's what I get as output after submit_form:
<session> https://premium.usnews.com/best-graduate-schools/top-medical-schools/research-rankings
Status: 200
Type: text/html; charset=utf-8
Size: 286846
I assume this means it worked? If so, how do I call read_html on the page that appears after I log in?
Recommended answer
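Nvm, figured it out by using url <- jump_to(session, "https://premium.usnews.com/best-graduate-schools/top-medical-schools/research-rankings"). jump_to returns a session for the new page, and that session can be passed to read_html.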
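For anyone landing here later, the whole flow might look roughly like the sketch below. It is not the code from the post: the helper name read_html_after_login is made up for illustration, and it uses the function names from rvest 1.0.0 and later (session, html_form_set, session_submit, session_jump_to), which replaced html_session, set_values, submit_form and jump_to. It also assumes, as in the question, that the login form is the first form on the page and that its fields are named username and password.

```r
library(rvest)

# Hypothetical helper (not from the original post): log in through the first
# form on the login page, then return the parsed HTML of a page behind it.
read_html_after_login <- function(login_url, target_url, username, password) {
  # Start a session so cookies persist across requests.
  login_session <- session(login_url)

  # Assumption carried over from the question: the login form is form 1
  # and its fields are called "username" and "password".
  login_form <- html_form(login_session)[[1]]
  filled_form <- html_form_set(login_form,
                               username = username,
                               password = password)

  # Keep the session returned by the submit step and reuse it afterwards.
  logged_in <- session_submit(login_session, filled_form)

  # Navigate within the logged-in session, then parse that page.
  target <- session_jump_to(logged_in, target_url)
  read_html(target)
}
```

Called with the login URL and the rankings URL from the question, plus real credentials, this should return a parsed document that can be handed to html_elements() or html_table(). Whether the login actually succeeded is still worth checking on the returned page, since a 200 status alone does not prove it.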