JSOUP - How to crawl a "login required" page using JSOUP


Question

I'm having trouble crawling a particular website. The problem is: after successfully logging in to that website, I can't access a link that requires a valid login.

For example:

public Document executeLogin(String user, String password) {
    try {
        Connection.Response loginForm = Jsoup.connect(url)
                .method(Connection.Method.GET)
                .execute();

        Document mainPage = Jsoup.connect(login-validation-url)
                .data("user", user)
                .data("senha", password)
                .cookies(loginForm.cookies())
                .post();

        Document evaluationPage = Jsoup.connect(login-required-url)
                .get();

       return evaluationPage;
    } catch (IOException ioe) {
        return null;
    }
}

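What I'm doing here is:

  • Get the cookies from the login page so that I can log in properly;
  • Then POST to the login validation URL, which returns the main page after login;
  • Finally, after logging in, try to access the login-required URL, but that request sends me back to the login page, as if the session had expired.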

I know I have to store cookies to keep the session alive, but when I connect to the login validation URL, it returns a Document object, and there are no cookies to get from that object.

Is there any way to get the "session" created by the successful login and send it with the other Jsoup.connect calls? What I want to do is crawl a page that can only be accessed by logged-in users.

Thank you very much.

Recommended answer

Get the cookies after you log in:

    Connection.Response loginForm = Jsoup.connect(url)
            .method(Connection.Method.GET)
            .execute();

    Connection.Response mainPage = Jsoup.connect(login-validation-url)
            .data("user", user)
            .data("senha", password)
            .cookies(loginForm.cookies())
            .method(Connection.Method.POST)
            .execute();

    Map<String, String> cookies = mainPage.cookies();

    Document evaluationPage = Jsoup.connect(login-required-url)
            .cookies(cookies)
            .execute().parse();

   return evaluationPage;

When you request the second page, you also have to send the cookies, as the last Jsoup.connect call above does with .cookies(cookies).

(Source: I had this problem a few days ago.)

So it's easier to just put the cookies in a Map:

Map<String, String> cookies = loginForm.cookies();

And submit the forms using these cookies.
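Some sites set one cookie when the login form is fetched and further cookies when the login POST succeeds, so it may help to carry both sets forward. The following is only a sketch under that assumption: SessionCookies and merge are hypothetical names, not part of jsoup, and the maps stand in for the Map<String, String> values that Connection.Response.cookies() returns in the snippets above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of jsoup): combines cookie maps from several responses. */
final class SessionCookies {
    private SessionCookies() {}

    /**
     * Returns a new map holding every cookie from the given maps.
     * When the same cookie name appears more than once, the value
     * from the later map wins. The input maps are not modified.
     */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieMaps) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cookies : cookieMaps) {
            if (cookies != null) {
                merged.putAll(cookies);
            }
        }
        return merged;
    }
}
```

With the names used in the answer, this could be called as SessionCookies.merge(loginForm.cookies(), mainPage.cookies()) and the result passed to .cookies(...) on each later request. Whether both sets are actually needed depends on the site.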
