Java: scrape a website that requires login using Jsoup


Question

I'd like to print some data (the div elements with class="news_article") from streetinsider.com. I created an account, and I need to log in to access that data.

Can anyone explain why this code is not working? I've tried many variations, but none of them work.

public static final String SPLIT_INTERNET_URL = "http://www.streetinsider.com/Special+Dividends?offset=55";
public static final String SPLIT_LOGIN = "https://www.streetinsider.com/login.php";

/**
 * @param args the command line arguments
 * @throws java.io.FileNotFoundException
 * @throws java.io.UnsupportedEncodingException
 * @throws java.text.ParseException
 * @throws java.lang.ClassNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException, IOException, ParseException, ClassNotFoundException {
    // TODO code application logic here
    Response res = Jsoup.connect(SPLIT_LOGIN)
            .data("loginemail", "XXXXX", "password", "XXXX")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();

    Map<String, String> cookies = res.cookies();

    Document pageWhenAlreadyLoggedIn = Jsoup.connect(SPLIT_INTERNET_URL).cookies(cookies).get();
    Elements elems = pageWhenAlreadyLoggedIn.select("div[class=news_article]");
    for (Element elem : elems) {
        System.out.println(elem);
    }
}

Recommended answer

Your code doesn't actually log you in to the website. Try the code below to log in.

To log in to the website:

Connection.Response res = Jsoup.connect(SPLIT_LOGIN)
        .data("action", "account",
              "redirect", "account_home.php?",
              "radiobutton", "old",
              "loginemail", "XXXXX",
              "password", "XXXXX",
              "LoginChoice", "Sign In to Secure Area")
        .method(Connection.Method.POST)
        .followRedirects(true)
        .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
        .execute();
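
The request above posts several form fields that the code in the question left out. As a minimal sketch, the field list can be kept in one place with a small helper that builds the form data as a map. The class name `LoginForm` and method `build` are hypothetical, and the field names and fixed values are simply copied from the request above; they may stop matching if the site changes its login form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LoginForm {
    // Builds the login form fields. Names and fixed values are copied from
    // the answer's request; they are not verified against the live site.
    static Map<String, String> build(String email, String password) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("action", "account");
        form.put("redirect", "account_home.php?");
        form.put("radiobutton", "old");
        form.put("loginemail", email);
        form.put("password", password);
        form.put("LoginChoice", "Sign In to Secure Area");
        return form;
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        Map<String, String> form = build("user@example.com", "secret");
        // Print each field, masking the password value.
        form.forEach((name, value) ->
                System.out.println(name + " = " + ("password".equals(name) ? "****" : value)));
    }
}
```

The resulting map can be passed to Jsoup's `Connection.data(Map<String, String>)` overload in place of the varargs call, for example `Jsoup.connect(SPLIT_LOGIN).data(LoginForm.build(email, password))`.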

You are now logged in. However, the website seems to detect whether you are already logged in from another browser or connection, and asks you to terminate that session first. The code below ends the prior session:

Connection.Response res2 = Jsoup.connect("http://www.streetinsider.com/login_duplicate.php")
        .data("ok", "End Prior Session")
        .method(Connection.Method.POST)
        .cookies(res.cookies())
        .followRedirects(true)
        .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
        .execute();

All good: res2 now contains the home page of your account, and you can proceed to whatever page you want. For more information on how to log in to a website with Jsoup, take a look at the following tutorial:

How to log in to a website using Jsoup
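
When moving on to further pages, the session cookies have to be sent with each request. It is an assumption here, not something the answer confirms, that the second response may set cookies of its own, so a cautious approach is to merge the cookie maps from both responses, letting later values override earlier ones. The helper below is a hypothetical sketch using only the standard library; the cookie names in `main` are made up and stand in for `res.cookies()` and `res2.cookies()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SessionCookies {
    // Merges cookie maps in the order given; a cookie in a later map
    // replaces a cookie with the same name from an earlier map.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieMaps) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cookies : cookieMaps) {
            merged.putAll(cookies);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical cookie names and values, standing in for
        // res.cookies() and res2.cookies() from the requests above.
        Map<String, String> fromLogin = Map.of("session", "abc123");
        Map<String, String> fromEndPrior = Map.of("session", "def456", "notice", "1");
        System.out.println(merge(fromLogin, fromEndPrior));
    }
}
```

With the real responses, the merged map could then be used to fetch the page from the question, for example `Jsoup.connect(SPLIT_INTERNET_URL).cookies(SessionCookies.merge(res.cookies(), res2.cookies())).get()`, followed by `select("div.news_article")` on the returned document.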
