Parse HTML source after login with Java

Question

I've been trying to access a website to parse data for an Android application I am developing, but I am having no luck when it comes to logging in.

The website is https://giffgaff.com/mobile/login

Below is a stripped-down version of the form from that page (HTML):

<form action="/mobile/login" method="post">
    <input type="hidden" name="login_security_token" value="b22155c7259f402f8e005a771c460670">    
    <input type="hidden" name="redirect" value="/mobile">    
    <input type="hidden" name="p_next_page" value="">    


    <input name="nickname" maxlength="25" type="text" value="" />            
    <input name="password" type="password" value="" />

    <button name="step" type="submit" value="Login">Login</button>
</form>
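The three hidden inputs are part of what a browser posts along with the nickname and password. Judging by its name, login_security_token is probably generated for each page load, so it would have to be read from a freshly fetched page rather than hardcoded (an assumption, not confirmed by the site). A minimal sketch of pulling the hidden fields out of the page HTML, assuming double-quoted attributes as in the snippet above; a regex is not a real HTML parser, and a library such as jsoup would be more robust. The class name HiddenFields is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenFields {

    // Matches one complete <input ...> tag, case-insensitively.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" pairs; assumes double-quoted values, as in the form above.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z_:-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns name -> value for every <input type="hidden"> in the HTML, in page order.
    public static Map<String, String> extract(String html) {
        Map<String, String> hidden = new LinkedHashMap<>();
        Matcher tag = INPUT_TAG.matcher(html);
        while (tag.find()) {
            // Collect this tag's attributes with lower-cased names
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                attrs.put(attr.group(1).toLowerCase(), attr.group(2));
            }
            String name = attrs.get("name");
            if (name != null && "hidden".equalsIgnoreCase(attrs.get("type"))) {
                String value = attrs.get("value");
                hidden.put(name, value != null ? value : "");
            }
        }
        return hidden;
    }

    public static void main(String[] args) {
        String html = "<form action=\"/mobile/login\" method=\"post\">"
                + "<input type=\"hidden\" name=\"login_security_token\" value=\"b22155c7259f402f8e005a771c460670\">"
                + "<input type=\"hidden\" name=\"redirect\" value=\"/mobile\">"
                + "<input name=\"nickname\" maxlength=\"25\" type=\"text\" value=\"\" />"
                + "</form>";
        System.out.println(extract(html));
    }
}
```

Only the hidden inputs are returned; the visible nickname field is skipped.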

Can anyone please suggest how I can log in to this website using Java and then parse the redirected page?

So far, I've tried something along the lines of:

import android.app.AlertDialog;
import android.content.Context;
import android.content.DialogInterface;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class LoginAttempt {

    // Posts the nickname/password to the login URL and shows the returned HTML in a dialog.
    public static void attemptLogin(Context context) {
        try {
            // Construct the form body (only nickname and password are sent)
            String data = URLEncoder.encode("nickname", "UTF-8") + "=" + URLEncoder.encode("testingA", "UTF-8");
            data += "&" + URLEncoder.encode("password", "UTF-8") + "=" + URLEncoder.encode("testing", "UTF-8");

            // Send data; setDoOutput(true) makes the HTTP request a POST
            URL url = new URL("https://giffgaff.com/mobile/login");
            URLConnection conn = url.openConnection();
            conn.setDoOutput(true);
            OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
            wr.write(data);
            wr.flush();

            // Read the response, keeping the line breaks
            BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
            StringBuilder str = new StringBuilder();
            String line;
            while ((line = rd.readLine()) != null) {
                str.append(line).append('\n');
            }
            wr.close();
            rd.close();

            showDialog(context, "Output", str.toString());
        } catch (Exception e) {
            showDialog(context, "ERROR", e.toString());
        }
    }

    // Shows a dialog with a single "Okay" button that dismisses it
    private static void showDialog(Context context, String title, String message) {
        AlertDialog alertDialog = new AlertDialog.Builder(context).create();
        alertDialog.setTitle(title);
        alertDialog.setMessage(message);
        alertDialog.setButton(DialogInterface.BUTTON_POSITIVE, "Okay",
                new DialogInterface.OnClickListener() {
                    public void onClick(DialogInterface dialog, int which) {
                        dialog.dismiss();
                    }
                });
        alertDialog.show();
    }
}

But my attempts return the page as if the login information was incorrect.

If you would like to see for yourself how the login page behaves, here are some test login details:

Nickname (username): testingA
Password: testing

The site also seems to depend on a cookie called "napaSessionId".

Recommended answer

First, a word of caution: if you don't have direct permission to do this, beware, as the site in question may prohibit it in its terms of service.

To answer the question: there are many, many reasons a site might reject a login. To succeed, you need to get as close as possible to how a browser handles the transaction, and to do that you need to see what a real browser is actually doing.
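Getting close to the browser starts with the form body. When the form in the question is submitted, a browser sends every named control: the three hidden inputs, nickname, password, and the name/value pair of the clicked submit button (step=Login), whereas the attempt in the question sends only nickname and password. A minimal sketch of building that full body; the class name FormBody and the placeholder token are hypothetical, and whether the server actually checks each field is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBody {

    // Encodes the fields as an application/x-www-form-urlencoded body,
    // keeping the order in which they were inserted.
    public static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(urlEncode(field.getKey())).append('=').append(urlEncode(field.getValue()));
        }
        return body.toString();
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Hidden fields from the form; the token is a placeholder and would have
        // to be read from a freshly loaded login page, not hardcoded.
        fields.put("login_security_token", "TOKEN_FROM_LOGIN_PAGE");
        fields.put("redirect", "/mobile");
        fields.put("p_next_page", "");
        // The visible text fields
        fields.put("nickname", "testingA");
        fields.put("password", "testing");
        // The submit button's name/value pair, sent because it was "clicked"
        fields.put("step", "Login");
        System.out.println(encode(fields));
    }
}
```

A LinkedHashMap is used only so the body comes out in the same order as the form; servers normally don't care about field order.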

HTTPS is trickier, as many HTTP sniffers can't deal with it, but HttpWatch claims it can. Inspect the HTTP transactions, then try to replicate them.
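To compare what your program gets back with the browser trace, it helps to print the server's response headers. A minimal sketch (the class name TraceDump is hypothetical) that prints the status code and every response header, with automatic redirects turned off in main so that a redirect's Location and Set-Cookie headers stay visible:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class TraceDump {

    // Returns the status code, reason phrase and every response header as text,
    // one "Name: value" per line. getResponseCode() sends the request if needed.
    public static String dump(HttpURLConnection conn) throws IOException {
        StringBuilder out = new StringBuilder();
        out.append(conn.getResponseCode()).append(' ').append(conn.getResponseMessage()).append('\n');
        for (Map.Entry<String, List<String>> header : conn.getHeaderFields().entrySet()) {
            if (header.getKey() == null) {
                continue; // the null key holds the raw status line, already printed above
            }
            for (String value : header.getValue()) {
                out.append(header.getKey()).append(": ").append(value).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL(args.length > 0 ? args[0] : "https://giffgaff.com/mobile/login");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Don't follow redirects automatically, so 3xx responses and their headers are shown
        conn.setInstanceFollowRedirects(false);
        System.out.print(dump(conn));
        conn.disconnect();
    }
}
```

Comparing this output with HttpWatch's view of the browser (for example, whether a napaSessionId cookie is set, and where the redirect points) shows where the two transactions diverge.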

Your url.openConnection() call will actually return an instance of HttpURLConnection (for an https URL, its subclass HttpsURLConnection). Cast it to that, and you'll be able to easily set various HTTP headers such as User-Agent.
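A minimal sketch of that cast, setting a few headers a browser typically sends with a form POST. The User-Agent string is a placeholder (an assumption); copy the exact headers from your own browser trace. BrowserLikeRequest and openFormPost are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class BrowserLikeRequest {

    // Placeholder User-Agent; replace it with the value your own browser sends.
    static final String USER_AGENT = "Mozilla/5.0 (Linux; Android) ExampleBrowser/1.0";

    // Opens (but does not yet connect) a POST request carrying headers a browser
    // typically sends when submitting a form.
    public static HttpURLConnection openFormPost(URL url, String referer) throws IOException {
        // For http:// and https:// URLs, openConnection() returns an HttpURLConnection
        // (HttpsURLConnection, a subclass, for https), so the cast is safe.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setRequestProperty("Referer", referer);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URL login = new URL("https://giffgaff.com/mobile/login");
        HttpURLConnection conn = openFormPost(login, login.toString());
        // Nothing has been sent yet; the headers are only queued on the connection.
        System.out.println(conn.getRequestMethod() + " " + conn.getRequestProperty("User-Agent"));
    }
}
```

Request properties must be set before the connection is opened (before getOutputStream() or getResponseCode() is called), otherwise HttpURLConnection throws IllegalStateException.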

A final note: you say a cookie may be required, but your code doesn't handle cookies. To do that you'll need to use a cookie manager, e.g.: http://download.oracle.com/javase/tutorial/networking/cookies/index.html
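Putting it together, a sketch of the whole flow with java.net.CookieManager: install it as the default CookieHandler, GET the login page so the session cookie (presumably napaSessionId) is stored and the current token can be read, then POST the full form on the same session. It assumes the field names from the form in the question, a UTF-8 page, and a token input written with name before value; CookieLogin, login and extractToken are hypothetical names, and none of this has been checked against the live site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CookieLogin {

    // Assumes the token input looks like the form snippet: name before value, double quotes.
    private static final Pattern TOKEN =
            Pattern.compile("name=\"login_security_token\"\\s+value=\"([^\"]*)\"");

    // Installs a process-wide cookie store. Connections opened afterwards save
    // Set-Cookie headers and send matching Cookie headers automatically.
    public static CookieManager installCookieManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Reads the whole response body, assuming UTF-8.
    static String readBody(HttpURLConnection conn) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = rd.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Returns the login_security_token value from the page, or null if not found.
    static String extractToken(String html) {
        Matcher m = TOKEN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // GETs the login page (setting the session cookie), then POSTs the form on the
    // same session and returns the HTML of the resulting page.
    public static String login(String nickname, String password) throws IOException {
        URL loginUrl = new URL("https://giffgaff.com/mobile/login");

        HttpURLConnection get = (HttpURLConnection) loginUrl.openConnection();
        String token = extractToken(readBody(get));
        if (token == null) {
            throw new IOException("login_security_token not found on login page");
        }

        String body = "login_security_token=" + URLEncoder.encode(token, "UTF-8")
                + "&redirect=" + URLEncoder.encode("/mobile", "UTF-8")
                + "&p_next_page="
                + "&nickname=" + URLEncoder.encode(nickname, "UTF-8")
                + "&password=" + URLEncoder.encode(password, "UTF-8")
                + "&step=Login";

        HttpURLConnection post = (HttpURLConnection) loginUrl.openConnection();
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (Writer wr = new OutputStreamWriter(post.getOutputStream(), "UTF-8")) {
            wr.write(body);
        }
        // HttpURLConnection follows same-protocol redirects by default, so this is
        // normally the page the server redirects to after login.
        return readBody(post);
    }

    public static void main(String[] args) throws IOException {
        installCookieManager(); // must happen before any connection is opened
        System.out.println(login(args[0], args[1]));
    }
}
```

On Android, call login() from a background thread: since Android 3.0, network I/O on the main thread throws NetworkOnMainThreadException. The returned HTML is then the redirected page the question wants to parse.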
