Parse HTML source after login with Java
Question
I've been trying to access a website to parse data for an Android application I am developing, but I am having no luck when it comes to logging in.
The website is https://giffgaff.com/mobile/login
Below is a stripped-down version of the form from that page (HTML):
<form action="/mobile/login" method="post">
    <input type="hidden" name="login_security_token" value="b22155c7259f402f8e005a771c460670">
    <input type="hidden" name="redirect" value="/mobile">
    <input type="hidden" name="p_next_page" value="">
    <input name="nickname" maxlength="25" type="text" value="" />
    <input name="password" type="password" value="" />
    <button name="step" type="submit" value="Login">Login</button>
</form>
Can anyone please suggest how I can log in to this website using Java and then parse the redirected page?
So far, I've tried something along these lines:
public static void main(Context context) {
    try {
        // Construct data
        String data = URLEncoder.encode("nickname", "UTF-8") + "=" + URLEncoder.encode("testingA", "UTF-8");
        data += "&" + URLEncoder.encode("password", "UTF-8") + "=" + URLEncoder.encode("testing", "UTF-8");
        // Send data
        URL url = new URL("https://giffgaff.com/mobile/login");
        URLConnection conn = url.openConnection();
        conn.setDoOutput(true);
        OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
        wr.write(data);
        wr.flush();
        // Get the response
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String str = "";
        String line;
        while ((line = rd.readLine()) != null) {
            str += line;
        }
        AlertDialog alertDialog = new AlertDialog.Builder(context).create();
        alertDialog.setTitle("Output");
        alertDialog.setMessage(str);
        alertDialog.setButton("Okay", new DialogInterface.OnClickListener() {
            public void onClick(DialogInterface dialog, int which) {
            }
        });
        alertDialog.show();
        wr.close();
        rd.close();
    } catch (Exception e) {
        AlertDialog alertDialog = new AlertDialog.Builder(context).create();
        alertDialog.setTitle("ERROR");
        alertDialog.setMessage(e.toString());
        alertDialog.setButton("Okay", new DialogInterface.OnClickListener() {
            public void onClick(DialogInterface dialog, int which) {
            }
        });
        alertDialog.show();
    }
}
But my attempts return the page as if the login information was incorrect.
If you would like to see for yourself how the login page behaves, here are some test login details:
Nickname (username): testingA
Password: testing
The site also seems to depend on a cookie called "napaSessionId".
Recommended answer
First, a word of caution: if you don't have direct permission to do this, beware that the site in question may prohibit it in its terms of service.
To answer the question, there are many, many reasons a site would reject a login. To do this successfully you need to get as close as possible to how a browser would handle the transaction. To do that you need to see what a real browser is doing.
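One thing a real browser does with the form shown in the question is submit every field, including the hidden ones, whereas the posted code sends only nickname and password. The following is a minimal sketch of building such a request body; it assumes the token has to be read from a freshly fetched copy of the login page (the site's actual checks are not known), and the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class LoginFormSketch {
    // Matches the hidden token input in the attribute order used by the question's markup.
    // A real HTML parser would be more robust than a regex if the markup changes.
    private static final Pattern TOKEN = Pattern.compile(
            "name=\"login_security_token\"\\s+value=\"([^\"]*)\"");

    // Returns the token value, or null if the input is not found.
    static String extractToken(String loginPageHtml) {
        Matcher m = TOKEN.matcher(loginPageHtml);
        return m.find() ? m.group(1) : null;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Joins the fields as application/x-www-form-urlencoded, keeping insertion order.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Builds a body containing every field of the form from the question.
    static String buildLoginBody(String loginPageHtml, String nickname, String password) {
        String token = extractToken(loginPageHtml);
        if (token == null) {
            throw new IllegalStateException("login_security_token not found in page");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("login_security_token", token);
        fields.put("redirect", "/mobile");
        fields.put("p_next_page", "");
        fields.put("nickname", nickname);
        fields.put("password", password);
        fields.put("step", "Login");
        return encodeForm(fields);
    }
}
```

The string returned by buildLoginBody can be written to the connection's output stream in place of the two-field data string from the question.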
HTTPS is trickier, as many HTTP sniffers can't deal with it, but HttpWatch claims it can. Inspect the HTTP transactions a browser makes and then try to replicate them.
Your url.openConnection() call will actually return an instance of HttpURLConnection (HttpsURLConnection, a subclass, for an https URL). Cast to that, and then you'll be able to easily set various HTTP headers such as User-Agent.
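As a rough illustration of the cast, the sketch below opens a POST connection and sets a few browser-like headers. The header values (in particular the User-Agent string) and the class and method names are placeholders, not values the site is known to require; copy the real ones from whatever your sniffer shows.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
final class BrowserLikeRequest {
    // Prepares (but does not send) a form POST to the given address.
    static HttpURLConnection openPost(String address) {
        try {
            // openConnection() only creates the object; no network traffic happens yet.
            HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Redirects are followed by default; stated explicitly here for clarity.
            conn.setInstanceFollowRedirects(true);
            // Placeholder values: replace with what a real browser sends.
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Linux; Android) ExampleApp/1.0");
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setRequestProperty("Referer", address);
            return conn;
        } catch (IOException e) {
            throw new IllegalStateException("Could not prepare connection to " + address, e);
        }
    }
}
```

The returned connection is used like the URLConnection in the question: write the form body to conn.getOutputStream(), then read conn.getInputStream(). conn.getResponseCode() is also available after the cast, which helps when comparing against the browser's transactions.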
A final note: you say a cookie may be required. Your code isn't going to deal with cookies. To do that you'll need to use a cookie manager, e.g.: http://download.oracle.com/javase/tutorial/networking/cookies/index.html
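A minimal sketch of the cookie manager approach, using the standard java.net.CookieManager. Once installed as the default handler, it stores cookies from responses and attaches them to later HttpURLConnection requests in the same process. The class and method names are made up for illustration, and whether "napaSessionId" is the only cookie the site needs is an assumption taken from the question.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical helper, not part of any library.
final class CookieSetup {
    // Installs a process-wide, in-memory cookie manager. Call once, before the first request.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Reports whether a cookie with the given name is stored for the URL,
    // e.g. to confirm the session cookie arrived after fetching the login page.
    static boolean hasCookie(CookieManager manager, String url, String name) {
        for (HttpCookie cookie : manager.getCookieStore().get(URI.create(url))) {
            if (cookie.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }
}
```

With this in place, a plausible sequence is: call CookieSetup.install(), GET the login page (which should set the session cookie and supply a fresh login_security_token), then POST the form. hasCookie(manager, "https://giffgaff.com/mobile/login", "napaSessionId") can be checked between the two requests.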