Java - Not getting HTML code from a URL
Problem description
I want to get the HTML source of https://www2.cslb.ca.gov/OnlineServices/CheckLicenseII/LicenseDetail.aspx?LicNum=872423
and for that I am using the method below, but I am not getting the HTML source.
public static String getHTML(URL url) {
    HttpURLConnection conn; // The actual connection to the web page
    BufferedReader rd;      // Used to read results from the web page
    String line;            // An individual line of the web page HTML
    String result = "";     // A long string containing all the HTML
    try {
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        while ((line = rd.readLine()) != null) {
            result += line;
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
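Because the method above catches every exception and still returns `result`, a rejected request shows up only as an empty string plus a stack trace. As a diagnostic aid, here is a minimal sketch that reports the HTTP status code and reads the error body instead of failing silently. The `StatusCheck` class and `describe` method are hypothetical names, Java 9+ is assumed for `readAllBytes`, and the body is assumed to be UTF-8. It does not set a User-Agent, so it shows what the server returns to Java's default client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class StatusCheck {
    // Hypothetical diagnostic helper: returns "<status>: <body>" instead of hiding errors
    static String describe(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // getResponseCode() sends the request and returns the status even for 4xx/5xx
        int status = conn.getResponseCode();
        // getInputStream() throws for status >= 400, so read getErrorStream() in that case
        InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        String text = "";
        if (body != null) { // getErrorStream() returns null when there is no error body
            try (InputStream in = body) {
                // Assumption: the body is UTF-8 encoded
                text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        return status + ": " + text;
    }
}
```

Calling `StatusCheck.describe(url)` with the URL from the question should reveal whether the server answers the default Java client with an error status. The exact status it returns is not something the original answer states.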
Solution
The server filters out Java's default User-Agent header. This works:
public static String getHTML(URL url) {
    try {
        final URLConnection urlConnection = url.openConnection();
        // Replace the default "Java/<version>" User-Agent, which the server rejects
        urlConnection.addRequestProperty("User-Agent", "Foo?");
        final InputStream inputStream = urlConnection.getInputStream();
        // Apache Commons IO; newer versions prefer toString(InputStream, Charset)
        final String html = IOUtils.toString(inputStream);
        inputStream.close();
        return html;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
It looks like some user agents are blacklisted. By default, my JDK sends:
User-Agent: Java/1.6.0_26
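The version part comes from the `java.version` system property. As far as I know, OpenJDK's `HttpURLConnection` sends `Java/<java.version>` when the `http.agent` system property is unset, and `<http.agent> Java/<java.version>` when it is set. This is implementation behavior rather than a documented contract. A small sketch (the class name `DefaultUserAgent` is hypothetical) that reconstructs the expected header value under that assumption:

```java
class DefaultUserAgent {
    // Assumption: mirrors how OpenJDK's HttpURLConnection builds its default User-Agent
    static String expectedDefault() {
        String version = System.getProperty("java.version");
        String agent = System.getProperty("http.agent");
        // With http.agent set, the JDK appends " Java/<version>" instead of replacing it
        return agent == null ? "Java/" + version : agent + " Java/" + version;
    }
}
```

On the JDK used in the answer this would yield `Java/1.6.0_26`. Because the `Java/` token is kept even when `http.agent` is set, setting that property alone may not get past a filter that matches on it. Overriding the header per connection, as the answer does, sidesteps the issue.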
Note that I'm using the IOUtils class (from Apache Commons IO) to simplify the example, but the key thing is:
urlConnection.addRequestProperty("User-Agent", "Foo?");
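If you'd rather not pull in Commons IO, the same fix can be written with the JDK alone. This is a minimal sketch, assuming Java 9+ (for `InputStream.readAllBytes`) and a UTF-8 response body. The class name `FetchHtml` and the browser-like `USER_AGENT` value are hypothetical; any value the server doesn't block should do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class FetchHtml {
    // Hypothetical value; the answer reports that even "Foo?" works for this server
    static final String USER_AGENT = "Mozilla/5.0 (compatible; FetchHtml/1.0)";

    // Opens the connection and replaces the default "Java/<version>" User-Agent.
    // openConnection() does not contact the server yet, so no request is sent here.
    static URLConnection prepare(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    // Sends the request and reads the whole body; try-with-resources closes the
    // stream even if reading fails, unlike the explicit close() in the answer
    static String getHTML(URL url) {
        try {
            URLConnection conn = prepare(url);
            try (InputStream in = conn.getInputStream()) {
                // Assumption: the page is UTF-8; check conn.getContentType() to be sure
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`setRequestProperty` replaces any existing value for the key, whereas `addRequestProperty` appends another one. Either works here, because nothing has set a User-Agent on the connection beforehand.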