Java - Not getting html code from a URL

Views: 245 | Posted: 2018/6/26 10:13:39 | Tags: java html url httpurlconnection


Problem description

I want to get the HTML source of https://www2.cslb.ca.gov/OnlineServices/CheckLicenseII/LicenseDetail.aspx?LicNum=872423. I am using the method below, but it does not return the HTML source.

public static String getHTML(URL url) {
    HttpURLConnection conn; // The actual connection to the web page
    BufferedReader rd; // Used to read results from the web page
    String line; // An individual line of the web page HTML
    String result = ""; // A long string containing all the HTML
    try {
        conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        while ((line = rd.readLine()) != null) {
            result += line;
        }
        rd.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}

Solution

The server filters out Java's default User-Agent. This works:

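import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils; // assumption: IOUtils is the Apache Commons IO class

public static String getHTML(URL url) {
    try {
        final URLConnection urlConnection = url.openConnection();
        // Replace the default "Java/<version>" User-Agent, which the server rejects
        urlConnection.addRequestProperty("User-Agent", "Foo?");
        final InputStream inputStream = urlConnection.getInputStream();
        final String html = IOUtils.toString(inputStream);
        inputStream.close();
        return html;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}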

It looks like the default User-Agent is blacklisted. By default, my JDK sends:

User-Agent: Java/1.6.0_26
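
If you want to check what your own JDK sends, one option is to point HttpURLConnection at a throwaway local server that echoes the header back. The following is only a sketch under some assumptions: the class name DefaultUserAgentProbe is made up for illustration, and it relies on the com.sun.net.httpserver package that ships with standard JDKs.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class DefaultUserAgentProbe {

    /** Starts a local echo server, requests it once, and returns the User-Agent it saw. */
    static String observedUserAgent() throws IOException {
        // Port 0 lets the OS pick a free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            byte[] body = String.valueOf(ua).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            // No User-Agent is set here, so the JDK default is sent.
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[1024];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            server.stop(0);
        }
    }
}
```

Printing DefaultUserAgentProbe.observedUserAgent() should show a value of the form Java/ followed by your JDK version, matching the header quoted above.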

Note that getHTML above uses the IOUtils class to keep the example short; the key line is:

urlConnection.addRequestProperty("User-Agent", "Foo?");
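
If you would rather not depend on IOUtils, the same idea can be sketched with JDK classes only. This is a minimal sketch, not the answer's exact code: the class name HtmlFetcher and the userAgent parameter are made up for illustration, UTF-8 is assumed as the page encoding, and setRequestProperty is used in place of addRequestProperty so that any earlier value is overwritten.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class HtmlFetcher {

    /** Reads the whole stream as UTF-8 text (an assumption), ending each line with '\n'. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Fetches a page with an explicit User-Agent instead of the default "Java/<version>". */
    static String getHTML(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // The key step from the answer: override the default User-Agent.
        conn.setRequestProperty("User-Agent", userAgent);
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            // Fail loudly instead of silently returning an empty string.
            throw new IOException("Unexpected HTTP status " + status + " for " + url);
        }
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }
}
```

A call would look like HtmlFetcher.getHTML(url, "Foo?"). Unlike the method in the question, this version throws when the request fails, so a rejected request is visible rather than showing up as an empty result.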
