Get page content from URL?


Problem Description

I want to get the content of a page from a URL with this code:

public static String getContentResult(URL url) throws IOException {

    InputStream in = url.openStream();
    StringBuffer sb = new StringBuffer();

    byte[] buffer = new byte[256];

    try {
        while (true) {
            int byteRead = in.read(buffer);
            if (byteRead == -1)
                break;
            for (int i = 0; i < byteRead; i++) {
                // naive byte-to-char cast; assumes a single-byte charset
                sb.append((char) buffer[i]);
            }
        }
    } finally {
        in.close();
    }
    return sb.toString();
}

But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I can't get the abstract ("Database management systems will continue to manage.....").

Can you give me a solution to this problem? Thanks in advance.

Recommended Answer

Outputting the headers of the GET request gives:

HTTP/1.1 302 Moved Temporarily
Connection: close
Date: Thu, 18 Nov 2010 15:35:24 GMT
Server: Microsoft-IIS/6.0
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE
Content-Type: text/html; charset=UTF-8
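You can reproduce this header dump yourself with the standard `java.net.HttpURLConnection`. A minimal sketch (class and variable names are illustrative); disabling redirect following per-connection lets you see the raw 302 response:

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class HeaderDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // keep the 302 response instead of transparently following it
        conn.setInstanceFollowRedirects(false);
        System.out.println(conn.getResponseCode() + " " + conn.getResponseMessage());
        // getHeaderFields() maps each header name to its list of values
        for (Map.Entry<String, List<String>> h : conn.getHeaderFields().entrySet()) {
            System.out.println(h.getKey() + ": " + h.getValue());
        }
        conn.disconnect();
    }
}
```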

This means the server wants you to fetch the new location instead. So either you read the Location header directly from the URLConnection and follow that link yourself, or you use HttpClient, which follows redirects automatically. The code below is based on HttpClient:

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {
    public static void main(String... args) throws Exception {

        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {

        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        Reader reader = null;
        try {
            reader = new InputStreamReader(response.getEntity().getContent());

            StringBuffer sb = new StringBuffer();
            {
                int read;
                char[] cbuf = new char[1024];
                while ((read = reader.read(cbuf)) != -1)
                    sb.append(cbuf, 0, read);
            }

            return sb.toString();

        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
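If you prefer not to add the HttpClient dependency, the first alternative mentioned above (reading the Location header yourself) can be sketched with plain `java.net.HttpURLConnection`. The helper name and the hop limit are illustrative, not part of any library API:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ManualRedirect {
    // Hypothetical helper: reads a page, following at most maxHops redirects by hand.
    static String readFollowingRedirects(URL url, int maxHops) throws Exception {
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false);
            int code = conn.getResponseCode();
            if (code >= 300 && code < 400) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                // resolve against the current URL so relative Location values work too
                url = new URL(url, location);
                continue;
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null)
                    sb.append(line).append('\n');
            }
            return sb.toString();
        }
        throw new Exception("Too many redirects");
    }
}
```

Note that `new URL(context, spec)` resolves relative redirect targets against the current URL, which is why it is used instead of constructing the URL from the Location string alone.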
