Google App Engine (Java): URL Fetch response too large problems


Problem description




I'm trying to build a web service on Google App Engine.

The problem is that I need to get data from a website (HTML scraping).

The request looks like this:

URL url = new URL(p_url);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
InputStreamReader in = new InputStreamReader(con.getInputStream());
BufferedReader reader = new BufferedReader(in);

String result = "";
String line = "";
while ((line = reader.readLine()) != null)
{
    System.out.println(line);
}
return result;

App Engine throws the following exception at the 3rd line:

com.google.appengine.api.urlfetch.ResponseTooLargeException

This is because the maximum response size is 1 MB, and the page's HTML totals about 1.5 MB.

Now my question: I only need the first 20 lines of the HTML to scrape. Is there a way to fetch only part of the HTML so that the ResponseTooLargeException is not thrown?

Thanks in advance!

Solution

I solved the problem by using the low-level URLFetch API and setting the allowTruncate option on FetchOptions:

http://code.google.com/intl/nl-NL/appengine/docs/java/javadoc/com/google/appengine/api/urlfetch/FetchOptions.html

Basically, it works like this:

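// Classes are from com.google.appengine.api.urlfetch; _url must be a java.net.URL.
// allowTruncate() returns a truncated body instead of throwing ResponseTooLargeException.
HTTPRequest request = new HTTPRequest(_url, HTTPMethod.POST, FetchOptions.Builder.allowTruncate());
URLFetchService service = URLFetchServiceFactory.getURLFetchService();
HTTPResponse response = service.fetch(request);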
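
Once the truncated response is available, its body can be read from response.getContent(), which returns a byte[]. The sketch below shows one way to pull the first 20 lines out of such a byte array. The class name FirstLines and the method firstLines are hypothetical, the sample bytes stand in for a real response body, and UTF-8 decoding is an assumption (the actual page may use another charset).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FirstLines {
    // Hypothetical helper: returns at most maxLines lines decoded from content.
    // content stands in for the byte[] returned by HTTPResponse.getContent().
    // Assumption: the body is UTF-8 encoded.
    static List<String> firstLines(byte[] content, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8))) {
            String line;
            // Stop as soon as enough lines are collected or the input ends.
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Reading from an in-memory array should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample bytes in place of a real (possibly truncated) response body.
        byte[] sample = "<html>\n<head>\n<title>t</title>\n".getBytes(StandardCharsets.UTF_8);
        for (String line : firstLines(sample, 2)) {
            System.out.println(line);
        }
    }
}
```

In the servlet, the call would look like firstLines(response.getContent(), 20). Because the body may be cut off mid-line, the last line returned can be incomplete if the truncation point falls within the first 20 lines.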
