Google App Engine (Java): URL Fetch Response too large problems
I'm trying to build a web service on Google App Engine.
The problem is that I need to get data from a website (HTML scraping).
The request looks like this:
URL url = new URL(p_url);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
InputStreamReader in = new InputStreamReader(con.getInputStream()); // exception is thrown here
BufferedReader reader = new BufferedReader(in);
String result = "";
String line = "";
while ((line = reader.readLine()) != null)
{
    System.out.println(line);
}
return result;
App Engine throws the following exception at the third line:
com.google.appengine.api.urlfetch.ResponseTooLargeException
This is because the maximum response size is 1 MB and the page's HTML is about 1.5 MB in total.
Now my question: I only need the first 20 lines of the HTML to scrape. Is there a way to fetch only part of the HTML so that the ResponseTooLargeException is not thrown?
Thanks in advance!
I solved the problem by using the low-level URLFetch API and setting the allowTruncate option, which truncates an oversized response instead of throwing the exception.
Basically, it works like this:
// _url is a java.net.URL; the classes below come from com.google.appengine.api.urlfetch
HTTPRequest request = new HTTPRequest(_url, HTTPMethod.POST, FetchOptions.Builder.allowTruncate());
URLFetchService service = URLFetchServiceFactory.getURLFetchService();
HTTPResponse response = service.fetch(request);
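Since the question only needs the first 20 lines, the truncated body still has to be cut down to lines afterwards. The following is a minimal sketch of that step, not part of the original answer. The class name `TruncatedHtml` and the method `firstLines` are hypothetical helpers introduced here for illustration, and the page encoding is assumed to be known to the caller. It uses only the standard library, so it works on any byte array, such as the one returned by `response.getContent()`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the App Engine SDK).
final class TruncatedHtml {
    private TruncatedHtml() {}

    /**
     * Decodes the given bytes with the given charset and returns at most
     * maxLines lines from the start. In App Engine code the bytes would
     * typically be the (possibly truncated) result of response.getContent().
     */
    static List<String> firstLines(byte[] content, int maxLines, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(content), charset))) {
            String line;
            // Stop as soon as enough lines were collected or the data ends.
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Reading from an in-memory array should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With the response from above, a call could look like `TruncatedHtml.firstLines(response.getContent(), 20, StandardCharsets.UTF_8)`, assuming the page is UTF-8 encoded. Because truncation cuts the body at a byte boundary, the last line of a truncated response may be incomplete, which does not matter when only the first 20 lines are used.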