How do I get the source of a given URL from a servlet?


Problem description

I want to read the source code (HTML markup) of a given URL from my servlet.

For example, if the URL is http://www.google.com, my servlet needs to read its HTML source. I need this because my web application is going to read other web pages, extract useful content, and do something with it.

Let's say my application shows a list of shops in one category in a city. To generate that list, my web application (servlet) goes through a given web page that displays various shops and reads its content. The servlet then filters that source code to extract the useful details and finally creates the list (because my servlet has no access to the database of the web application behind the given URL).

Does anyone know a solution? (I specifically need to do this in a servlet.) If you think there is a better way to get details from another site, please let me know.

Thank you

Solution

What you are trying to do is called web scraping. Kayak and similar websites do it; search the web for it ;) In Java, you can do it like this:

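import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

URL url = new URL("<your URL>"); // replace the placeholder with the page to fetch
StringBuilder response = new StringBuilder();

// try-with-resources closes the reader even if reading fails.
// UTF-8 is an assumption; use the charset the page actually declares.
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine).append('\n');
    }
}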

response then holds the complete HTML content returned by that URL; call response.toString() to get it as a String.
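If it helps, the same idea can be wrapped in a small helper that a servlet could call. This is only a sketch under a few assumptions: the class name PageFetcher, the method names readAll and fetch, the 5-second timeouts, and the UTF-8 charset are all made up for illustration and are not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
class PageFetcher {

    // Reads everything from the reader line by line, ending each line with '\n'.
    public static String readAll(Reader source) throws IOException {
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine).append('\n');
            }
        }
        return response.toString();
    }

    // Fetches the body at an http/https address.
    // Assumptions: 5 s timeouts are reasonable and the page is UTF-8 encoded.
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(5000); // fail fast if the host is unreachable
        conn.setReadTimeout(5000);    // do not let a slow site block the request thread
        try {
            return readAll(new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }
}
```

Inside a servlet's doGet you could then call PageFetcher.fetch("http://www.google.com") and filter the returned string for the details you need. The timeouts matter more in a servlet than in a standalone program, because a remote site that hangs would otherwise tie up one of the container's request threads.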
