Retrieve any website using Java

Views: 164 | Published: 2019/6/16 9:41:39 | Category: Java

Question

How can I retrieve a complete website using Java so that I can use the site offline? I don't want to use any existing tool, only a Java program.

Solution

You need to develop Web scraping techniques: http://en.wikipedia.org/wiki/Web_scraping

You can use the class HttpURLConnection:
http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html
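
To make the HttpURLConnection suggestion concrete, here is a minimal sketch of downloading one page as text. The names PageFetcher, fetch and readAll are made up for this illustration, and the 10-second timeouts and the UTF-8 decoding are assumptions; a real crawler would read the charset from the Content-Type header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class PageFetcher {

    /** Downloads the body of the given http(s) address and returns it as text. */
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // assumed timeout, in milliseconds
        conn.setReadTimeout(10_000);    // assumed timeout, in milliseconds
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                // Assumption: the page is UTF-8 encoded.
                return readAll(in, StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Reads the stream to its end and decodes the bytes with the given charset. */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }
}
```

Calling PageFetcher.fetch with a page address returns that page's HTML as a string, which can then be written to a local file for offline use.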

There is also a Google option, the class com.google.appengine.api.urlfetch.HTTPRequest from the Google App Engine URL Fetch API:
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/urlfetch/HTTPRequest

After you get the content, you will most likely need to collect some or all of the links on the page and follow them to scrape further pages. For that you need a suitable HTML parser; you can choose one yourself.

Here are a couple of options you can consider:
http://htmlcleaner.sourceforge.net/
http://htmlparser.sourceforge.net/
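
As an illustration of the link-collecting step, the sketch below uses the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) instead of either library above, only so that the example runs without extra dependencies; HtmlCleaner or HTML Parser would fill the same role. The names LinkExtractor, extractLinks and sameHost are made up for this example, and dropping fragments and non-http links is an assumption about what a crawler would want.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
public class LinkExtractor {

    /** Returns the absolute http(s) targets of all <a href="..."> tags, in document order. */
    public static List<String> extractLinks(String html, String baseUrl) throws IOException {
        final URI base = URI.create(baseUrl);
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return;
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return;
                }
                try {
                    // Resolve relative links against the address of the page.
                    URI resolved = base.resolve(href.toString().trim());
                    String scheme = resolved.getScheme();
                    if ("http".equals(scheme) || "https".equals(scheme)) {
                        String link = resolved.toString();
                        int hash = link.indexOf('#');
                        // Drop the fragment so one page is not queued twice.
                        links.add(hash >= 0 ? link.substring(0, hash) : link);
                    }
                } catch (IllegalArgumentException e) {
                    // Malformed href: skip it rather than abort the whole page.
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    /** True if both addresses are on the same host, to keep a crawl inside one site. */
    public static boolean sameHost(String url, String startUrl) {
        String a = URI.create(url).getHost();
        String b = URI.create(startUrl).getHost();
        return a != null && a.equalsIgnoreCase(b);
    }
}
```

A crawler would typically keep a queue of addresses still to visit and a set of addresses already seen, fetch each page, save it locally, and add every extracted link that passes sameHost and has not been seen yet.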

—SA
