Using sockets to fetch a webpage with Java

Question

I'd like to fetch a webpage and capture only the raw data returned by the HTTP request, without parsing or rendering anything.

I'm trying to do this using the high-level Socket class (java.net.Socket) from the Java runtime library.

I wonder whether this is possible, since I'm not comfortable with the underlying layer used for this two-point communication, and I don't know whether the trouble comes from my own system.

Here is what my code is doing:

1) Setting up the socket.

this.socket = new Socket( "www.example.com", 80 );

2) Setting the appropriate streams used for this communication.

this.out = new PrintWriter( socket.getOutputStream(), true);
this.in = new BufferedReader( new InputStreamReader( socket.getInputStream() ) );

3) Requesting the page (and this is where I'm not sure it's alright to do it like this).

String query = "";
query += "GET / HTTP/1.1\r\n";
query += "Host: www.example.com\r\n";
...
query += "\r\n";

this.out.print(query);

4) Reading the result (nothing in my case).

System.out.print( this.in.readLine() );

5) Closing the socket and streams.
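The five steps above can be put together as one small program. This is only a sketch, and the class and method names (RawHttpFetch, buildRequest, fetch) are hypothetical. It differs from the snippets above in two ways, both offered as likely explanations rather than a confirmed diagnosis: PrintWriter's autoflush is triggered only by println, printf, and format, so a request sent with print() may sit in the buffer until flush() is called; and a "Connection: close" header is added (an assumption, not part of the original request) so the server closes the stream after one response and the read loop can end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch of steps 1-5, not the original poster's code.
final class RawHttpFetch {

    // Builds a minimal HTTP/1.1 GET request. "Connection: close" is an
    // assumption added here so the server ends the stream after one response.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Opens the socket, sends the request, reads until EOF, then closes.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            out.print(buildRequest(host, path));
            out.flush(); // print() does not autoflush, so flush explicitly

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) { // null once the server closes
                response.append(line).append('\n');
            }
            return response.toString(); // status line, headers, blank line, body
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Dry run: show the request without touching the network.
            System.out.print(buildRequest("www.example.com", "/"));
        } else {
            System.out.print(fetch(args[0], 80, "/"));
        }
    }
}
```

Run with a host name as the only argument to perform the request. The output is the raw response, so it includes the status line and headers, and the body may still be in chunked transfer encoding.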

Recommended answer

If you're on a *nix system, look into curl, which lets you retrieve data from the internet on the command line. It is more lightweight than a Java socket connection.

If you want to use Java and are just retrieving information from a webpage, check out the Java URL class (java.net.URL). Some sample Java code:

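import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

// The URL needs a protocol; without "http://" the constructor throws MalformedURLException.
URL ur = new URL("http://www.google.com");
URLConnection conn = ur.openConnection();
InputStream is = conn.getInputStream();
String foo = new Scanner(is).useDelimiter("\\A").next();
System.out.println(foo);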

That will fetch the specified URL, read the data (HTML in this case), and print it to the console. You might have to tweak the delimiter a bit, but this will work with most network endpoints sending data.
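As a rough illustration, the same idea can be wrapped in a small class. The names UrlFetch, readAll, and fetch are hypothetical, and decoding as UTF-8 is an assumption (the real charset depends on the server). The sketch also guards against an empty response, where calling next() directly would throw NoSuchElementException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical class name; a sketch built around java.net.URL.
final class UrlFetch {

    // Reads a whole stream into one String. The "\\A" delimiter matches only the
    // start of input, so the Scanner returns everything as a single token.
    static String readAll(InputStream is) {
        try (Scanner scanner =
                 new Scanner(is, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : ""; // empty stream yields ""
        }
    }

    // Opens the URL (which must include a protocol such as "http://")
    // and returns the response body as text.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        try (InputStream is = conn.getInputStream()) {
            return readAll(is);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UrlFetch <url>");
        } else {
            System.out.println(fetch(args[0]));
        }
    }
}
```

Unlike a raw socket, URLConnection handles the HTTP framing itself, so the returned text is the body only, without the status line or headers.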
