What's the easiest way to grab a web page in C?
Question
I'm working on an old-school Linux variant (QNX, to be exact) and need a way to grab a web page using nothing but sockets and arrays. There are no cookies or logins involved; the target URL is just a text file.
Has anyone got a snippet for this?
Note: I don't control the server, and I've got very little to work with besides what is already on the box. Adding extra libraries is not really "easy" given the constraints, although I do love libcurl.
Recommended answer
I do have some code, but it also supports (Open)SSL, so it's a bit long to post here.
In essence:
- parse the URL (split out the URL scheme, host name, port number, and scheme-specific part)
- create a socket:
    s = socket(PF_INET, SOCK_STREAM, proto);
- populate a sockaddr_in structure with the remote IP and port
- connect the socket to the far end:
    err = connect(s, (struct sockaddr *)&addr, sizeof(addr));
- make the request string:
    n = snprintf(headers, sizeof(headers), "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", ...);
- send the request string:
    write(s, headers, n);
- read the data:
    while ((n = read(s, buffer, bufsize)) > 0) {
        ...
    }
- close the socket:
    close(s);
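The steps above can be strung together roughly as follows. This is a minimal sketch, not the answerer's actual code: the names `build_request` and `http_get` are hypothetical, it assumes plain HTTP over IPv4, it assumes `gethostbyname` is available on the target system, and it expects the caller to have already split the URL into host, port, and path.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format an HTTP/1.0 GET request into buf.
 * Returns the request length, or -1 if it does not fit. */
static int build_request(char *buf, size_t size,
                         const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: fetch http://host:port/path into out.
 * Stores headers and body together, NUL-terminated.
 * Returns the number of bytes stored, or -1 on error. */
static long http_get(const char *host, unsigned short port,
                     const char *path, char *out, size_t outsize)
{
    char request[1024];
    struct sockaddr_in addr;
    struct hostent *he;
    long total = 0;
    ssize_t n;
    int s, len, sent = 0;

    if (outsize == 0)
        return -1;
    len = build_request(request, sizeof request, host, path);
    if (len < 0)
        return -1;

    /* Assumption: IPv4-only name lookup via gethostbyname(). */
    he = gethostbyname(host);
    if (he == NULL || he->h_addrtype != AF_INET)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    memcpy(&addr.sin_addr, he->h_addr_list[0], sizeof addr.sin_addr);

    s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    if (connect(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }

    /* write() may send fewer bytes than requested, so loop. */
    while (sent < len) {
        n = write(s, request + sent, (size_t)(len - sent));
        if (n <= 0) {
            close(s);
            return -1;
        }
        sent += (int)n;
    }

    /* With HTTP/1.0 the server closes the connection when the response
     * is complete, so read until EOF or until the buffer is full.
     * In this sketch a read error simply ends the loop. */
    while ((size_t)total < outsize - 1 &&
           (n = read(s, out + total, outsize - 1 - (size_t)total)) > 0)
        total += n;

    close(s);
    out[total] = '\0';
    return total;
}
```

A call might look like `http_get("example.com", 80, "file.txt", buf, sizeof buf)`, where the host and path are placeholders. A fixed array keeps the sketch within the "sockets and arrays" constraint, at the cost of truncating responses larger than the buffer.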
N.B.: the pseudo-code above collects both the response headers and the data. The split between the two is the first blank line.
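One way to find that split is to scan the received bytes for the first `\r\n\r\n` sequence, which is how HTTP marks the blank line. The sketch below assumes the whole response is already in one array; `find_body` is a hypothetical name, and it does not handle servers that send bare `\n\n` instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first byte of the body
 * inside a raw HTTP response of len bytes, or NULL if the blank line
 * separating headers from body is not present. */
static const char *find_body(const char *resp, size_t len)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(resp + i, "\r\n\r\n", 4) == 0)
            return resp + i + 4;
    }
    return NULL;
}
```

Everything before the returned pointer is the status line and headers; the body length is the total length minus the offset of the returned pointer.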