Download HTTP thru sockets (C)


Question

Recently I started working through this guide to learn how to download files from the internet. After reading it, I came up with the following code to download the HTTP body of a website. The only problem is that it doesn't work: the program stalls in the call to recv(). It does not crash; it just keeps running. Is this my fault? Am I using the wrong approach? I intend to use the code not only to download the contents of .html files, but also other files (zip, png, jpg, dmg ...). I hope somebody can help me. This is my code:

#include <stdio.h>
#include <sys/socket.h> /* SOCKET */
#include <netdb.h> /* struct addrinfo */
#include <stdlib.h> /* exit() */
#include <string.h> /* memset() */
#include <errno.h> /* errno */
#include <unistd.h> /* close() */
#include <arpa/inet.h> /* IP Conversion */

#include <stdarg.h> /* va_list */

#define SERVERNAME "developerief2.site11.com"
#define PROTOCOL "80"
#define MAXDATASIZE 1024*1024

void errorOut(int status, const char *format, ...);
void *get_in_addr(struct sockaddr *sa);

int main (int argc, const char * argv[]) {
    int status;

    // GET ADDRESS INFO
    struct addrinfo *infos; 
    struct addrinfo hints;

    // fill hints
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    hints.ai_family = AF_UNSPEC;

    // get address info
    status = getaddrinfo(SERVERNAME, 
                         PROTOCOL, 
                         &hints, 
                         &infos);
    if(status != 0)
        errorOut(-1, "Couldn't get addres information: %s\n", gai_strerror(status));

    // MAKE SOCKET
    int sockfd;

    // loop, use first valid
    struct addrinfo *p;
    for(p = infos; p != NULL; p = p->ai_next) {
        // CREATE SOCKET
        sockfd = socket(p->ai_family, 
                        p->ai_socktype, 
                        p->ai_protocol);
        if(sockfd == -1)
            continue;

        // TRY TO CONNECT
        status = connect(sockfd, 
                         p->ai_addr, 
                         p->ai_addrlen);
        if(status == -1) {
            close(sockfd);
            continue;
        }

        break;
    }

    if(p == NULL) {
        fprintf(stderr, "Failed to connect\n");
        return 1;
    }

    // LET USER KNOW
    char printableIP[INET6_ADDRSTRLEN];
    inet_ntop(p->ai_family,
              get_in_addr((struct sockaddr *)p->ai_addr),
              printableIP,
              sizeof(printableIP));
    printf("Connection to %s\n", printableIP);

    // GET RID OF INFOS
    freeaddrinfo(infos);

    // RECEIVE DATA
    ssize_t receivedBytes;
    char buf[MAXDATASIZE];
    printf("Start receiving\n");
    receivedBytes = recv(sockfd, 
                         buf, 
                         MAXDATASIZE-1, 
                         0);
    printf("Received %d bytes\n", (int)receivedBytes);
    if(receivedBytes == -1)
        errorOut(1, "Error while receiving\n");

    // null terminate
    buf[receivedBytes] = '\0';

    // PRINT
    printf("Received Data:\n\n%s\n", buf);

    // CLOSE
    close(sockfd);

    return 0;
}

void *get_in_addr(struct sockaddr *sa) {
    // IP4
    if(sa->sa_family == AF_INET)
        return &(((struct sockaddr_in *) sa)->sin_addr);

    return &(((struct sockaddr_in6 *) sa)->sin6_addr);
}

void errorOut(int status, const char *format, ...) {
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    exit(status);
}


Recommended answer

If you want to grab files using HTTP, then libcurl is probably your best bet in C. However, if you are using this as a way to learn network programming, then you will have to learn a bit more about HTTP before you can retrieve a file.

What you are seeing in your current program is that you need to send an explicit request for the file before you can retrieve it. The server sends nothing until it has received a request, which is why recv() blocks. I would start by reading through RFC 2616. Don't try to understand all of it; it is a lot to read for this example. Read the first section to get an understanding of how HTTP works, then read sections 4, 5, and 6 (http://tools.ietf.org/html/rfc2616#section-4) to understand the basic message format.
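To make the request-then-response order concrete, here is a minimal sketch of the two socket helpers such a client needs. The names send_all and recv_until_close are hypothetical and not part of the question's code. The sketch assumes the request asks the server to close the connection when it is done, so that end-of-stream marks the end of the response, and it does not retry on EINTR.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send exactly len bytes, looping because send()
   may accept only part of the buffer. Returns 0 on success, -1 on error. */
int send_all(int sockfd, const char *data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sockfd, data + sent, len - sent, 0);
        if (n == -1)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read until the peer closes the connection or the
   buffer is full. Returns the number of bytes stored, or -1 on error.
   Assumes the server closes the connection after the response. */
ssize_t recv_until_close(int sockfd, char *buf, size_t cap) {
    size_t total = 0;
    while (total < cap) {
        ssize_t n = recv(sockfd, buf + total, cap - total, 0);
        if (n == -1)
            return -1;
        if (n == 0)          /* orderly shutdown by the peer */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

In the question's program, the single recv() call would become a send_all() of the request text followed by recv_until_close(). A single recv() returns only whatever has arrived so far, so a loop is needed even for small pages.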

Here is an example of what an HTTP request for the Stack Overflow Questions page looks like:

GET http://stackoverflow.com/questions HTTP/1.1\r\n
Host: stackoverflow.com:80\r\n
Connection: close\r\n
Accept-Encoding: identity, *;q=0\r\n
\r\n

I believe that is a minimal request. I added the CRLFs explicitly to show that a blank line terminates the request header block, as described in RFC 2616. If you leave out the Accept-Encoding header, the resulting document will probably be transferred as a gzip-compressed stream, since HTTP explicitly allows this unless you tell the server that you do not want it.
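As a rough illustration, the request text can be assembled with snprintf. The function name build_get_request is hypothetical. Unlike the example above, this sketch puts only the path on the request line (the origin form), which is the more common choice when talking directly to the server; the absolute URL form shown above is also accepted by HTTP/1.1 servers. It also omits the optional port from the Host header, on the assumption that the default port 80 is used.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write a minimal GET request for `path` on `host`
   into `out`. Every header line ends in CRLF and an empty line ends the
   header block. Returns the request length, or -1 if it does not fit. */
int build_get_request(char *out, size_t cap,
                      const char *host, const char *path) {
    int n = snprintf(out, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "Accept-Encoding: identity, *;q=0\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= cap)   /* encoding error or truncated */
        return -1;
    return n;
}
```

For the request above, the call would be build_get_request(buf, sizeof buf, "stackoverflow.com", "/questions"), and the returned length is what gets passed to send().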

The server response also contains HTTP headers carrying the metadata that describes the response. Here is an example of a response to the previous request:

HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Sun, 01 Aug 2010 13:54:56 GMT\r\n
Content-Type: text/html; charset=utf-8\r\n
Connection: close\r\n
Cache-Control: private\r\n
Content-Length: 49731\r\n
\r\n
\r\n
\r\n
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" ... 49,667 bytes follow
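A client has to split such a response into the header block and the body before it can save a file. The following is a minimal sketch, assuming the whole header block is already in a NUL-terminated buffer. The names find_body, parse_status_code, and parse_content_length are hypothetical, and the sketch ignores complications such as chunked transfer encoding.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: pointer to the first body byte, just past the
   blank line that ends the headers, or NULL if the headers are incomplete. */
const char *find_body(const char *resp) {
    const char *end = strstr(resp, "\r\n\r\n");
    return end ? end + 4 : NULL;
}

/* Hypothetical helper: status code from a line such as "HTTP/1.1 200 OK",
   or -1 if the response does not start with a plausible status line. */
int parse_status_code(const char *resp) {
    if (strncmp(resp, "HTTP/", 5) != 0)
        return -1;
    const char *sp = strchr(resp, ' ');
    if (sp == NULL)
        return -1;
    long code = strtol(sp + 1, NULL, 10);
    return (code >= 100 && code <= 599) ? (int)code : -1;
}

/* Hypothetical helper: value of the Content-Length header, or -1 if the
   header is absent. Only lines before the body are examined, and header
   names are compared case-insensitively. */
long parse_content_length(const char *resp) {
    const char *body = find_body(resp);
    const char *line = strstr(resp, "\r\n");    /* end of status line */
    while (line != NULL && body != NULL && line + 2 < body) {
        line += 2;                              /* start of next line */
        if (strncasecmp(line, "Content-Length:", 15) == 0)
            return strtol(line + 15, NULL, 10);
        line = strstr(line, "\r\n");
    }
    return -1;
}
```

For binary downloads such as zip or png files, the body can contain NUL bytes, so its length should come from Content-Length or from the byte counts returned by recv(), not from strlen() or printf("%s").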

This simple example should give you an idea of what you are getting into if you want to implement file retrieval over HTTP yourself. It is the best-case, simplest example. This isn't something that I would undertake lightly, but it is probably the best way to learn and appreciate HTTP.

If you are looking for a simple way to learn network programming, this is a decent way to start. I would recommend picking up a copy of TCP/IP Illustrated, Volume 1 and UNIX Network Programming, Volume 1. These are probably the best way to really learn how to write network-based applications. I would probably start by writing an FTP client, since FTP (RFC 959) is a much simpler protocol to start with.

If you are trying to learn the details associated with HTTP, then:


  1. Buy HTTP: The Definitive Guide and read it
  2. Read RFC 2616 until you understand it
    • Try examples using telnet server 80 and typing in requests by hand
    • Download the cURL client and use the --verbose and --include command line options so that you can see what is happening

Just don't plan on writing your own HTTP client for enterprise use. You do not want to do that; trust me as one who has been maintaining such a mistake for a little while now...
