Read HTML source to string

Question

I hope you won't frown on this too much, but it should be fairly easy for someone to answer. I want to read a file on a website into a string so I can extract information from it.

I just want a simple way to read the HTML source into a string. After looking around for hours, all I see are libraries, curl, and so on. All I need is the raw HTML data. I don't even need a definitive answer, just something that will help me refine my search.

Just to be clear, I want the raw code in a string I can manipulate; I don't need any parsing.

Recommended answer

You need an HTTP client library; libcurl is one of many. You then issue a GET request to a URL and read the response back in whatever way your chosen library provides.

Here is an example to get you started. It is C, but I am sure you can work it out.

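#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURL *curl;
  CURLcode res;

  curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");

    /* Perform the request; by default the body is written to stdout */
    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
      fprintf(stderr, "curl_easy_perform() failed: %s\n",
              curl_easy_strerror(res));

    /* always cleanup */
    curl_easy_cleanup(curl);
  }
  return 0;
}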

But you tagged this C++, so if you want a C++ wrapper for libcurl, use curlpp:

#include <iostream>  // std::cout, std::endl

#include <curlpp/curlpp.hpp>
#include <curlpp/Easy.hpp>
#include <curlpp/Options.hpp>

using namespace curlpp::options;

int main(int, char **)
{
  try
  {
    // That's all that is needed to do cleanup of used resources
    curlpp::Cleanup myCleanup;

    // Our request to be sent.
    curlpp::Easy myRequest;

    // Set the URL.
    myRequest.setOpt<Url>("http://example.com");

    // Send request and get a result.
    // By default the result goes to standard output.
    myRequest.perform();
  }

  catch(curlpp::RuntimeError & e)
  {
    std::cout << e.what() << std::endl;
  }

  catch(curlpp::LogicError & e)
  {
    std::cout << e.what() << std::endl;
  }

  return 0;
}
