Read HTML source into a string
Problem description
I hope you won't frown on me too much, but this should be fairly easy for someone to answer. I want to read a file on a website into a string so that I can extract information from it.
I just want a simple way to read the HTML source into a string. After looking around for hours, all I see is various libraries, curl, and so on. All I need is the raw HTML data. I don't even need a definite answer, just something that will help me refine my search.
Just to be clear: I want the raw code in a string I can manipulate. I don't need any parsing.
Recommended answer
You need an HTTP client library; one of many is libcurl. You would then issue a GET request to a URL and read the response back in whatever way your chosen library provides.
Here is an example to get you started. It is C, so I am sure you can work it out.
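#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res;

    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");

        /* Perform the request; by default the body is written to stdout */
        res = curl_easy_perform(curl);
        if(res != CURLE_OK)
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(res));

        /* always cleanup */
        curl_easy_cleanup(curl);
    }
    return 0;
}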
But you tagged this C++, so if you want a C++ wrapper for libcurl, use curlpp:
#include <iostream>

#include <curlpp/cURLpp.hpp>
#include <curlpp/Easy.hpp>
#include <curlpp/Options.hpp>

using namespace curlpp::options;

int main(int, char **)
{
    try
    {
        // That's all that is needed to do cleanup of used resources
        curlpp::Cleanup myCleanup;

        // Our request to be sent.
        curlpp::Easy myRequest;

        // Set the URL.
        myRequest.setOpt<Url>("http://example.com");

        // Send request and get a result.
        // By default the result goes to standard output.
        myRequest.perform();
    }
    catch(curlpp::RuntimeError & e)
    {
        std::cout << e.what() << std::endl;
    }
    catch(curlpp::LogicError & e)
    {
        std::cout << e.what() << std::endl;
    }

    return 0;
}