How do I use libcurl to log in to a secure website and get at the HTML behind the login

Problem description

Hey guys, I was wondering if you could help me work through accessing the HTML behind a login page using C and libcurl.

Specific example:

The site I want to access is https://onlineservices.ubs.com/olsauth/ex/pbl/ubso/dl

Is it possible to do something like this?

The problem is that we have a lot of clients, each of which has a separate login. We need to get data from each of their accounts every day. It would be really slick if we could write something in C to do this and save all the pertinent data into a file (like the values of the accounts and positions, which I can parse from the HTML).

What do you guys think? Is this possible and could you help point me in the right direction with some examples, etc...?

Recommended answer

After a cursory glance at the login page, this looks possible with libcurl: post the username/password combination to their authentication page, assuming they use cookies to represent a login session. The first step is to make sure that you have the following options set:

  • CURLOPT_FOLLOWLOCATION - The server may redirect after authenticating; this is quite common.
  • CURLOPT_POST - This tells libcurl to switch into POST mode.
  • CURLOPT_POSTFIELDS - This tells libcurl the values to send as the POST fields. Set this option to "userId=<insert username>&password=<insert password>", with both values URL-encoded. The field names are taken from the login form in that page's HTML source.
  • CURLOPT_USERAGENT - Set a simple user-agent, so that the web server won't reject the request (some strict ones will).

Then, once the POST is complete, the libcurl handle should hold some sort of authorisation cookie that the site uses to identify a logged-in user. libcurl keeps track of cookies within a given handle, provided its cookie engine is switched on (setting CURLOPT_COOKIEFILE to an empty string enables it without reading any file). There are plenty of libcurl options if you want to tweak how cookies behave.

Make sure that once you are logged in, the same libcurl handle is used for every request under that account; otherwise the site will see you as logged out.

As for parsing the resulting pages, there are tonnes of HTML parsers for C; just google. The only thing I will say is: do not try to write an HTML parser yourself. It is notoriously tricky, because a lot of sites don't produce good (or even working) HTML.
