How to click a button on a website with C++

Question

I'm designing a web crawler in C++, but one web page asks me "Are you at least 18 years of age?" when I first fetch it with URLDownloadToFileW, and of course I must click YES.

In JavaScript, I can use document.getElementsByTagName('button')[0].click(); to simulate a button click. Is there a way to solve this kind of problem in C++?

Recommended answer

That is not really easy to do, but if you want to do it, you will need to make several requests.

What the click (i.e. document.getElementsByTagName('button')[0].click(); in JavaScript) does is trigger the associated click event. Your first step should be to find the event handler code and examine it. The event may, for example, send another (AJAX) request to the website. If that is the case, your crawler has to perform that request in C++, too. Many sites also use cookies to store the user's answer to such questions (or at least the fact that the user selected "I'm at least 18 years of age"). So your crawler also has to accept such cookies and store them between requests.
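To make "store them between requests" concrete, here is a minimal sketch of a cookie jar that uses only the C++ standard library. It assumes your HTTP library lets you read the raw Set-Cookie response headers and send a Cookie request header; the class name CookieJar and the cookie name age_verified are purely hypothetical. Cookie attributes such as Path, Domain and Expires are deliberately ignored, so every stored cookie is sent with every request.

```cpp
#include <map>
#include <string>

// Minimal in-memory cookie jar (hypothetical helper): remembers name=value
// pairs from Set-Cookie response headers and replays them in a Cookie
// request header. Simplification: Path, Domain, Expires, Secure etc. are
// ignored, so every stored cookie is sent with every request.
class CookieJar {
public:
    // Store the cookie from one Set-Cookie header value,
    // e.g. "age_verified=1; Path=/; Max-Age=86400".
    void store(const std::string& set_cookie_value) {
        // Only the part before the first ';' is the name=value pair.
        const std::string pair = set_cookie_value.substr(0, set_cookie_value.find(';'));
        const auto eq = pair.find('=');
        if (eq == std::string::npos) return;  // malformed: no '=' at all
        const std::string name = trim(pair.substr(0, eq));
        if (name.empty()) return;
        // A newer value for the same name overwrites the older one.
        cookies_[name] = trim(pair.substr(eq + 1));
    }

    // Build the value of the Cookie request header, e.g. "a=1; b=2".
    std::string header() const {
        std::string out;
        for (const auto& [name, value] : cookies_) {
            if (!out.empty()) out += "; ";
            out += name + "=" + value;
        }
        return out;
    }

private:
    // Strip leading and trailing spaces/tabs.
    static std::string trim(const std::string& s) {
        const auto first = s.find_first_not_of(" \t");
        if (first == std::string::npos) return "";
        const auto last = s.find_last_not_of(" \t");
        return s.substr(first, last - first + 1);
    }

    std::map<std::string, std::string> cookies_;  // name -> value
};
```

For example, after jar.store("age_verified=1; Path=/"), jar.header() returns "age_verified=1", which you would send as the Cookie header of the next request to the same site.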

I am aware that this answer is rather general, but it is difficult to give a more specific answer without knowing the exact website you are crawling.

Alternative approach: Instead of writing a crawler that downloads the website content directly, you might use a framework like Selenium. Selenium lets you automate a browser and is intended for testing, but you can also use it to crawl a website. The advantage is that actions such as clicks are easier to perform in the browser, provided you know the ID or the XPath of the element you want to click. This might be easier than writing a "classical" crawler.
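Selenium has no official C++ client library, but its browser drivers (for example chromedriver or geckodriver) implement the W3C WebDriver protocol, which is JSON over HTTP. The sketch below only builds the two requests behind "find the button, then click it": Find Element (POST /session/{session id}/element) and Element Click (POST /session/{session id}/element/{element id}/click). Creating the session, sending the requests with an HTTP client and reading the element id out of the Find Element response are left out, and the helper names are hypothetical. Note that W3C WebDriver has no "id" locator strategy, so an element ID is usually located with the CSS selector #the-id.

```cpp
#include <cstdio>
#include <string>

// One WebDriver HTTP request (hypothetical helper type); sending it is left
// to whatever HTTP client the crawler already uses.
struct WebDriverRequest {
    std::string method;
    std::string path;
    std::string body;  // JSON payload
};

// Escape a string so it can be embedded in a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (static_cast<unsigned char>(c) < 0x20) {
                    // Other control characters need a \u00XX escape.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(static_cast<unsigned char>(c)));
                    out += buf;
                } else {
                    out += c;
                }
        }
    }
    return out;
}

// Find Element: `strategy` is a W3C locator strategy such as
// "css selector" or "xpath"; `selector` is the matching expression.
WebDriverRequest find_element(const std::string& session_id,
                              const std::string& strategy,
                              const std::string& selector) {
    return {"POST", "/session/" + session_id + "/element",
            "{\"using\": \"" + json_escape(strategy) + "\", \"value\": \"" +
                json_escape(selector) + "\"}"};
}

// Element Click: the request body is an empty JSON object.
WebDriverRequest click_element(const std::string& session_id,
                               const std::string& element_id) {
    return {"POST", "/session/" + session_id + "/element/" + element_id + "/click", "{}"};
}
```

Building the request bodies separately from the transport keeps the sketch independent of any particular C++ HTTP library.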

However, you should be aware that many websites have some kind of protection in place against being flooded with requests. That is, if you intend to send a lot of requests to the same server in a short amount of time, the server might block you. So try to keep the number of requests to an absolute minimum.
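One simple way to follow that advice is to enforce a minimum delay between consecutive requests to the same server. The following is a minimal sketch of such a throttle; the class name Throttle and any concrete interval you pass it (say, two seconds) are assumptions, not limits published by any particular site.

```cpp
#include <chrono>
#include <optional>
#include <thread>

// Hypothetical helper: enforces a minimum interval between consecutive
// requests to one server.
class Throttle {
public:
    using Clock = std::chrono::steady_clock;

    explicit Throttle(Clock::duration min_interval) : min_interval_(min_interval) {}

    // How long the caller must still wait at time `now` (zero if no wait is needed).
    Clock::duration remaining(Clock::time_point now) const {
        if (!last_) return Clock::duration::zero();  // no request sent yet
        const auto next_allowed = *last_ + min_interval_;
        return now >= next_allowed ? Clock::duration::zero() : next_allowed - now;
    }

    // Block until the next request is allowed, then record it as sent.
    void wait() {
        std::this_thread::sleep_for(remaining(Clock::now()));
        last_ = Clock::now();
    }

    // Record a request at an explicit time (lets the logic be tested without sleeping).
    void record(Clock::time_point when) { last_ = when; }

private:
    Clock::duration min_interval_;
    std::optional<Clock::time_point> last_;  // time of the most recent request
};
```

Call wait() immediately before each request to the server; because remaining() and record() take explicit time points, the timing logic can be checked without actually sleeping.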