How to avoid being detected as a bot on Puppeteer and PhantomJS?
Question
Puppeteer and PhantomJS are similar. I'm running into the same issue with both, and the code is similar too.
I'd like to scrape some information from a website that requires authentication to view it. I can't even access the home page because my request is flagged as "suspicious activity", as in this screenshot: https://i.imgur.com/p69OIjO.png
I discovered that the problem doesn't occur when I test with Postman and send a Cookie header whose value I copied from the browser, but that cookie expires after some time. So I guess neither Puppeteer nor PhantomJS is picking up cookies, because the site is denying access to headless browsers.
What can I do to bypass this?
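// Simple PhantomJS example
var page = require('webpage').create();
var url = 'https://www.expertflyer.com';
page.open(url, function (status) {
  if (status === 'success') {
    page.render('home.png');
  } else {
    console.log('Failed to load ' + url);
  }
  // Exit in both cases so the script does not hang when the load fails
  phantom.exit();
});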
Recommended answer
Things that can help in general:
- Headers should resemble those of a common browser, including:
  - User-Agent: use a recent one (see https://developers.whatismybrowser.com/useragents/explore/), or better, use a random recent one if you make multiple requests (see https://github.com/skratchdot/random-useragent)
  - Accept-Language: something like "en,en-US;q=0.5" (adapt for your language)
  - Accept: a standard one would be "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
- Check that "navigator.plugins" and "navigator.language" are set in the page's client-side JavaScript context
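As a rough illustration of the points above, here is a minimal sketch for Puppeteer. It is not part of the original answer. The helper names (`buildBrowserHeaders`, `patchNavigator`, `applyBrowserLikeSettings`) and the example user agent string are assumptions made for illustration, and the plugin values are placeholders. The Puppeteer methods used are `page.setUserAgent`, `page.setExtraHTTPHeaders` and `page.evaluateOnNewDocument`.

```javascript
// Example value only (an assumption); substitute a current user agent string.
const EXAMPLE_USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Build the browser-like request headers suggested in the list above.
function buildBrowserHeaders() {
  return {
    'Accept-Language': 'en,en-US;q=0.5',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  };
}

// Meant to run inside the page before any site script executes, so that
// navigator.plugins and navigator.language report non-empty values.
// The plugin entries here are placeholders, not real plugin objects.
function patchNavigator() {
  Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3] });
  Object.defineProperty(navigator, 'language', { get: () => 'en-US' });
}

// `page` is expected to be a Puppeteer Page; call this before page.goto().
async function applyBrowserLikeSettings(page, userAgent = EXAMPLE_USER_AGENT) {
  await page.setUserAgent(userAgent);
  await page.setExtraHTTPHeaders(buildBrowserHeaders());
  await page.evaluateOnNewDocument(patchNavigator);
}
```

With Puppeteer installed, this would be called as `await applyBrowserLikeSettings(page)` on a page obtained from `browser.newPage()`, before `page.goto(url)`. Whether it is enough depends on the checks a given site performs.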