How to avoid being detected as a bot on Puppeteer and PhantomJS?


Problem description

Puppeteer and PhantomJS are similar. The issue I'm having happens with both, and the code is similar too.

I'd like to scrape some information from a website that requires authentication to view it. I can't even access the home page, because the visit is flagged as "suspicious activity", as in this screenshot: https://i.imgur.com/p69OIjO.png

I found that the problem doesn't happen when I test in Postman with a Cookie header whose value I copied from the browser, but that cookie expires after some time. So I guess neither Puppeteer nor PhantomJS is picking up the cookies, because the site is denying access to headless browsers.
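To make the Cookie-header experiment above concrete, here is a minimal sketch of turning a raw Cookie header string copied from the browser into cookie objects with `name`, `value` and `domain` fields. The helper name `parseCookieHeader` and the sample cookie names and values are hypothetical; only the URL's host comes from the question.

```javascript
// Hypothetical helper: split a raw Cookie header string (as copied from the
// browser's dev tools) into { name, value, domain } objects.
function parseCookieHeader(header, domain) {
  return header
    .split(';')
    .map((part) => part.trim())
    .filter((part) => part.includes('='))
    .map((part) => {
      // Split on the first '=' only, since cookie values may contain '='.
      const index = part.indexOf('=');
      return {
        name: part.slice(0, index).trim(),
        value: part.slice(index + 1).trim(),
        domain,
      };
    });
}

// Made-up cookie names and values, for illustration only.
const cookies = parseCookieHeader('session=abc123; token=x=y', 'www.expertflyer.com');
console.log(cookies);
```

In Puppeteer, objects of this shape can be passed to `page.setCookie(...cookies)` before calling `page.goto(url)`. This only reproduces the Postman test inside the headless browser; the copied cookie will still expire.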

What can I do to bypass this?

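// Simple PhantomJS example: open the page and save a screenshot
var page = require('webpage').create();
var url = 'https://www.expertflyer.com';

page.open(url, function (status) {
    if (status === 'success') {
        page.render('home.png');
    } else {
        console.log('Failed to load ' + url);
    }
    // Exit in every case so the process does not hang on a failed load
    phantom.exit();
});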

Recommended answer

Things that can help in general:
