Amazon Seller Central Login Scrape PhantomJS + CasperJS

Question

I want to start off by saying that we only scrape our own account, because my company needs data from our own dashboard that we can't get from the MWS APIs. I am very familiar with those APIs.

I've had login/scraping scripts for years, but recently Amazon started serving captchas. My old way of scraping was to make cURL requests from PHP to mimic a browser.

My new approach uses PhantomJS and CasperJS to achieve the same effect. Everything worked fine for a day, but now I'm getting the captcha again.

Now, I happen to know from internal sources that Amazon isn't doing any scrape detection. They do, however, do hacking/DDoS attack detection. So I think something about this CasperJS code is getting flagged as an attack.

I don't think I'm calling the script too often, and I've changed the IP address that the requests come from.

Here is some CasperJS code:

var fs = require('fs');
var casper = require('casper').create({
    pageSettings: {
        loadImages: false,
        loadPlugins: false,
        userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'
    }
});

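// Reuse cookies saved by a previous run, if the file exists
// (fs.read throws when the file is missing, e.g. on the first run)
var cookieFilename = "cookies/_cookies.txt";
if (fs.exists(cookieFilename)) {
    var data = fs.read(cookieFilename);
    if (data) {
        phantom.cookies = JSON.parse(data);
    }
}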

//First step is to open Amazon
casper.start("https://sellercentral.amazon.com/gp/homepage.html", function() {
    console.log("Amazon website opened");
});

casper.wait(1000, function() {
    if(this.exists("form[name=signinWidget]")) {
        console.log("need to login");
        //Now we have to populate username and password, and submit the form
        casper.wait(1000, function(){
            console.log("Login using username and password");
            this.evaluate(function(){
                document.getElementById("username").value="*****";
                document.getElementById("password").value="*****";
                document.querySelector("form[name=signinWidget]").submit();
            });
        });
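        // Save the session cookies for the next run
        // (the third argument of fs.write is a mode string such as 'w', not a Unix permission)
        casper.wait(1000, function() {
            var cookies = JSON.stringify(phantom.cookies);
            fs.write(cookieFilename, cookies, 'w');
        });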
    } else {
        console.log("already logged in");
    }
});


//Wait to be redirected to the Home page, and then dump the page content
casper.wait(1000, function(){
    console.log("is login found?");
    console.log(this.exists("form[name=signinWidget]"));
    this.echo(this.getPageContent());
});

casper.run();

The result of that last line is just a login page with a captcha. What gives? This should look like a normal browser. When I use the same login on my computer, I get no issues at all.

I've also tried several different user agent strings. Sometimes changing those works temporarily.

Also, when I run all this locally, it works fine. But on the Linux server it gets the captcha. Note that I've changed the IP on the remote Linux server many times. It still gets the captcha.

Recommended answer

As often happens with scraping/automation, the cause of an error is not necessarily an incorrectly written script. It can also be the context and the underlying infrastructure.

In this case we determined (in the comments) that the script was challenged with a captcha only when run from a particular server, whose IP address seems to have been put on an untrusted list.
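
One way to check this kind of diagnosis is to record what each run returns and tally the outcomes per host. The following is only a minimal sketch in plain JavaScript, not something from the original thread. It assumes that a captcha challenge page contains the word "captcha" somewhere in its markup (the question does not show the real markup), and the names classifyPage, tallyByHost and the sample data are hypothetical. The signinWidget form name is taken from the selector in the question's script.

```javascript
// Classify the HTML returned by a login attempt.
// ASSUMPTION: a captcha challenge page contains the word "captcha"
// somewhere in its markup; the real markup is not shown in the question.
function classifyPage(html) {
    if (/captcha/i.test(html)) {
        return 'captcha';
    }
    // This form name comes from the selector used in the question's script.
    if (/<form[^>]*name=["']?signinWidget["']?/i.test(html)) {
        return 'login';
    }
    return 'ok';
}

// Tally outcomes per host so that a host-specific pattern stands out.
// `runs` is a hypothetical log: [{ host: 'local', html: '...' }, ...]
function tallyByHost(runs) {
    var tally = {};
    runs.forEach(function (run) {
        var outcome = classifyPage(run.html);
        tally[run.host] = tally[run.host] || { captcha: 0, login: 0, ok: 0 };
        tally[run.host][outcome] += 1;
    });
    return tally;
}

// Example with made-up page bodies (not real Amazon responses).
var sample = [
    { host: 'local', html: '<html><body>Dashboard</body></html>' },
    { host: 'server', html: '<form name="signinWidget"><img src="/captcha.png"></form>' },
    { host: 'server', html: '<form name="signinWidget"><div id="captcha"></div></form>' }
];
console.log(JSON.stringify(tallyByHost(sample)));
```

In the CasperJS script, the html value could come from this.getPageContent(), which the question already calls. If the captcha count is non-zero only for one host while the script and credentials are identical, that points at the host's environment or IP address rather than at the script.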
