Scrape all Amazon deals with PHP cURL?


Problem description

I want to scrape the Amazon "all deals" page:

http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1

So I am using cURL in PHP:

    $request = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $request);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 80);
    $file_source = curl_exec($ch);
    print_r($file_source);
    exit;
    

The scrape completes, but the content div in the response page is empty. All of the contents come from dynamic Ajax requests on Amazon. How can I scrape all the deal products using PHP and cURL?

[Link: screenshot of my response]

Updated code:

    $request = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';

    $header = array();
    $header[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
    /* $header[] = 'Accept-Language: en-US,en;q=0.5'; */
    /* $header[] = 'Accept-Encoding: gzip, deflate'; */
    // Cookie copied from the browser. The fragments are joined so that the
    // header line contains no line breaks.
    $header[] = 'Cookie: ' . implode('', array(
        'x-wl-uid=1vlKm5hBxhHPg37UgkrAPYZZaV0wv+T5knGezWJq0AIEWI30hJYp0XouddMIZeemj1LKAi9fDQq7aoFN+mbvlVYPTBQVLFdzs0aeTGWtiCY0Ay63L0ezPfZRKXQHC',
        '/Wum4ywRviFW9es=; session-id-time=2082787201l; session-id=192-9168386-7231424; ubid-main=187-6710460-8617661',
        '; session-token="+SFC4vDx/BvcD8D1Mdgeo2jtnTD0qPHF5j2nWNwbFGcRyW7/o4LBOmBHJosU5W0SgoAd6lhi0NZWg/6o5WE6o45k',
        '+VCT5a5dgj0tltSEkBT80oWT0CDk+jCDEEhIcxnCe6aqkUn6soFiMJHIsMWujo4qyA6A70PC1xKGKdIFMUm3H0DGSdIMqITs4Mjb1',
        '/1vY6GxnPeh5ncasxl+tUN2dHVwwJbj1ZrmyJdDxSDd8/o="; __utma=194891197.2101747155.1434117141.1434356635.1434362529',
        '.4; __utmz=194891197.1434362529.4.4.utmccn=(referral)|utmcsr=stackoverflow.com|utmcct=/questions/11589556',
        '/retrieving-an-amazon-stores-list-of-products-using-php|utmcmd=referral; x-main="Xi0312Ip8BrjoFoj6Zp9OLxDcU6kCvlm4DExlT5yNgHa2b3htenxvUsF2TZR3',
        '?Fn"; s_pers=%20s_vnum%3D1866356399079%2526vn%253D2%7C1866356399079%3B%20s_invisit%3Dtrue%7C1434364356330',
        '%3B%20s_nr%3D1434362556331-Repeat%7C1442138556331%3B; csm-hit=b-1RHERWP84F8S70KRQ903|1434453087266; preferred-geo',
        '=national; UserPref=O9NYa0FpfOIAcRMnkQf7WL3LyhrjCsMBKgKfVxT4zK8uOTF5KjzPAwmz0DuVnfXhdkinEE1BEMgPn09eHwavl',
        '+Hwl1BOSvjp1ewiG1iCXa0R77FsPOGbpq06MWB0MC7Wwff4gehUEAle5IfyFQqKGh1XvJ4YiMFsR2mwmyzzVJTo0WPGZzvvpCVLFmx22cRVwEi4sX8y',
        '+IfEKu76B4p1GHPdZVo1HIwLooo8CT7lboNUi4Hhn6mhtyGCNEDLvWD8NII48Vd9EkcBjUpiSeNroRjYO9yNkj8SI3xJVI0befNipOfxAzPSnuQqeBpqm99bWArk9ZZl',
        '+EM5QKzoPNJSF0FqVnnYavt4G6F/PHedaJVl8pU0A6N9lBjK6YZRFflyaoEYPtUW+nqK0xqO+YusAMAlhHBuW33KMdtt3i6oufQ4yTDqIgAiQ1ZTXcsb2tcu',
        '; s_dslv=1434370132739; lc-main=en_US; aws-target-visitor-id=1434357190046-572838.22_02; aws-target-data',
        '=%7B%22support%22%3A%221%22%7D; s_fid=7BB6DD9CE8128EC3-2A07290402DD6AF6; s_vn=1465893191447%26vn%3D1',
        '; s_nr=1434370132733-New; s_vnum=1866370132735%26vn%3D1; skin=noskin; b2b-main=0',
    ));
    $header[] = 'Connection: keep-alive';
    $reffer = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $request);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_REFERER, $reffer);
    curl_setopt($ch, CURLOPT_TIMEOUT, 80);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    $file_source = curl_exec($ch);
    if ($file_source === false) {
        die('cURL error: ' . curl_error($ch)); // report transport failures
    }

    print_r($file_source);
    

Solution

Based on my quick research, you might query the XHRs that Amazon makes to request deals.

Such dynamic websites get their data through Ajax JSON calls. One can find out where the data is dynamically downloaded from (using the browser's developer tools or a web sniffer) and then query those URLs for the data.

[Screenshot: the XHR in the browser's developer tools]

If you query those URLs with PHP cURL, you should imitate the HTTP headers of that particular request (including cookies).
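Imitating the request headers could look roughly like the sketch below. It is an illustration only: `buildXhrHeaders` and `fetchJson` are hypothetical helper names, and the `Accept` and `X-Requested-With` values are typical of browser XHRs rather than values confirmed for this site, so they should be replaced with whatever the network panel shows for the real request.

```php
<?php
// Hypothetical helpers. The header values are placeholders to be replaced
// with the ones shown for the real request in the browser's network panel.

/** Build header lines that resemble a browser XHR (assumed values). */
function buildXhrHeaders($referer)
{
    return array(
        'Accept: application/json, text/javascript, */*; q=0.01',
        'X-Requested-With: XMLHttpRequest',
        'Referer: ' . $referer,
    );
}

/** GET a URL with the given headers and decode the body as JSON. */
function fetchJson($url, array $headers)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0',
        CURLOPT_TIMEOUT        => 80,
    ));
    $body = curl_exec($ch);
    unset($ch); // release the handle

    if ($body === false) {
        return null; // transport error
    }
    // json_decode() returns null when the body is not valid JSON.
    return json_decode($body, true);
}
```

A call such as `fetchJson($xhrUrl, buildXhrHeaders($pageUrl))` would then return the decoded array, or `null` when the request fails or the body is not JSON.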

Update

Based on your new cURL request:

1. The Amazon page (its JS logic) makes an XHR to its server for each product item. The XHRs look like this: http://www.amazon.com/xa/dealcontent/v2/GetDealMetadata?nocache=1434445645152 and not http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1, which is only the referer.

2. The request for a product item is a POST, not a GET.

3. You probably took the cookie from your browser and inserted it into the PHP cURL header. That is wrong. Those cookies belong to your browser session and are unrelated to the session of the PHP server that will make the XHRs. Use a cookie jar for this instead; see the linked post.

4. The POST's payload is an object that must be formed with a known structure. Form data: {"requestMetadata":{"marketplaceID":"ATVPDKIKX0DER","sessionID":"175-4567874-0146849","clientID":"goldbox"},"widgetContext":{"pageType":"GoldBox","subPageType":"AllDeals","deviceType":"pc","refRID":"1VFVJBKEYZT3DGWSANXQ","widgetID":"1969939662","slotName":"center-6"},"page":1,"dealsPerPage":8,"itemResponseSize":"NONE","queryProfile":{"featuredOnly":false,"dealTypes":["LIGHTNING_DEAL","BEST_DEAL"],"includedCategories":["283155","599858","154606011"],"excludedExtendedFilters":{"MARKETING_ID":["restrictedcontent"]}}}
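Items 1 to 4 could be combined roughly as in the sketch below. It rests on several assumptions: the function names `buildDealPayload`, `findCookie`, and `fetchDeals` are made up for this example; the `Content-Type: application/json` header and the idea that `sessionID` equals the `session-id` cookie are guesses; `nocache` is assumed to be a millisecond timestamp; and `refRID` and `widgetID` are copied from the captured request and probably differ per page load. The endpoint itself may also have changed.

```php
<?php
/** Build the POST body with the structure shown in item 4. */
function buildDealPayload($sessionId, $page = 1, $dealsPerPage = 8)
{
    return array(
        'requestMetadata' => array(
            'marketplaceID' => 'ATVPDKIKX0DER',
            'sessionID'     => $sessionId, // assumption: same as the session-id cookie
            'clientID'      => 'goldbox',
        ),
        'widgetContext' => array(
            'pageType'    => 'GoldBox',
            'subPageType' => 'AllDeals',
            'deviceType'  => 'pc',
            'refRID'      => '1VFVJBKEYZT3DGWSANXQ', // copied from the captured request
            'widgetID'    => '1969939662',           // copied from the captured request
            'slotName'    => 'center-6',
        ),
        'page'             => $page,
        'dealsPerPage'     => $dealsPerPage,
        'itemResponseSize' => 'NONE',
        'queryProfile'     => array(
            'featuredOnly'            => false,
            'dealTypes'               => array('LIGHTNING_DEAL', 'BEST_DEAL'),
            'includedCategories'      => array('283155', '599858', '154606011'),
            'excludedExtendedFilters' => array('MARKETING_ID' => array('restrictedcontent')),
        ),
    );
}

/** Find a cookie value in Netscape-format lines (as CURLINFO_COOKIELIST returns them). */
function findCookie(array $cookieLines, $name)
{
    foreach ($cookieLines as $line) {
        // Tab-separated fields: domain, flag, path, secure, expiry, name, value.
        $fields = explode("\t", (string) $line);
        if (count($fields) >= 7 && $fields[5] === $name) {
            return $fields[6];
        }
    }
    return null;
}

/** Load the deals page to obtain session cookies, then POST to the XHR endpoint. */
function fetchDeals($cookieJar)
{
    $pageUrl = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';
    $xhrUrl  = 'http://www.amazon.com/xa/dealcontent/v2/GetDealMetadata?nocache='
             . (int) (microtime(true) * 1000); // assumption: millisecond timestamp

    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 80,
        CURLOPT_COOKIEFILE     => $cookieJar, // turns on cookie handling, loads saved cookies
        CURLOPT_COOKIEJAR      => $cookieJar, // cookies are written here when the handle is released
    ));

    // Step 1: a plain GET so that this client receives its own session cookies.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    if (curl_exec($ch) === false) {
        return false;
    }
    $sessionId = findCookie((array) curl_getinfo($ch, CURLINFO_COOKIELIST), 'session-id');
    if ($sessionId === null) {
        return false;
    }

    // Step 2: the POST from item 2, sent with the cookies collected in step 1.
    curl_setopt_array($ch, array(
        CURLOPT_URL        => $xhrUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => json_encode(buildDealPayload($sessionId)),
        CURLOPT_HTTPHEADER => array('Content-Type: application/json'), // assumed
        CURLOPT_REFERER    => $pageUrl,
    ));
    $response = curl_exec($ch);
    unset($ch); // releasing the handle flushes the cookie jar to disk
    return $response;
}
```

Calling `fetchDeals(sys_get_temp_dir() . '/amazon_cookies.txt')` would return the raw response body, or `false` on failure, which can then be passed to `json_decode`. Reusing one handle for both requests keeps the cookies in memory between the GET and the POST.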

[Screenshot: the POST request and its form data in the browser's developer tools]

As Michael - sqlbot mentioned, what you are trying to do violates Amazon's Terms of Use. For the sake of the scraping technique, though, I have still updated my answer.
