Cannot download file in R - status 503


Question

I am trying to download a file:

> URL <- "https://www.bitmarket.pl/graphs/BTCPLN/90m.json"
> download.file(URL, destfile = "res.json", method = "curl")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4676    0  4676    0     0  56930          0 --:--:-- --:--:-- --:--:-- 57024

but the request returns a 503 status. The whole output saved to res.json:

<!DOCTYPE HTML>
<html lang="en-US">
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
  <meta name="robots" content="noindex, nofollow" />
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
  <title>Just a moment...</title>
  <style type="text/css">
    html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
    body {background-color: #ffffff; font-family: Helvetica, Arial, sans-serif; font-size: 100%;}
    h1 {font-size: 1.5em; color: #404040; text-align: center;}
    p {font-size: 1em; color: #404040; text-align: center; margin: 10px 0 0 0;}
    #spinner {margin: 0 auto 30px auto; display: block;}
    .attribution {margin-top: 20px;}
    @-webkit-keyframes bubbles { 33%: { -webkit-transform: translateY(10px); transform: translateY(10px); } 66% { -webkit-transform: translateY(-10px); transform: translateY(-10px); } 100% { -webkit-transform: translateY(0); transform: translateY(0); } }
    @keyframes bubbles { 33%: { -webkit-transform: translateY(10px); transform: translateY(10px); } 66% { -webkit-transform: translateY(-10px); transform: translateY(-10px); } 100% { -webkit-transform: translateY(0); transform: translateY(0); } }
    .bubbles { background-color: #404040; width:15px; height: 15px; margin:2px; border-radius:100%; -webkit-animation:bubbles 0.6s 0.07s infinite ease-in-out; animation:bubbles 0.6s 0.07s infinite ease-in-out; -webkit-animation-fill-mode:both; animation-fill-mode:both; display:inline-block; }
  </style>

    <script type="text/javascript">
  //<![CDATA[
  (function(){
    var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
    b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
    b(function(){
      var a = document.getElementById('cf-content');a.style.display = 'block';
      setTimeout(function(){
        var s,t,o,p,b,r,e,a,k,i,n,g,f, eoQNdpG={"GwwAAtfX":+((+!![]+[])+(+!![]))};
        t = document.createElement('div');
        t.innerHTML="<a href='/'>x</a>";
        t = t.firstChild.href;r = t.match(/https?:\/\//)[0];
        t = t.substr(r.length); t = t.substr(0,t.length-1);
        a = document.getElementById('jschl-answer');
        f = document.getElementById('challenge-form');
        ;eoQNdpG.GwwAAtfX+=+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]));eoQNdpG.GwwAAtfX*=+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));eoQNdpG.GwwAAtfX-=+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));eoQNdpG.GwwAAtfX-=+((+!![]+[])+(+[]));eoQNdpG.GwwAAtfX-=+((+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));eoQNdpG.GwwAAtfX+=+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));eoQNdpG.GwwAAtfX+=+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));eoQNdpG.GwwAAtfX*=+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]));a.value = parseInt(eoQNdpG.GwwAAtfX, 10) + t.length; '; 121'
        f.action += location.hash;
        f.submit();
      }, 4000);
    }, false);
  })();
  //]]>
</script>


</head>
<body>
  <table width="100%" height="100%" cellpadding="20">
    <tr>
      <td align="center" valign="middle">
          <div class="cf-browser-verification cf-im-under-attack">
  <noscript><h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1></noscript>
  <div id="cf-content" style="display:none">

    <div>
      <div class="bubbles"></div>
      <div class="bubbles"></div>
      <div class="bubbles"></div>
    </div>
    <h1><span data-translate="checking_browser">Checking your browser before accessing</span> bitmarket.pl.</h1>

    <p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
    <p data-translate="allow_5_secs">Please allow up to 5 seconds&hellip;</p>
  </div>

  <form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
    <input type="hidden" name="jschl_vc" value="51a7cb71596dbf54fdd307c1e65de941"/>
    <input type="hidden" name="pass" value="1512824604.589-Uwtm9TfzWe"/>
    <input type="hidden" id="jschl-answer" name="jschl_answer"/>
  </form>
</div>


          <div class="attribution">
            <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=iuam" target="_blank" style="font-size: 12px;">DDoS protection by Cloudflare</a>
            <br>
            Ray ID: 3ca829f9aed06afb
          </div>
      </td>

    </tr>
  </table>
</body>
</html>

wget does not work either:

--2017-12-09 14:01:29--  https://www.bitmarket.pl/graphs/BTCPLN/90m.json
Resolving www.bitmarket.pl... 104.20.67.184, 104.20.68.184
Connecting to www.bitmarket.pl|104.20.67.184|:443... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
2017-12-09 14:01:29 ERROR 503: Service Temporarily Unavailable.

But when you open this link in a web browser: https://www.bitmarket.pl/graphs/BTCPLN/90m.json the browser returns the correct JSON file. Any ideas why it does not work?

Recommended answer

That's because the site is behind a DDoS protection service (the page footer reads "DDoS protection by Cloudflare"). On the first load, the server returns an interstitial page with status 503; its JavaScript waits a few seconds (the page says up to 5) and then submits a challenge form, which redirects to the final content. Tools like wget and curl do not interpret JavaScript, so they only ever receive the interstitial page. If you think that it is justifiable to do so, then one option would be to use a headless browser, for example phantomjs, and supply a custom script (say, save.js):

var system = require('system');
var page = require('webpage').create();

page.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5';

page.open(system.args[1], function(){
    setTimeout(function(){
        console.log(page.evaluate(function(){
            //gets the JSON from the first <pre> element rendered on the page
            return document.getElementsByTagName('pre')[0].textContent;
        }));
        phantom.exit();
    }, 6000); //waits 6 seconds for the page to reload
});

Then use it in place of wget:

phantomjs save.js https://www.bitmarket.pl/graphs/BTCPLN/90m.json
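
Since wget and curl save the interstitial HTML as if it were the requested file, it can help to check what was actually received before parsing it. The following is a minimal sketch, not part of the original answer: the function names are hypothetical, and the marker strings are simply copied from the HTML shown in the question, so they are assumptions that may not hold for other sites or for later versions of the challenge page.

```javascript
// Heuristic: does this response look like the "Just a moment..." challenge
// page from the question rather than the requested JSON?
// ASSUMPTION: the markers below come from the HTML in the question only.
function looksLikeChallengePage(status, body) {
    var markers = [
        'id="challenge-form"',
        'jschl_answer',
        '<title>Just a moment...</title>'
    ];
    var hasMarker = markers.some(function (marker) {
        return body.indexOf(marker) !== -1;
    });
    return status === 503 && hasMarker;
}

// Returns the parsed JSON, or null when the body is the challenge page
// or is not valid JSON at all.
function parseJsonOrNull(status, body) {
    if (looksLikeChallengePage(status, body)) {
        return null;
    }
    try {
        return JSON.parse(body);
    } catch (e) {
        return null;
    }
}

// Illustrative inputs (made up for this example, not real server output).
var challengeHtml = '<html><head><title>Just a moment...</title></head>' +
    '<body><form id="challenge-form"></form></body></html>';
var challengeDetected = looksLikeChallengePage(503, challengeHtml);
var parsedExample = parseJsonOrNull(200, '{"ok": true}');
```

A null result means the download should be retried through something that can run the page's JavaScript, such as the phantomjs script above.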
