Automate daily CSV file download from a website button click


Question

I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button; you can't navigate to the file using a URL.

I have been trying to use PhantomJS and CasperJS to automate this process, but haven't had any success.

I recently tried Brandon's solution from "Grab the resource contents in CasperJS or PhantomJS".

Here is my code:

var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
var casper = require('casper').create();

casper.start('http://www.example.com/page_with_download_button', function() {
    // Nothing to do on load; the click happens in the next step.
});

casper.then(function() {
    this.click('#download_button');
});

// Remember every response whose Content-Type marks it as the CSV export.
casper.on('resource.received', function (resource) {
    "use strict";
    // `i` must be declared: an undeclared assignment throws in strict mode.
    for (var i = 0; i < resource.headers.length; i++) {
        if (resource.headers[i]["name"] == "Content-Type" && resource.headers[i]["value"] == "text/csv; charset-UTF-8;") {
            cache.includeResource(resource);
        }
    }
});

// Once the page has loaded, copy each cached resource into downloads/.
casper.on('load.finished', function(status) {
    for (var i = 0; i < cache.cachedResources.length; i++) {
        var file = cache.cachedResources[i].cacheFileNoPath;
        // Index with `i`; the original used an undefined `index` variable here.
        var ext = mimetype.ext[cache.cachedResources[i].mimetype];
        var finalFile = file.replace("."+cache.cacheExtension,"."+ext);
        fs.write('downloads/'+finalFile,cache.cachedResources[i].getContents(),'b');
    }
});

casper.run();

I think the problem could be caused by my cachePath being incorrect in cache.js:

exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';

Should I be using something in addition to the backslashes to define the path?

When I try:

 casperjs --disk-cache=true export_script.js

Nothing is downloaded. After a little debugging I found that cache.cachedResources is always empty.

I would also be open to solutions outside of PhantomJS/CasperJS.

Update

I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, suggested by dandavis. Tampermonkey was extremely easy to figure out. I installed Tampermonkey, navigated to the page with the download link, clicked New Script under Tampermonkey, and added my JavaScript code:

document.getElementById("download_button").click();
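
For reference, a Tampermonkey script usually wraps that one-liner in a metadata header whose `@match` line restricts it to the download page. The following is a minimal sketch, not the asker's actual script: the `@match` URL is the example URL used in this question, `clickWhenPresent` is a hypothetical helper name, and the polling exists only on the assumption that the button may be rendered after the initial page load.

```javascript
// ==UserScript==
// @name         Auto-click CSV download (sketch)
// @match        http://www.example.com/page-with-dl-button*
// @run-at       document-idle
// @grant        none
// ==/UserScript==

// Hypothetical helper: click the element with the given id once it exists.
// Returns true if the click happened immediately; otherwise schedules a
// retry (while retries remain) and returns false.
function clickWhenPresent(doc, id, retries, delayMs) {
  var button = doc.getElementById(id);
  if (button) {
    button.click();
    return true;
  }
  if (retries > 0) {
    setTimeout(function () {
      clickWhenPresent(doc, id, retries - 1, delayMs);
    }, delayMs);
  }
  return false;
}

// Only run inside a browser; "download_button" is the id used in the question.
if (typeof document !== "undefined") {
  clickWhenPresent(document, "download_button", 20, 500);
}
```

Passing the document in as a parameter keeps the helper easy to exercise outside the browser with a stand-in object.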

Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this:

set date=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
chrome "http://www.example.com/page-with-dl-button"
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%date%.csv"

I set that batch script to run nightly using the Windows Task Scheduler.
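
One caveat with the batch script is that the `%DATE:~10,4%` style of slicing depends on the Windows regional date format. As a hedged alternative, the rename step could be done in a small Node.js script that builds the date itself. This is only a sketch: `datedName` and `archiveDownload` are hypothetical names, and the `export_YYYY_MM_DD.csv` pattern simply mirrors the batch script.

```javascript
var fs = require("fs");
var path = require("path");

// Build "export_YYYY_MM_DD.csv" from a Date, independent of the Windows locale.
function datedName(date) {
  function pad(n) {
    return (n < 10 ? "0" : "") + n;
  }
  return "export_" + date.getFullYear() + "_" +
    pad(date.getMonth() + 1) + "_" + pad(date.getDate()) + ".csv";
}

// Move the downloaded file into archiveDir under its dated name.
// Note: fs.renameSync cannot move files across drives; in that case
// copy the file and delete the original instead.
function archiveDownload(downloadedFile, archiveDir, date) {
  var target = path.join(archiveDir, datedName(date));
  fs.renameSync(downloadedFile, target);
  return target;
}
```

It would be called as `archiveDownload(downloadedFile, archiveDir, new Date())`, with the two paths set to the browser's download location and the destination folder.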

Success!

Recommended answer

Your button most likely issues a POST request to the server. To track it down:

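  1. Open Chrome Developer Tools.
  2. Open the Network tab.
  3. Navigate to the page and click the button.
  4. Notice which request led to the file download. Right-click it and choose Copy as cURL.
  5. Run the copied cURL command.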

Once you have cURL working, you can schedule the download using cron or Task Scheduler, depending on the operating system you are using.
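
If you would rather script the request than shell out to cURL, the same replay can be sketched in Node.js (version 18 or later, which provides a global `fetch`). This is a minimal sketch under assumptions: `downloadCsv` and `saveCsv` are hypothetical names, and the URL, method, headers, and body must be copied from the Copy as cURL output for your site, since the real request is not shown in the question.

```javascript
var fs = require("fs");

// Replay the request that the button sends and resolve with the response text.
// request.url, request.method, request.headers and request.body are
// placeholders to be filled in from the copied cURL command.
// fetchImpl is optional and defaults to the global fetch of Node 18+.
async function downloadCsv(request, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  var response = await doFetch(request.url, {
    method: request.method,
    headers: request.headers,
    body: request.body
  });
  if (!response.ok) {
    throw new Error("Download failed with HTTP status " + response.status);
  }
  return response.text();
}

// Download the CSV and write it to targetFile; resolves with the text length.
async function saveCsv(request, targetFile, fetchImpl) {
  var csv = await downloadCsv(request, fetchImpl);
  fs.writeFileSync(targetFile, csv);
  return csv.length;
}
```

A scheduled job would then call `saveCsv` with the copied request details and a target file name. Accepting `fetchImpl` as a parameter is a design choice that lets the function be checked with a stub before it is pointed at the real server.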
