Automate daily CSV file download from website button click

Question

I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a URL.

I have been trying to use PhantomJS and CasperJS to automate this process, but haven't had any success.

I recently tried to use brandon's solution from "Grab the resource contents in CasperJS or PhantomJS":
http://stackoverflow.com/questions/11531448/grab-the-resource-contents-in-casperjs-or-phantomjs#answer-14717026

Here is my code:

var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
var casper = require('casper').create();

casper.start('http://www.example.com/page_with_download_button', function() {

});

casper.then(function() {    
     this.click('#download_button');
 });

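// Cache every response whose Content-Type is CSV.
casper.on('resource.received', function (resource) {
    "use strict";
    // Declare the loop variable; an undeclared i throws in strict mode.
    for (var i = 0; i < resource.headers.length; i++) {
        var header = resource.headers[i];
        // Compare the media type only, so a charset suffix cannot break the match.
        if (header.name === "Content-Type" && header.value.indexOf("text/csv") === 0) {
            cache.includeResource(resource);
        }
    }
});

// After the page load finishes, write each cached resource to downloads/.
casper.on('load.finished', function (status) {
    for (var i = 0; i < cache.cachedResources.length; i++) {
        var file = cache.cachedResources[i].cacheFileNoPath;
        // Look up the extension for this resource (index i, matching the loop).
        var ext = mimetype.ext[cache.cachedResources[i].mimetype];
        var finalFile = file.replace("." + cache.cacheExtension, "." + ext);
        fs.write('downloads/' + finalFile, cache.cachedResources[i].getContents(), 'b');
    }
});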

casper.run();

I think the problem could be caused by my cachePath being incorrect in cache.js:

exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';

Should I be using something in addition to the backslashes to define the path?

When I try

 casperjs --disk-cache=true export_script.js

Nothing is downloaded. After a little debugging I found that cache.cachedResources is always empty.

I would also be open to solutions outside of PhantomJS/CasperJS.

Update

I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, as suggested by dandavis. Tampermonkey was extremely easy to figure out. I installed Tampermonkey, navigated to the page with the download link, clicked New Script under Tampermonkey, and added my JavaScript code:

document.getElementById("download_button").click();
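
For reference, a complete userscript built around that one line might look like the sketch below. The metadata block uses standard Tampermonkey headers; the script name is made up, the @match URL is taken from the batch script in this post, and the clickDownload helper is a hypothetical wrapper added only so a missing button does not raise an error.

```javascript
// ==UserScript==
// @name         Auto-click CSV download (hypothetical name)
// @match        http://www.example.com/page-with-dl-button
// @run-at       document-idle
// @grant        none
// ==/UserScript==

// Click the download button if it exists; return whether a click happened.
function clickDownload(doc) {
    var button = doc.getElementById('download_button');
    if (!button) {
        return false; // Button not found, so nothing to click.
    }
    button.click();
    return true;
}

// Run automatically only when a browser document is available.
if (typeof document !== 'undefined') {
    clickDownload(document);
}
```

If the button is rendered late by other scripts on the page, the click may need to be delayed or retried; that depends on the site and is not covered here.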

Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this:

set date=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
chrome "http://www.example.com/page-with-dl-button"
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%date%.csv"

I set that batch script to run nightly using the Windows Task Scheduler.

Success!

Recommended answer

Your button most likely issues a POST request to the server. To track it down:


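  1. Open the Network tab in Chrome Developer Tools.

  2. Navigate to the page and click the button.

  3. Find the request that triggers the file download. Right-click it and choose Copy as cURL.

  4. Run the copied cURL command.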

Once you have cURL working, you can schedule the download using cron or Task Scheduler, depending on the operating system you are using.
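
If you would rather stay in JavaScript than call cURL, the same request could be replayed from Node. This is only a sketch, assuming Node 18 or later (for the built-in fetch) and assuming the button really does send a form-encoded POST. EXPORT_URL is a hypothetical environment variable, and the form fields are placeholders; take the real URL, fields, and any required cookies or headers from the request you copied in the Network tab.

```javascript
const fs = require('fs');
const path = require('path');

// Build a date-stamped file name in the export_YYYY_MM_DD.csv pattern.
function datedFileName(date) {
    const pad = (n) => String(n).padStart(2, '0');
    return 'export_' + date.getFullYear() + '_' +
        pad(date.getMonth() + 1) + '_' + pad(date.getDate()) + '.csv';
}

// Replay the POST behind the button and save the response body to targetDir.
// formFields is a placeholder object; use the fields from the copied request.
async function downloadCsv(url, formFields, targetDir) {
    const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body: new URLSearchParams(formFields).toString(),
    });
    if (!response.ok) {
        throw new Error('Download failed with HTTP status ' + response.status);
    }
    const target = path.join(targetDir, datedFileName(new Date()));
    fs.writeFileSync(target, Buffer.from(await response.arrayBuffer()));
    return target;
}

// Only hit the network when EXPORT_URL (hypothetical) is set.
if (process.env.EXPORT_URL) {
    downloadCsv(process.env.EXPORT_URL, {}, '.')
        .then((file) => console.log('Saved ' + file))
        .catch((err) => { console.error(err.message); process.exitCode = 1; });
}
```

The script can then be scheduled the same way as the cURL command. Naming the file by date inside the script removes the need for a separate rename step.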
