Automate daily csv file download from website button click
Problem description
I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a url.
I have been trying to use PhantomJS and CasperJS to automate this process, but haven't had any success.
I recently tried to use brandon's solution here: Grab the resource contents in CasperJS or PhantomJS (http://stackoverflow.com/questions/11531448/grab-the-resource-contents-in-casperjs-or-phantomjs#answer-14717026)
Here is my code:
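var fs = require('fs');
var cache = require('./cache');       // cache.js from brandon's answer
var mimetype = require('./mimetype'); // mimetype.js from brandon's answer
var casper = require('casper').create();

casper.start('http://www.example.com/page_with_download_button', function() {
});

casper.then(function() {
    this.click('#download_button');
});

// Cache every response whose Content-Type is CSV.
casper.on('resource.received', function (resource) {
    "use strict";
    // Declare the loop counter: in strict mode an undeclared `i` throws a ReferenceError.
    for (var i = 0; i < resource.headers.length; i++) {
        var header = resource.headers[i];
        // Compare the media type only; the server most likely sends "text/csv; charset=UTF-8",
        // which never equals the string "text/csv; charset-UTF-8;".
        if (header.name.toLowerCase() === "content-type" && header.value.indexOf("text/csv") === 0) {
            cache.includeResource(resource);
        }
    }
});

// Write each cached resource to downloads/ with an extension matching its MIME type.
casper.on('load.finished', function(status) {
    for (var i = 0; i < cache.cachedResources.length; i++) {
        var file = cache.cachedResources[i].cacheFileNoPath;
        // Was cachedResources[index]; `index` is never defined.
        var ext = mimetype.ext[cache.cachedResources[i].mimetype];
        var finalFile = file.replace("." + cache.cacheExtension, "." + ext);
        fs.write('downloads/' + finalFile, cache.cachedResources[i].getContents(), 'b');
    }
});

casper.run();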
I think the problem could be caused by an incorrect cachePath in cache.js:
exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';
Should I be using something in addition to the slashes to define the path?
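In a JavaScript string literal a backslash starts an escape sequence, so a Windows path typed with single backslashes does not mean what it looks like; `\u` followed by non-hex characters, as in `\username`, is even a syntax error. Forward slashes, as in the cache.js line, or doubled backslashes are the two usual ways to write such a path. A small sketch comparing the two forms, using the path from the question (the variable names are illustrative only):

```javascript
// The cachePath from the question, written two ways.
const forwardSlashes = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';
// Each backslash is doubled: a lone "\" would start an escape sequence,
// and "\u" followed by "sername" is not a valid Unicode escape (SyntaxError).
const escapedBackslashes = 'C:\\Users\\username\\AppData\\Local\\Ofi Labs\\PhantomJS';

// Converting the separators shows both literals spell the same location.
const sameLocation = escapedBackslashes.replace(/\\/g, '/') === forwardSlashes;
console.log(sameLocation); // true
```

This only shows how the string is parsed; it does not establish whether PhantomJS actually writes its disk cache to that directory.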
When I run
casperjs --disk-cache=true export_script.js
Nothing is downloaded. After a little debugging I have found that cache.cachedResources is always empty.
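Two details in the original `resource.received` handler could each keep `cache.cachedResources` empty: under `"use strict"` the undeclared loop counter `i` likely throws a ReferenceError before anything is cached, and the exact comparison against `"text/csv; charset-UTF-8;"` (with `-` instead of `=`) is unlikely to match the header the server really sends. A sketch of a more tolerant check, assuming PhantomJS-style headers (an array of `{name, value}` objects); `isCsvResponse` is a hypothetical helper, not part of CasperJS:

```javascript
// Return true when a response's headers mark it as CSV.
// `headers` is assumed to be an array of {name, value} objects, as PhantomJS reports them.
function isCsvResponse(headers) {
  for (let i = 0; i < headers.length; i++) { // counter declared, so safe under "use strict"
    const { name, value } = headers[i];
    // Header names are case-insensitive; compare only the media type, ignoring charset etc.
    if (name.toLowerCase() === 'content-type') {
      const mediaType = value.split(';')[0].trim().toLowerCase();
      return mediaType === 'text/csv';
    }
  }
  return false;
}

// Usage with headers shaped like typical server responses.
console.log(isCsvResponse([{ name: 'Content-Type', value: 'text/csv; charset=UTF-8' }])); // true
console.log(isCsvResponse([{ name: 'content-type', value: 'text/html' }])); // false
```

If the export is served as `application/octet-stream` with a `Content-Disposition: attachment` header instead, the check would have to look at that header rather than the media type.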
I would also be open to solutions outside of PhantomJS/CasperJS.
Update
I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, suggested by dandavis. Tampermonkey was extremely easy to figure out. I installed Tampermonkey, navigated to the page with the download link, clicked New Script in Tampermonkey, and added my JavaScript code:
document.getElementById("download_button").click();
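A userscript can also carry a metadata block so Tampermonkey only runs it on the download page. A sketch, assuming the `download_button` id from the question and the page URL used in the batch script; the script name, the `@match` pattern and `@run-at document-idle` are choices of this sketch, not taken from the original:

```javascript
// ==UserScript==
// @name         Auto-download daily CSV
// @match        http://www.example.com/page-with-dl-button*
// @grant        none
// @run-at       document-idle
// ==/UserScript==

// Click the download button if it exists; returns true when a click was issued.
// Taking the document as a parameter keeps the logic testable outside the browser.
function clickDownloadButton(doc) {
  const button = doc.getElementById('download_button'); // button id from the question
  if (!button) {
    return false; // button not rendered (yet), or the page layout changed
  }
  button.click();
  return true;
}

// Only run automatically when loaded in a browser page.
if (typeof document !== 'undefined') {
  clickDownloadButton(document);
}
```

Passing the document into a function is only for testability; in Tampermonkey the guarded call at the bottom runs once per matching page load, just like the one-line version.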
Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this:
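REM Build a YYYY_MM_DD stamp; the offsets assume a US-style %DATE% such as "Thu 01/15/2015".
REM Named "stamp" rather than "date" so the built-in %DATE% variable is not shadowed.
set stamp=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
REM Open the page; the Tampermonkey script clicks the button (assumes chrome.exe is on PATH).
chrome "http://www.example.com/page-with-dl-button"
REM Give the page time to load and the download time to finish.
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%stamp%.csv"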
I set that batch script to run nightly using the Windows Task Scheduler.
Success!
Recommended answer
Your button most likely issues a POST request to the server. In order to track it:
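- Open the Network tab in Chrome Developer Tools.
- Navigate to the page and click the button.
- Find the request that causes the file download. Right-click it and choose Copy as cURL.
- Run the copied cURL command.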
Once you have cURL working, you can schedule downloads using cron or Task Scheduler, depending on the operating system you are using.