How to download a CSV file using PhantomJS
Question
When I browse website A in a normal browser (Chrome) and click a link on that site, Chrome immediately downloads a report as a CSV file.
When I checked the server response headers, I got the following:
Cache-Control:private,max-age=31536000
Connection:Keep-Alive
Content-Disposition:attachment; filename="report.csv"
Content-Encoding:gzip
Content-Language:de-DE
Content-Type:text/csv; charset=UTF-8
Date:Wed, 22 Jul 2015 12:44:30 GMT
Expires:Thu, 21 Jul 2016 12:44:30 GMT
Keep-Alive:timeout=15, max=75
Pragma:cache
Server:Apache
Transfer-Encoding:chunked
Vary:Accept-Encoding
Now I want to download and parse this file using PhantomJS. I set a page onResourceReceived listener to see whether Phantom will receive/download the file.
clientRequests.phantomPage.onResourceReceived = function(response) {
console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};
When I make a Phantom request to download the file (that is, page.open('URL OF THE FILE')), I can see in the Phantom log that the file is downloaded. Here is the relevant part of the log:
"contentType": "text/csv; charset=UTF-8",
"headers": {
"name": "Date",
"value": "Wed, 22 Jul 2015 12:57:41 GMT"
},
"name": "Content-Disposition",
"value": "attachment; filename="report.csv"",
"status":200,"statusText":"OK"
Phantom received the file and its content, but how do I access the file data? When I print the current PhantomJS page object, I get the HTML of page A. I don't want that; I want the CSV file, which I need to parse using JavaScript.
Recommended answer
After days of investigation, I can say that there are a few solutions:
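- In your evaluate function, you can make an AJAX call to download and encode the file, and then return that content to the Phantom script
- You can use one of the custom Phantom libraries available on GitHub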
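The first option above can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the helper name fetchCsvInPage and both example.com URLs are hypothetical placeholders, and it assumes the CSV is served from the same origin as the loaded page (otherwise the in-page request may be blocked). Because the file is text, it returns responseText directly; a binary file would need to be encoded before being returned.

```javascript
// Hypothetical helper. It runs inside the page context (via page.evaluate),
// so it may only use browser APIs and its return value must be serializable.
function fetchCsvInPage(url) {
    var xhr = new XMLHttpRequest();
    // Synchronous request, so evaluate() can return the body directly.
    xhr.open('GET', url, false);
    xhr.send(null);
    return xhr.status === 200 ? xhr.responseText : null;
}

// The PhantomJS-specific part only runs when a `phantom` global exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var pageUrl = 'http://example.com/reports';           // placeholder page URL
    var csvUrl = 'http://example.com/reports/report.csv'; // placeholder CSV URL

    page.open(pageUrl, function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + pageUrl);
            phantom.exit(1);
            return;
        }
        // Arguments after the function are passed into the page context.
        var csv = page.evaluate(fetchCsvInPage, csvUrl);
        console.log(csv === null ? 'Download failed' : csv);
        phantom.exit(0);
    });
}
```

The string returned by page.evaluate is then available in the Phantom script itself, where it can be parsed like any other JavaScript string.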
If you need to download a file using PhantomJS, my advice is to move away from PhantomJS and use CasperJS. CasperJS is built on top of PhantomJS, but it has a much better and more intuitive syntax and program flow.
Here is a good post explaining "Why CasperJS is better than PhantomJS". It includes a section about file download.
How to download a CSV file using CasperJS (this works even when the server sends the header Content-Disposition:attachment; filename='file.csv')
Here is a sample CSV file available for download: http://captaincoffee.com.au/dump/items.csv
To download this file using CasperJS, run the following code:
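var casper = require('casper').create();
// Open the containing page first so later requests run in its context.
casper.start("http://captaincoffee.com.au/dump/", function() {
    this.echo(this.getTitle());
});
casper.then(function() {
    var url = 'http://captaincoffee.com.au/dump/csv.csv';
    // Fetch the file with a GET request and print its contents as base64.
    require('utils').dump(this.base64encode(url, 'get'));
});
casper.run();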
The code above downloads the CSV file at http://captaincoffee.com.au/dump/csv.csv and prints the result as a base64 string. This way you don't even have to save the data to a file; you already have it as a base64 string.
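Since the goal in the question is to parse the CSV with JavaScript, the base64 string still has to be decoded and split into rows. The following is a minimal sketch of that step, not part of the original answer: base64ToText and parseCsv are hypothetical helper names, the decoder assumes the file is UTF-8 (as the Content-Type header in the question suggests) and that atob or Node's Buffer exists in your runtime, and the parser only covers basic CSV (commas, quoted fields, doubled quotes).

```javascript
// Hypothetical helper: decode a base64 string into UTF-8 text.
// Uses atob where it exists and falls back to Node's Buffer otherwise.
function base64ToText(b64) {
    var binary = (typeof atob === 'function')
        ? atob(b64)
        : Buffer.from(b64, 'base64').toString('binary');
    // Reinterpret the byte string as UTF-8.
    return decodeURIComponent(escape(binary));
}

// Hypothetical helper: minimal CSV parser returning an array of rows,
// each row being an array of field strings.
function parseCsv(text) {
    var rows = [], row = [], field = '', inQuotes = false;
    for (var i = 0; i < text.length; i++) {
        var c = text.charAt(i);
        if (inQuotes) {
            if (c === '"') {
                // A doubled quote inside a quoted field is a literal quote.
                if (text.charAt(i + 1) === '"') { field += '"'; i++; }
                else { inQuotes = false; }
            } else {
                field += c;
            }
        } else if (c === '"') {
            inQuotes = true;
        } else if (c === ',') {
            row.push(field); field = '';
        } else if (c === '\n' || c === '\r') {
            // Treat CRLF as a single line break.
            if (c === '\r' && text.charAt(i + 1) === '\n') { i++; }
            row.push(field); field = '';
            rows.push(row); row = [];
        } else {
            field += c;
        }
    }
    // Flush the last row when the text does not end with a line break.
    if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
    return rows;
}
```

Inside a CasperJS step this could be used as parseCsv(base64ToText(this.base64encode(url, 'get'))), assuming both helpers are defined in the same script.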
If you explicitly want to save the file to the file system, you can use the download function, which is available in CasperJS.
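A minimal sketch of the download approach, assuming the same sample URLs as in the code above. The fileNameFromUrl helper is hypothetical and only derives a local file name from the URL; the CasperJS part is guarded so that it runs only under PhantomJS/CasperJS, where a `phantom` global exists.

```javascript
// Hypothetical helper: use the last path segment of the URL as the file name.
function fileNameFromUrl(url) {
    var path = url.split('?')[0].split('#')[0];
    var name = path.substring(path.lastIndexOf('/') + 1);
    return name || 'download.csv';
}

if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    var url = 'http://captaincoffee.com.au/dump/csv.csv';
    var target = fileNameFromUrl(url);

    casper.start('http://captaincoffee.com.au/dump/', function () {
        // download(url, target) saves the remote resource to a local path.
        this.download(url, target);
    });

    casper.run(function () {
        this.echo('Requested download of ' + url + ' to ' + target);
        this.exit();
    });
}
```

The target path is resolved relative to the directory the script is started from, so pass an absolute path if the file should land somewhere specific.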