How can I use HTTP requests to get data from a .jsf page?


Question

I need to fetch data programmatically from JSF sites.

Here is an example: https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf

To get data, enter any Start Date and End Date and click on Export CSV on top right. (It generates a fair amount of data, so pick a 1-day range.)

In the Network tab of Chrome, I see the following request headers and form data:

Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8,ko;q=0.6,zh;q=0.4
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:425
Content-Type:application/x-www-form-urlencoded
Cookie:JSESSIONID=gixQBXBESRofyqLpiH2hlYg8; dataminer=1369707692.36895.0000; __utma=109610308.1662709339.1456530705.1456530705.1456530705.1; __utmc=109610308; __utmz=109610308.1456530705.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=8sx6CTIQhpPAAO5+4xcGGGlb; WT_FPC=id=xxx.xxx.xxx.xx-3069233008.30503152:lv=1456533141859:ss=1456530705581
Host:dataminer.pjm.com
Origin:https://dataminer.pjm.com
Referer:https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36

Form Data
frmCriteria:frmCriteria
frmCriteria:calStartDate_input:01/01/2016
frmCriteria:calStopDate_input:01/02/2016
frmCriteria:mnuMarket_input:REALTIME
frmCriteria:mnuMarket_focus:
frmCriteria:mnuFreq_input:Daily
frmCriteria:mnuFreq_focus:
frmCriteria:mnuPnodes_input:All
frmCriteria:mnuPnodes_focus:
javax.faces.ViewState:8578362602192686517:-1021667131748875106
frmCriteria:j_idt78:frmCriteria:j_idt78

I see all my form data in this request. It seems like I should be able to programmatically download this CSV by submitting the right request (using Python's requests library).

I've tried lots of ways of regenerating this header and form data, but can't seem to produce the CSV download.

Edit: I've tried the following. I know very little about the structure of HTTP requests and responses, and cookies, so this could be comically bad. I get a 500 on the POST.

import requests


headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.8,ko;q=0.6,zh;q=0.4',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': 425,
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'dataminer.pjm.com',
    'Origin': 'https://dataminer.pjm.com',
    'Referer': 'https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf',
    'Upgrade-Insecure-Requests': 1,
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'
}


data = {
    'frmCriteria': 'frmCriteria',
    'frmCriteria': 'calStartDate_input:01/01/2016',
    'frmCriteria': 'calStopDate_input:01/02/2016',
    'frmCriteria': 'mnuMarket_input:REALTIME',
    'frmCriteria': 'mnuMarket_focus:',
    'frmCriteria': 'mnuFreq_input:Daily',
    'frmCriteria': 'mnuFreq_focus:',
    'frmCriteria': 'mnuPnodes_input:All',
    'frmCriteria': 'mnuPnodes_focus:',
    'javax.faces.ViewState': '8578362602192686517:-1021667131748875106',
    'frmCriteria:j_idt78': 'frmCriteria:j_idt78'
}


url = 'https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf'


with requests.Session() as s:
    get_response = s.get(url)
    post_response = s.post(url, headers=headers, data=data)

How can I use the requests library to fetch the CSV?

Recommended answer

You may well not be able to unless you go through everything that leads up to the page in question. JSF pages tend to store a lot of state within the web session, so you can't simply POST a static payload (like you're doing) and expect it to work.

A perfect example is that ViewState parameter. That value may very well change every single time, so the value you're using could be completely invalid.
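
As a rough illustration of reading the current ViewState rather than hard-coding it, the sketch below pulls the value out of a page's HTML with the standard library. It assumes the page renders it as an `<input>` named `javax.faces.ViewState`, as the form data in the question suggests. The names `ViewStateParser` and `extract_view_state` are made up for this example.

```python
from html.parser import HTMLParser


class ViewStateParser(HTMLParser):
    """Remembers the value of the first javax.faces.ViewState input seen."""

    def __init__(self):
        super().__init__()
        self.view_state = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input" or self.view_state is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "javax.faces.ViewState":
            self.view_state = attr_map.get("value")


def extract_view_state(html):
    """Return the ViewState token found in html, or None if there is none."""
    parser = ViewStateParser()
    parser.feed(html)
    return parser.view_state
```

Calling `extract_view_state(response.text)` on the body of a fresh GET should give a token that belongs to the current session, which can then replace the stale value copied from the browser.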

So, instead of going straight to whatever request you're trying to do, you may well have to "walk the pages" that got you there.

Track all of the requests it takes to get there, see what changes from step to step and session to session, and see if you can work out the minimum number of steps (ideally just 1 or 2) to pull it off.
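
A minimal two-step version of that walk might look like the sketch below: GET the page so the server issues a session cookie and fresh hidden fields, then POST the form using those values. This is an untested guess at the minimum, not a confirmed recipe for this site. The field names are copied from the Form Data capture in the question (note that each key is the full name, such as `frmCriteria:calStartDate_input`), the auto-generated button id `frmCriteria:j_idt78` may change between deployments, and `HiddenInputParser` and `fetch_csv` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("type") == "hidden" and attr_map.get("name"):
            self.fields[attr_map["name"]] = attr_map.get("value") or ""


def fetch_csv(session, url, start_date, stop_date):
    """GET the page for fresh state, then POST the export form with it."""
    # Step 1: load the page; a requests.Session keeps the cookies it receives.
    page = session.get(url)
    page.raise_for_status()

    parser = HiddenInputParser()
    parser.feed(page.text)

    # Step 2: start from the hidden fields the server just issued (this is
    # where javax.faces.ViewState comes from), then add the visible fields.
    payload = dict(parser.fields)
    payload.update({
        "frmCriteria": "frmCriteria",
        "frmCriteria:calStartDate_input": start_date,
        "frmCriteria:calStopDate_input": stop_date,
        "frmCriteria:mnuMarket_input": "REALTIME",
        "frmCriteria:mnuFreq_input": "Daily",
        "frmCriteria:mnuPnodes_input": "All",
        # Assumption: button id taken from the question's capture.
        "frmCriteria:j_idt78": "frmCriteria:j_idt78",
    })

    # Content-Length and Cookie are left for the HTTP library to fill in.
    response = session.post(url, data=payload, headers={"Referer": url})
    response.raise_for_status()
    return response.text


# Usage (needs the third-party requests package and network access):
#   import requests
#   with requests.Session() as s:
#       csv_text = fetch_csv(
#           s, "https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf",
#           "01/01/2016", "01/02/2016")
```

If the POST still fails, compare the payload built here with what the browser sends across two separate sessions. Any field that differs between sessions is state that has to be read from a previous response instead of being hard-coded.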
