Trigger data response from .aspx page


Problem description


from bs4 import BeautifulSoup
import requests

url = 'http://estadistico.ut.com.sv/OperacionDiaria.aspx'

# A Session carries any cookies set by the GET over to the later POST.
s = requests.Session()

# Load the page to read the current hidden ASP.NET state fields.
pagereq = s.get(url)
soup = BeautifulSoup(pagereq.content, 'lxml')

viewstategenerator = soup.find("input", attrs={'id': '__VIEWSTATEGENERATOR'})['value']
viewstate = soup.find("input", attrs={'id': '__VIEWSTATE'})['value']
eventvalidation = soup.find("input", attrs={'id': '__EVENTVALIDATION'})['value']

# Values copied from the browser's captured export request; they appear to be fixed.
eventtarget = 'ASPxDashboardViewer1'
DXCss = '1_33,1_4,1_9,1_5,15_2,15_4'
DXScript = '1_232,1_134,1_225,1_169,1_187,15_1,1_183,1_182,1_140,1_147,1_148,1_142,1_141,1_143,1_144,1_145,1_146,15_0,15_6,15_7'
eventargument = {"Task":"Export","ExportInfo":{"Mode":"SingleItem","GroupName":"pivotDashboardItem1","FileName":"Generación+por+tipo+de+tecnología+(MWh)","ClientState":{"clientSize":{"width":509,"height":385},"titleHeight":48,"itemsState":[{"name":"pivotDashboardItem1","headerHeight":34,"position":{"left":11,"top":146},"width":227,"height":108,"virtualSize":'null',"scroll":{"horizontal":'true',"vertical":'true'}}]},"Format":"Excel","DocumentOptions":{"paperKind":"Letter","pageLayout":"Portrait","scaleMode":"AutoFitWithinOnePage","scaleFactor":1,"autoFitPageCount":1,"showTitle":'true',"title":"Operación+Diaria","imageFormatOptions":{"format":"Png","resolution":96},"excelFormatOptions":{"format":"Csv","csvValueSeparator":","},"commonOptions":{"filterStatePresentation":"None","includeCaption":'true',"caption":"Generación+por+tipo+de+tecnología+(MWh)"},"pivotOptions":{"printHeadersOnEveryPage":'true'},"gridOptions":{"fitToPageWidth":'true',"printHeadersOnEveryPage":'true'},"chartOptions":{"automaticPageLayout":'true',"sizeMode":"Zoom"},"pieOptions":{"autoArrangeContent":'true'},"gaugeOptions":{"autoArrangeContent":'true'},"cardOptions":{"autoArrangeContent":'true'},"mapOptions":{"automaticPageLayout":'true',"sizeMode":"Zoom"},"rangeFilterOptions":{"automaticPageLayout":'true',"sizeMode":"Stretch"},"imageOptions":{},"fileName":"Generación+por+tipo+de+tecnología+(MWh)"},"ItemType":"PIVOT"},"Context":"BwAHAAIkY2NkNWRiYzItYzIwNS00MDIyLTkzZjUtYWQ0NzVhYTM5Y2E3Ag9PcGVyYWNpb25EaWFyaWECAAIAAAAAAMByQA==","RequestMarker":1,"ClientState":{}}

postdata = {'__EVENTTARGET': eventtarget,
            '__EVENTARGUMENT': eventargument,
            '__VIEWSTATE': viewstate,
            '__VIEWSTATEGENERATOR': viewstategenerator,
            '__EVENTVALIDATION': eventvalidation,
            'DXScript': DXScript,
            'DXCss': DXCss
           }

# Replay the export postback with the collected form fields.
datareq = s.post(url, data=postdata)

print(datareq.text)

I'm trying to scrape data from this .aspx webpage (http://estadistico.ut.com.sv/OperacionDiaria.aspx). The page loads its data dynamically via JavaScript, so scraping it directly with requests/BeautifulSoup won't work.

Looking at the network traffic, I can see that a POST request is made to the page when you click an element's export (Exportar a) button, select an export type (Excel, CSV) and confirm. The response contains a base64-encoded string of the data I need. As far as I can tell, there is no way to fetch the file directly with a GET request, because it is only generated on demand.
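Once the response is in hand, the base64 part still has to be turned back into CSV rows. The question does not show the exact structure of the response body, so the sketch below assumes you have already extracted the base64 string itself. `decode_export` is a hypothetical helper name. The default `utf-8-sig` encoding is an assumption chosen because it decodes UTF-8 with or without a leading byte-order mark. The comma separator matches the `csvValueSeparator` in the captured request.

```python
import base64
import csv
import io


def decode_export(b64_payload: str, encoding: str = "utf-8-sig") -> list[list[str]]:
    """Decode a base64-encoded CSV export into a list of rows.

    Assumption: the payload is plain base64 of a comma-separated text file.
    'utf-8-sig' also strips a UTF-8 byte-order mark if one is present.
    """
    raw = base64.b64decode(b64_payload)
    text = raw.decode(encoding)
    # csv.reader needs an iterable of lines; StringIO provides one.
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    # Dummy payload built locally only to exercise the function.
    sample = base64.b64encode(b"col_a,col_b\n1,2\n").decode("ascii")
    for row in decode_export(sample):
        print(row)
```

If the export turns out to be an Excel workbook rather than CSV, this decoding step would need to write the bytes to a file instead.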

What I'm trying to do is copy the POST request that triggers the CSV response. First I scrape __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION from the page. __EVENTTARGET, DXCss and DXScript appear to be fixed, and __EVENTARGUMENT is copied directly from the captured POST request.
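The three `soup.find(...)` calls can be generalized into a helper that gathers every hidden input on the page. The helper keys each field by its `name` attribute, which is what the browser actually posts. In standard ASP.NET WebForms output, `name` and `id` are the same for these fields. This sketch swaps BeautifulSoup for the standard library's `html.parser` so it runs without third-party packages. `hidden_fields` is a hypothetical name. DevExpress pages may contain additional hidden inputs, and it is untested whether the server expects them in a replayed POST.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value pairs from <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag != "input":
            return
        attr_map = dict(attrs)
        input_type = (attr_map.get("type") or "").lower()
        name = attr_map.get("name")
        if input_type == "hidden" and name:
            # Attributes without a value come through as None.
            self.fields[name] = attr_map.get("value") or ""


def hidden_fields(html: str) -> dict[str, str]:
    """Return every hidden form field in the page, keyed by its name."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

With this helper, `postdata` could start as `hidden_fields(pagereq.text)` and then be updated with `__EVENTTARGET`, `__EVENTARGUMENT`, `DXScript` and `DXCss`. Self-closing `<input ... />` tags are also handled, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.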

My code returns a server application error. I suspect the problem is one of the following: a) a wrong __EVENTARGUMENT (perhaps partly dynamic rather than fixed?), b) not really understanding how .aspx pages work, or c) what I'm trying to do isn't possible with these tools.
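On possibility (a), one detail is worth checking. In the code above, `eventargument` is a Python dict, not the JSON text the browser sent. When a dict appears as a value inside `requests`' `data=` dict, requests iterates over it while form-encoding. That sends only its top-level keys. The standard library's `urlencode(..., doseq=True)` behaves the same way, so the demo below uses it to stay dependency-free. The dict here is a hypothetical, trimmed-down stand-in for the real one. This is not necessarily the cause of the server error, and it is untested whether the server accepts the JSON-string form. Note also that the question writes JSON literals as the Python strings `'true'` and `'null'`, which serialize as JSON strings rather than as `true` and `null`.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical, trimmed-down stand-in for the question's eventargument dict.
eventargument = {"Task": "Export", "RequestMarker": 1, "ClientState": {}}

# Dict passed as a form value: with doseq=True urlencode loops over the dict,
# which yields only its keys, so the nested data never reaches the server.
flattened = urlencode({"__EVENTARGUMENT": eventargument}, doseq=True)

# Serializing to compact JSON first sends the whole structure as one value.
as_json = json.dumps(eventargument, separators=(",", ":"), ensure_ascii=False)
encoded = urlencode({"__EVENTARGUMENT": as_json})

# Decoding the form body again recovers the original structure.
roundtrip = json.loads(parse_qs(encoded)["__EVENTARGUMENT"][0])

# Python True/None become JSON true/null; the string "true" stays a string.
literals = json.dumps({"showTitle": True, "virtualSize": None, "quoted": "true"},
                      separators=(",", ":"))

print(flattened)
print(as_json)
print(literals)
```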

I also looked at using Selenium to trigger the data export, but I couldn't see a way to capture the server response.

Solution

I was able to get help from someone who knows more about aspx pages than me.

Link to the GitHub gist that provides the solution:

https://gist.github.com/jarek/d73c672d8dd4ddb48d80bffc4d8038ba
