Trigger data response from .aspx page
Problem description
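I'm trying to scrape data from this .aspx webpage. The page loads its data dynamically via JavaScript, so scraping it directly with requests/BeautifulSoup won't work.
By looking at the network traffic, I can see that when you click an element's export (Exportar a) button, select an export type (Excel, CSV) and confirm, a POST request is made to the page. It returns a base64-encoded string of the data I need. As far as I can tell, there is no way to make a GET request for the file directly, because it is only generated on request.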
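Since the response is described as a base64-encoded string, here is a minimal sketch of turning such a string into CSV rows. It assumes the payload has already been isolated from the response and that the decoded bytes are UTF-8 text; neither is confirmed here. `decode_export` is a hypothetical helper name and the sample payload is a made-up placeholder, not data from the site.

```python
import base64
import csv
import io


def decode_export(payload_b64, encoding="utf-8"):
    """Decode a base64 string and parse the result as CSV rows."""
    raw = base64.b64decode(payload_b64)
    text = raw.decode(encoding)  # assumption: the export is UTF-8 text
    return list(csv.reader(io.StringIO(text)))


# Placeholder payload built locally for illustration only.
# A real payload would have to be extracted from the POST response first.
sample = base64.b64encode("Tipo,MWh\nHidro,120.5\n".encode("utf-8")).decode("ascii")
rows = decode_export(sample)
print(rows)
```

If the server wraps the string in other markup or uses a different text encoding, the extraction step and the `encoding` argument would need to change accordingly.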
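What I'm trying to do is replicate the POST request that triggers the CSV response. So first I scrape __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION. __EVENTTARGET, DXCss and DXScript appear to be fixed. __EVENTARGUMENT is copied directly from the POST request.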
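The scraping step described above can be sketched as collecting every hidden input in one pass rather than looking each one up by id. This version uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; `HiddenFieldParser`, `hidden_fields` and the sample HTML are illustrative assumptions, not markup taken from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up markup that mimics the shape of an ASP.NET Web Forms page.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
  <input type="text" name="q" value="ignored" />
</form>
"""
fields = hidden_fields(sample_html)
print(sorted(fields))
```

Collecting all hidden fields means that any extra state field the page adds later is carried along in the POST without changing the code.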
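My code returns a server application error. I think the problem is either a) a wrong __EVENTARGUMENT (maybe partly dynamic rather than fixed?), b) my not really understanding how .aspx pages work, or c) that what I'm trying to do isn't possible with these tools.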
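One thing worth checking under possibility (a): requests form-encodes the `data` mapping and does not turn a nested dict into JSON, so an __EVENTARGUMENT given as a Python dict will not reach the server as the JSON text seen in the browser's request. Below is a minimal sketch of serializing it explicitly, using `None` and `True` so that they become JSON `null` and `true`. The argument is a shortened stand-in for the captured one, `build_postdata` is a hypothetical helper, and whether this alone removes the server error is not confirmed here; the `Context` value may also be session-specific.

```python
import json
from urllib.parse import parse_qs, urlencode

# Shortened stand-in for the captured argument; the real one has many more keys.
event_argument = {
    "Task": "Export",
    "ExportInfo": {
        "Mode": "SingleItem",
        "GroupName": "pivotDashboardItem1",
        "ClientState": {
            "itemsState": [{"name": "pivotDashboardItem1", "virtualSize": None}]
        },
        "Format": "Excel",
        "DocumentOptions": {"showTitle": True},
    },
    "RequestMarker": 1,
    "ClientState": {},
}


def build_postdata(event_target, event_argument, hidden):
    """Return form fields with the event argument serialized as a JSON string."""
    data = dict(hidden)
    data["__EVENTTARGET"] = event_target
    # Compact separators mimic the whitespace-free JSON seen in the request.
    data["__EVENTARGUMENT"] = json.dumps(event_argument, separators=(",", ":"))
    return data


postdata = build_postdata("ASPxDashboardViewer1", event_argument, {"__VIEWSTATE": "abc"})
body = urlencode(postdata)  # what a form-encoded request body would look like
print(postdata["__EVENTARGUMENT"])
```

The resulting dict contains only strings, so it can be passed as `data=postdata` to a `requests.Session.post` call in the same way as in the question's code.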
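I did look at using selenium to trigger the data export, but I couldn't see a way to capture the server response.
I was able to get help from someone who knows more about aspx pages than I do.
Link to the GitHub gist that provides the solution: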
https://gist.github.com/jarek/d73c672d8dd4ddb48d80bffc4d8038ba

from bs4 import BeautifulSoup
from pprint import pprint
import requests
url = 'http://estadistico.ut.com.sv/OperacionDiaria.aspx'
s = requests.Session()
pagereq = s.get(url)
soup = BeautifulSoup(pagereq.content, 'lxml')
viewstategenerator = soup.find("input", attrs = {'id': '__VIEWSTATEGENERATOR'})['value']
viewstate = soup.find("input", attrs = {'id': '__VIEWSTATE'})['value']
eventvalidation = soup.find("input", attrs = {'id': '__EVENTVALIDATION'})['value']
eventtarget = 'ASPxDashboardViewer1'
DXCss = '1_33,1_4,1_9,1_5,15_2,15_4'
DXScript = '1_232,1_134,1_225,1_169,1_187,15_1,1_183,1_182,1_140,1_147,1_148,1_142,1_141,1_143,1_144,1_145,1_146,15_0,15_6,15_7'
eventargument = {"Task":"Export","ExportInfo":{"Mode":"SingleItem","GroupName":"pivotDashboardItem1","FileName":"Generación+por+tipo+de+tecnología+(MWh)","ClientState":{"clientSize":{"width":509,"height":385},"titleHeight":48,"itemsState":[{"name":"pivotDashboardItem1","headerHeight":34,"position":{"left":11,"top":146},"width":227,"height":108,"virtualSize":'null',"scroll":{"horizontal":'true',"vertical":'true'}}]},"Format":"Excel","DocumentOptions":{"paperKind":"Letter","pageLayout":"Portrait","scaleMode":"AutoFitWithinOnePage","scaleFactor":1,"autoFitPageCount":1,"showTitle":'true',"title":"Operación+Diaria","imageFormatOptions":{"format":"Png","resolution":96},"excelFormatOptions":{"format":"Csv","csvValueSeparator":","},"commonOptions":{"filterStatePresentation":"None","includeCaption":'true',"caption":"Generación+por+tipo+de+tecnología+(MWh)"},"pivotOptions":{"printHeadersOnEveryPage":'true'},"gridOptions":{"fitToPageWidth":'true',"printHeadersOnEveryPage":'true'},"chartOptions":{"automaticPageLayout":'true',"sizeMode":"Zoom"},"pieOptions":{"autoArrangeContent":'true'},"gaugeOptions":{"autoArrangeContent":'true'},"cardOptions":{"autoArrangeContent":'true'},"mapOptions":{"automaticPageLayout":'true',"sizeMode":"Zoom"},"rangeFilterOptions":{"automaticPageLayout":'true',"sizeMode":"Stretch"},"imageOptions":{},"fileName":"Generación+por+tipo+de+tecnología+(MWh)"},"ItemType":"PIVOT"},"Context":"BwAHAAIkY2NkNWRiYzItYzIwNS00MDIyLTkzZjUtYWQ0NzVhYTM5Y2E3Ag9PcGVyYWNpb25EaWFyaWECAAIAAAAAAMByQA==","RequestMarker":1,"ClientState":{}}
postdata = {'__EVENTTARGET': eventtarget,
            '__EVENTARGUMENT': eventargument,
            '__VIEWSTATE': viewstate,
            '__VIEWSTATEGENERATOR': viewstategenerator,
            '__EVENTVALIDATION': eventvalidation,
            'DXScript': DXScript,
            'DXCss': DXCss
            }
datareq = s.post(url, data = postdata)
print(datareq.text)