通过导航 doPostBack 使用 R 抓取网站 [英] Scrape website with R by navigating doPostBack

查看:52
本文介绍了通过导航 doPostBack 使用 R 抓取网站的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想定期从下面的站点中提取一个表格.

I want to extract a table periodicaly from below site.

点击积木名称(BLOK 16 A, BLOK 16 B, BLOK 16 C, ...)时,价格表会发生变化.URL 不改变,页面通过触发改变

price list changes when clicked building block names(BLOK 16 A, BLOK 16 B, BLOK 16 C, ...) . URL doesn't change, page changes by trigering

javascript:__doPostBack('ctl00$ContentPlaceHolder1$DataList2$ctl04$lnk_blok','')

在搜索 google 和 starckoverflow 后,我尝试了 3 种方法.

I've tried 3 ways after searching google and starckoverflow.

我尝试过的第一项:这不会触发 doPostBack 事件.

what I've tried no 1: this doesn't triger doPostBack event.

postForm( "http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44", ctl00_ContentPlaceHolder1_DataList2_ctl03_lnk_blok="ctl00$ContentPlaceHolder1$DataList2$ctl03$lnk_blok")

我尝试过的没有 2: selenium remote 似乎适用于 (http://localhost:4444/) 但远程驱动程序不导航.返回此错误.(检查错误(res)中的错误:httr 调用中的未定义错误.httr 输出:length(url) == 1 is not TRUE)

what I've tried no 2: selenium remote seem to works on (http://localhost:4444/) but remotedriver doesn't navigate. returns this error. (Error in checkError(res) : Undefined error in httr call. httr output: length(url) == 1 is not TRUE)

library(RSelenium)
startServer()
remDr <- remoteDriver()
remDr <- remoteDriver(remoteServerAddr = "localhost" 
                  , port = 4444L, browserName = "firefox")
remDr$open()
remDr$getStatus()
remDr$navigate("http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44")

我尝试过的第 3 种方法: 这是触发 dopostback 事件的另一种方法.它不会导航.

what I've tried no 3: this another way to triger dopostback event. it doesn't navigate.

base.url <- "http://www.kentkonut.com.tr/tr/modul/projeler/",
event.target <- 'ctl00$ContentPlaceHolder1$DataList2$ctl03$lnk_blok',
action <- "daire_fiyatlari.aspx?id=44"

ftarget <- paste0(base.url, action)
dum <- getURL(ftarget)
event.val <- unlist(strsplit(dum,"__EVENTVALIDATION\" value=\""))[2]
event.val <- unlist(strsplit(event.val,"\" />\r\n\r\n<script"))[1]
view.state <- unlist(strsplit(dum,"id=\"__VIEWSTATE\" value=\""))[2]
view.state <- unlist(strsplit(view.state,"\" />\r\n\r\n\r\n<script"))[1]
web.data <- postForm(ftarget, "form name" = "ctl00_ContentPlaceHolder1_DataList2_ctl03_lnk_blok", 
                   "method" = "POST", 
                   "action" = action, 
                   "id" = "ctl00_ContentPlaceHolder1_DataList2_ctl03_lnk_blok",
                   "__EVENTTARGET"=event.target,
                   "__EVENTVALIDATION"=event.val,
                   "__VIEWSTATE"=view.state)

感谢您的帮助.

推荐答案

library(rvest)    
url<-"http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44"
    pgsession<-html_session(url) 
    t<-html_table(html_nodes(read_html(pgsession), css = "#ctl00_ContentPlaceHolder1_DataList1"), fill= TRUE)[[1]]
    even_indices<-seq(2,length(t$X1),2)
    t<-t[even_indices,]
    t<-t[2:(length(t$X1)),]

编辑代码:

library(rvest)    
url<-"http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44"
pgsession<-html_session(url)
pgform<-html_form(pgsession)[[1]]
page<-rvest:::request_POST(pgsession,"http://www.kentkonut.com.tr/tr/modul/projeler/daire_fiyatlari.aspx?id=44",
                           body=list(
                             `__VIEWSTATE`=pgform$fields$`__VIEWSTATE`$value,
                             `__EVENTTARGET`="ctl00$ContentPlaceHolder1$DataList2$ctl01$lnk_blok",
                             `__EVENTARGUMENT`="",
                             `__VIEWSTATEGENERATOR`=pgform$fields$`__VIEWSTATEGENERATOR`$value,
                             `__VIEWSTATEENCRYPTED`=pgform$fields$`__VIEWSTATEENCRYPTED`$value,
                             `__EVENTVALIDATION`=pgform$fields$`__EVENTVALIDATION`$value
                           ),
                           encode="form"
                           )
# in the above example change eventtarget as "ctl00$ContentPlaceHolder1$DataList2$ctl02$lnk_blok" to get different table

t<-html_table(html_nodes(read_html(page), css = "#ctl00_ContentPlaceHolder1_DataList1"), fill= TRUE)[[1]]
even_indices<-seq(2,length(t$X1),2)
t<-t[even_indices,]
t<-t[2:(length(t$X1)),]

这篇关于通过导航 doPostBack 使用 R 抓取网站的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆