How to pass multiple values in a rvest submission form

Question

This is a follow-up to a prior thread. The code works well for a single value, but when I try to pass more than one value I get an error about the length of the result:

Error in vapply(elements, encode, character(1)) : values must be length 1, but FUN(X[[1]]) result is length 3

Here is a sample of the code. In most instances I have been able to simply name an object and scrape that way.

library(httr)
library(rvest)
library(dplyr)

b<-c('48127','48180','49504')

POST(
 url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl", 
 body = list(zipcode = b), 
 encode = "form"
) -> res

I was wondering whether a loop that inserts the values into the form one at a time would be the right approach. My loop-writing skills are still developing, though, and I am unsure where to place it. In addition, when I call the loop it doesn't print line by line; it just returns NULL.

#d isn't listed in the above code as it returns null    
d<-for(i in 1:3){nrow(b)}

Recommended answer

Here is an approach that sends multiple POST requests, one per value:

library(httr)
library(rvest)
b <- c('48127','48180','49504')

For each element of b, call a function that sends the corresponding POST request:

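res <- lapply(b, function(x) {
  # Send one POST request per ZIP code
  resp <- POST(
    url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
    body = list(zipcode = x),
    encode = "form"
  )
  # Parse the raw response body as HTML; this is the value returned to lapply
  read_html(content(resp, as = "raw"))
})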
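As a side note on the loop attempted in the question: a for loop in R always evaluates to NULL, so assigning it to d captures nothing, and nrow() on a plain character vector is also NULL (length() is the vector equivalent). lapply, by contrast, collects each call's return value. A minimal offline sketch; fake_lookup is a hypothetical stand-in for the POST call and is not part of httr or rvest:

```r
b <- c("48127", "48180", "49504")

# Hypothetical stand-in for the POST request, so the sketch runs without network access
fake_lookup <- function(zip) paste("result for", zip)

# A for loop evaluates to NULL, whatever its body computes
d <- for (zip in b) fake_lookup(zip)
is.null(d)  # TRUE

# lapply returns a list with one element per input value
out <- lapply(b, fake_lookup)
length(out)  # 3
```

This is why the answer wraps the request in lapply rather than assigning the result of a for loop.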

Now, for each element of the list res, do the parsing steps explained by hrbrmstr in "How can I Scrape a CGI-Bin with rvest and R?":

library(tidyverse)

I will use hrbrmstr's code, since he is king and it is already clear to you. The only thing we are doing here is applying it to each element of the res list.

res_list <- lapply(res, function(x) {
  # Grab every cell of the results table
  rows <- html_nodes(x, "table[width='300'] > tr > td")
  tibble(
    record = !is.na(html_attr(rows, "bgcolor")),  # TRUE where the cell has a bgcolor attribute
    text = html_text(rows, trim = TRUE)
  ) %>%
    mutate(record = cumsum(record)) %>%   # running record id
    filter(text != "") %>%
    group_by(record) %>%
    summarise(x = paste0(text, collapse = "|")) %>%
    separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"),
             sep = "\\|", extra = "merge")
})

Or using purrr:

res_list <- res %>%
  map(function(x) {
    rows <- html_nodes(x, "table[width='300'] > tr > td")
    tibble(
      record = !is.na(html_attr(rows, "bgcolor")),
      text = html_text(rows, trim = TRUE)
    ) %>%
      mutate(record = cumsum(record)) %>%
      filter(text != "") %>%
      group_by(record) %>%
      summarise(x = paste0(text, collapse = "|")) %>%
      separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"),
               sep = "\\|", extra = "merge")
  })

If you would like the results in a single data frame:

res_df <- data.frame(do.call(rbind, res_list), # row-binds the list elements
                     b = rep(b, times = vapply(res_list, nrow, integer(1)))) # labels each row with the element of b it came from (nrow, not length, which would count columns)
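Another way to keep track of which ZIP code each row came from is to name the list elements and let dplyr::bind_rows() record those names in an .id column. A minimal sketch, using toy data frames in place of the scraped tables; the store values and the zipcode column name are made up for illustration:

```r
library(dplyr)

b <- c("48127", "48180", "49504")

# Toy stand-ins for the parsed tables, with differing row counts
res_list <- list(
  data.frame(store = c("A", "B")),
  data.frame(store = "C"),
  data.frame(store = c("D", "E", "F"))
)

# Name each element after its ZIP code; bind_rows() copies the names
# into the column given by .id, repeating them once per row
res_df <- bind_rows(setNames(res_list, b), .id = "zipcode")
res_df
```

This avoids computing the repeat counts by hand, at the cost of depending on the list order matching the order of b.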
