Return value of for() loop as if it were a function in R
Question
I have this for loop in an R script:
library(rvest)  # html_session(), html_nodes(), html_attr(), %>%
library(httr)   # config()

url <- "https://example.com"
page <- html_session(url, config(ssl_verifypeer = FALSE))

# Relative links to the files
links <- page %>%
  html_nodes("td") %>%
  html_nodes("tr") %>%
  html_nodes("a") %>%
  html_attr("href")

# File names to save each download under
base_names <- page %>%
  html_nodes("td") %>%
  html_nodes("tr") %>%
  html_nodes("a") %>%
  html_attr("href") %>%
  basename()

# seq_along() instead of 1:length() so an empty `links` skips the loop
for (i in seq_along(links)) {
  site <- html_session(
    URLencode(paste0("https://example.com", links[i])),
    config(ssl_verifypeer = FALSE)
  )
  writeBin(site$response$content, base_names[i])
}
This loops through the links and downloads each text file to my working directory. I'm wondering if I can put a return somewhere, so that it returns the documents.
The reason is that I'm executing my script in NiFi (using ExecuteProcess), and it's not sending my scraped documents down the line. Instead, it just shows the head of my R script. I assume you would wrap the for loop in a fun <- function(x) {}, but I'm not sure how to integrate the x into an already working scraper.
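I need it to return documents down the flow, and not just this:
[screenshot not preserved]
Processor configuration:
[screenshot not preserved]
Even if you're not familiar with NiFi, help with the R part would be a great help! Thanks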
Recommended answer
If your intent is to both (1) save the output (with writeBin) and (2) return the values (in a list), then try this:
out <- Map(function(ln, bn) {
  site <- html_session(
    URLencode(paste0("https://example.com", ln)),
    config(ssl_verifypeer = FALSE)
  )
  writeBin(site$response$content, bn)  # (1) save to disk
  site$response$content                # (2) value returned for this link
}, links, base_names)
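To see the shape of what Map returns without touching the network, here is a minimal sketch of the same pattern. It assumes a hypothetical fake_fetch() in place of html_session(...)$response$content, and the links and the scratch directory are made up for illustration.

```r
# Hypothetical stand-in for html_session(...)$response$content:
# returns the "document" as a raw vector, with no network access.
fake_fetch <- function(link) charToRaw(paste("contents of", link))

links      <- c("/files/a.txt", "/files/b.txt")  # made-up links
base_names <- basename(links)
out_dir    <- tempfile("scrape_")                # scratch directory
dir.create(out_dir)

out <- Map(function(ln, bn) {
  content <- fake_fetch(ln)
  writeBin(content, file.path(out_dir, bn))  # (1) save to disk
  content                                    # (2) last expression is the return value
}, links, base_names)

str(out)
```

Here out is a list with one raw vector per link. Because links is an unnamed character vector, Map uses the links themselves as the element names, so a single document can be looked up with out[["/files/a.txt"]].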
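Map "zips" together the individual elements of its arguments. For the base case of a single list, the following are equivalent (one caveat: if list1 is an unnamed character vector, Map uses its values as the names of the result, while lapply leaves the result unnamed):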
Map(myfunc, list1)
lapply(list1, myfunc)
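A quick way to check the single-list case is sketched below. The myfunc and list1 definitions are placeholders chosen for illustration, not part of the original scraper.

```r
myfunc <- function(x) x^2            # hypothetical example function
list1  <- list(a = 1, b = 2, c = 3)  # hypothetical input

res_map    <- Map(myfunc, list1)
res_lapply <- lapply(list1, myfunc)
identical(res_map, res_lapply)       # same values and same names

# With an unnamed character vector the values match, but only Map
# names the result after the input elements.
chars <- c("x", "yy")
names(Map(nchar, chars))
names(lapply(chars, nchar))
```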
But if you want to use same-index elements from multiple lists, you can do one of
lapply(seq_along(list1), function(i) myfunc(list1[[i]], list2[[i]], list3[[i]]))
Map(myfunc, list1, list2, list3)
Unrolled, the Map call amounts to:
myfunc(list1[[1]], list2[[1]], list3[[1]])
myfunc(list1[[2]], list2[[2]], list3[[2]])
# ...
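The multi-list case can be checked the same way. This is a small sketch with made-up inputs; myfunc here is a hypothetical three-argument function.

```r
myfunc <- function(a, b, c) paste(a, b, c)  # hypothetical example function
list1 <- list("a", "b")
list2 <- list(1, 2)
list3 <- list(TRUE, FALSE)

via_map <- Map(myfunc, list1, list2, list3)

# Index-based equivalent: [[i]] extracts the element itself, which is
# what Map hands to the function.
via_lapply <- lapply(seq_along(list1), function(i) {
  myfunc(list1[[i]], list2[[i]], list3[[i]])
})

identical(via_map, via_lapply)
```

The Map form avoids the index bookkeeping, which is why it fits the links/base_names pairing in the answer.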
The biggest difference between lapply and Map here is that lapply iterates over a single vector or list, whereas Map accepts one or more (practically unlimited), zipping them together. All of the lists used should be the same length or length 1 (shorter ones are recycled), so it's legitimate to do something like
Map(myfunc, list1, list2, "constant string")
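As a concrete illustration of recycling a length-1 argument, the sketch below passes one constant alongside two vectors. The function tag_file and the input vectors are hypothetical names used only for this example.

```r
# Hypothetical helper: builds a file name from a stem, a size and a suffix.
tag_file <- function(stem, size, suffix) paste0(stem, "_", size, suffix)

stems <- c("a", "b", "c")
sizes <- c(10, 20, 30)

# ".txt" has length 1, so it is reused for every (stem, size) pair.
res <- Map(tag_file, stems, sizes, ".txt")
unname(unlist(res))
```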
Note: Map-versus-mapply is similar to lapply-vs-sapply. In both pairs, the first always returns a list, while the second returns a vector (or matrix) if and only if every return value has the same length/dimension; otherwise it too returns a list.
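The difference in return types can be seen with a few toy calls. The inputs x and y are arbitrary example vectors.

```r
x <- 1:3
y <- 4:6

as_list   <- Map(`+`, x, y)     # always a list
as_vector <- mapply(`+`, x, y)  # every result has length 1, so simplified

doubled_list <- lapply(x, function(i) i * 2)  # always a list
doubled_vec  <- sapply(x, function(i) i * 2)  # simplified to a vector

# Results of equal length greater than 1 are simplified to a matrix.
squares <- sapply(x, function(i) c(i, i^2))

# Results of different lengths cannot be simplified: sapply returns a list.
ragged <- sapply(x, function(i) seq_len(i))

class(as_list); class(as_vector); class(squares); class(ragged)
```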