How can I use spark_apply() to generate combinations using combn()
Question
I would like to use Spark to generate the output of the combn() function for a relatively large list of inputs (around 200) and for varying values of m (2 to 5). However, I am having trouble getting this to work inside spark_apply().
An MWE of my current approach (based on this):
names_df <- data.frame(name = c("Alice", "Bob", "Cat"),
                       types = c("Human", "Human", "Animal"))
combn(names_df$name, 2)
name_tbl <- sdf_copy_to(sc = sc,
                        x = names_df,
                        name = "name_table")
name_tbl %>%
  select(name) %>%
  spark_apply(function(e) combn(e, 2))
The error message output is large, and I am having trouble understanding how to use that information to refine my approach.
I expected output like that of the plain combn() call in the MWE. Is the problem that combn() expects a "vector source", which is not what I am providing via select()? Or is it that select() is not returning "An object (usually a spark_tbl) coercable to a Spark DataFrame"? Either way, is there a method I can use to achieve the desired result?
I have also tried the following, without success:
name_tbl %>%
  select(name) %>% # removing this also doesn't work
  spark_apply(function(e) combn(e$name, 2))
expand.grid() works fine, which suggests to me that the problem is that the return value of combn() cannot be coerced into a data.frame.
Working expand.grid() call:
name_tbl %>%
  spark_apply(function(e) expand.grid(e))
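One way to see the difference between the two functions is to compare what each returns in plain R, outside Spark. This is a minimal sketch using the same three names; the variable names (x, combos, grid, combos_df) are illustrative and not part of the original post.

```r
x <- c("Alice", "Bob", "Cat")

# combn() returns a matrix: m rows, one column per combination
combos <- combn(x, 2)
print(class(combos))
print(dim(combos))

# expand.grid() already returns a data.frame
grid <- expand.grid(name = x, stringsAsFactors = FALSE)
print(class(grid))

# Wrapping the matrix turns it into a data.frame with generated column names
combos_df <- as.data.frame(combos, stringsAsFactors = FALSE)
print(names(combos_df))
```

Checking the return type locally like this can narrow down whether a failure comes from the R function itself or from how Spark handles its result.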
Having read the documentation more closely, I have now also tried coercing the function's result into a data.frame, since the documentation says:
"Your R function should be designed to operate on an R data frame. The R function passed to spark_apply expects a DataFrame and will return an object that can be cast as a DataFrame."
However, the following attempts are also unsuccessful:
name_tbl %>%
  spark_apply(function(e) data.frame(combn(e$name, 2)))
name_tbl %>%
  select(name) %>%
  spark_apply(function(e) data.frame(combn(e, 2)))
Recommended answer
The problem seems to be that combn() does not work properly with factors, so the name column is converted with as.character() first. The result also needs named columns, supplied here through the names argument, as in:
name_tbl %>%
  spark_apply(
    function(e) data.frame(combn(as.character(e$name), 2)),
    names = c("1", "2", "3")
  )
# Source: table<sparklyr_tmp_626bc0dd927> [?? x 3]
# Database: spark_connection
  `1`   `2`   `3`
  <chr> <chr> <chr>
1 Alice Alice Bob
2 Bob   Cat   Cat
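The output above has one column per combination, so the number of column names grows with choose(n, m); for 200 names and m = 2 that is already 19900 columns. A possible variation, sketched here in plain R, is to transpose the combn() result so that each combination becomes a row and the output always has exactly m columns. The helper combn_rows and the item_ column names are hypothetical and not part of the original answer.

```r
# Hypothetical helper: one row per combination, always m columns
combn_rows <- function(e, m) {
  # as.character() guards against the column arriving as a factor
  combos <- t(combn(as.character(e$name), m))
  out <- as.data.frame(combos, stringsAsFactors = FALSE)
  names(out) <- paste0("item_", seq_len(m))
  out
}

names_df <- data.frame(name = c("Alice", "Bob", "Cat"),
                       types = c("Human", "Human", "Animal"))

# Check the function locally on an ordinary data.frame first
result <- combn_rows(names_df, 2)
print(result)
```

To use something like this with spark_apply(), the body would probably need to be written inline in the function passed to spark_apply(), with column names such as c("item_1", "item_2") supplied the same way as in the answer (more recent sparklyr releases appear to call this argument columns rather than names). Note also that spark_apply() runs the function once per partition, so combinations are only formed among rows that sit in the same partition.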