How can I use spark_apply() to generate combinations using combn()

Question

I would like to use Spark to generate the output of the combn() function for a relatively large list of inputs (around 200) and for varying values of m (2 to 5). However, I am having trouble getting this to work inside spark_apply().

My current approach (based on this):

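library(sparklyr)
library(dplyr)

# Assumption: a Spark connection named sc; a local master is used here for illustration
sc <- spark_connect(master = "local")

names_df <- data.frame(name = c("Alice", "Bob", "Cat"),
                       types = c("Human", "Human", "Animal"))

# Local result that I would like to reproduce in Spark
combn(names_df$name, 2)

name_tbl <- sdf_copy_to(sc = sc,
                        x = names_df,
                        name = "name_table")

# This attempt fails
name_tbl %>%
  select(name) %>%
  spark_apply(function(e) combn(e, 2))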

The error output is long, and I cannot work out how to use it to refine my approach.

I expected output like that of the second line of the MWE. Is the problem that combn() expects a "vector source", which is not what select() gives me? Or is it that select() does not return "An object (usually a spark_tbl) coercable to a Spark DataFrame"? Either way, is there a method I can use to achieve the desired result?
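The "vector source" question can be checked locally without Spark. The sketch below is only a base-R illustration, not a confirmed diagnosis of the Spark error: it assumes the function passed to spark_apply() receives a data frame (as the sparklyr documentation states) and compares what combn() does with a one-column data frame against what it does with the column vector. stringsAsFactors = FALSE is set here as an assumption, so that the behavior is the same across R versions.

```r
names_df <- data.frame(name = c("Alice", "Bob", "Cat"),
                       types = c("Human", "Human", "Animal"),
                       stringsAsFactors = FALSE)

# A one-column data frame, similar to what select(name) hands to the function
e <- names_df["name"]

# A data frame's length is its number of columns, not its number of rows
n_elements <- length(e)

# combn() on the data frame itself: capture the error instead of stopping
attempt <- tryCatch(combn(e, 2), error = function(err) err)
is_error <- inherits(attempt, "error")

# combn() on the column vector: one combination per column of a matrix
combos <- combn(e$name, 2)

print(n_elements)
print(is_error)
print(dim(combos))
```

So combn(e, 2) sees a single element, while combn(e$name, 2) returns a matrix rather than a data frame, which is why the return value still needs converting before Spark can use it.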

I have also tried the following, without success:

name_tbl %>%
  select(name) %>% # removing this also doesn't work
  spark_apply(function(e) combn(e$name, 2))

expand.grid() works fine, which suggests to me that the issue is that the return value of combn() cannot be coerced into a data.frame.

Working expand.grid() call:

name_tbl %>%
  spark_apply(function(e) expand.grid(e))

Edit 2:

Having read the documentation more closely, I have now also tried coercing the function's return value to a data.frame, as it says:

Your R function should be designed to operate on an R data frame. The R function passed to spark_apply expects a DataFrame and will return an object that can be cast as a DataFrame.
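A function that follows this contract can be written and tested on a plain local data frame before it is handed to spark_apply(). The sketch below is one possible shape, not the documented or recommended approach: combn_df and the item1, item2, ... column names are hypothetical names chosen for illustration. It takes a data frame with a name column, converts that column to character, and returns a data frame with one combination per row.

```r
# Hypothetical helper: data frame in, data frame out
combn_df <- function(e, m = 2) {
  # as.character() guards against the column arriving as a factor
  combos <- combn(as.character(e$name), m)

  # combn() puts one combination per column; transpose to one per row
  out <- as.data.frame(t(combos), stringsAsFactors = FALSE)

  # Explicit, predictable column names
  names(out) <- paste0("item", seq_len(m))
  out
}

local_df <- data.frame(name = c("Alice", "Bob", "Cat"),
                       stringsAsFactors = FALSE)

result <- combn_df(local_df, m = 2)
print(result)
```

Transposing is a design choice: the number of columns is then always m, whatever the number of inputs, whereas the untransposed matrix has one column per combination.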

However, the following also fail:

name_tbl %>%
  spark_apply(function(e) data.frame(combn(e$name, 2)))

name_tbl %>%
  select(name) %>%
  spark_apply(function(e) data.frame(combn(e, 2)))

Recommended answer

The problem seems to be that combn() does not work properly with factors. The code also needs named columns, as in:

name_tbl %>%
  spark_apply(
    function(e) data.frame(combn(as.character(e$name), 2)),
    names = c("1", "2", "3")
  )

# Source:   table<sparklyr_tmp_626bc0dd927> [?? x 3]
# Database: spark_connection
    `1`   `2`   `3`
  <chr> <chr> <chr>
1 Alice Alice   Bob
2   Bob   Cat   Cat
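For the sizes mentioned in the question (about 200 inputs, m from 2 to 5), it may help to estimate how many combinations each value of m produces before running anything in Spark. The following is a small base-R calculation with choose(); the variable names are illustrative and 200 is taken from the question's "200 ish".

```r
n_inputs <- 200   # approximate input size from the question
m_values <- 2:5   # values of m from the question

# Number of combinations combn() would generate for each m
combination_counts <- data.frame(
  m = m_values,
  n_combinations = choose(n_inputs, m_values)
)

print(combination_counts)
```

With the layout used in the answer above, each combination becomes a column, so the names argument would need one entry per combination. Also, as far as I understand, spark_apply() runs the function on each partition separately, so combn() only sees the names that are in the same partition.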
