How can I use spark_apply() to generate combinations using combn()

Question

I would like to use Spark to generate the output of the combn() function for a relatively large list of inputs (around 200) and for varying values of m (2 to 5). However, I am having trouble calling it inside spark_apply().

An MWE of my current approach (based on this):

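library(sparklyr)
library(dplyr)

# Assumes a local Spark installation; adjust master for your cluster
sc <- spark_connect(master = "local")

names_df <- data.frame(name = c("Alice", "Bob", "Cat"),
                       types = c("Human", "Human", "Animal"))

combn(names_df$name, 2)

name_tbl <- sdf_copy_to(sc = sc,
                        x = names_df,
                        name = "name_table")

name_tbl %>%
  select(name) %>%
  spark_apply(function(e) combn(e, 2))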

The error message output is large, and I am having trouble working out how to use it to refine my approach.

I expected output like that of the combn(names_df$name, 2) call in the MWE. Is the problem that combn() expects a "vector source", which is not what select() gives it? Or is it that select() is not returning "An object (usually a spark_tbl) coercable to a Spark DataFrame"? Either way, is there a method I can use to achieve the desired result?

I have also tried this, with no success:

name_tbl %>%
  select(name) %>% # removing this also doesn't work
  spark_apply(function(e) combn(e$name, 2))

expand.grid works fine, which suggests to me that the return value of combn cannot be coerced into a data.frame.

Working expand.grid:

name_tbl %>%
  spark_apply(function(e) expand.grid(e))

Having read the documentation more closely, I have now also tried coercing the function's result into a data.frame, as it says:

Your R function should be designed to operate on an R data frame. The R function passed to spark_apply expects a DataFrame and will return an object that can be cast as a DataFrame.

However, the following are also unsuccessful:

name_tbl %>%
  spark_apply(function(e) data.frame(combn(e$name, 2)))

name_tbl %>%
  select(name) %>%
  spark_apply(function(e) data.frame(combn(e, 2)))

Recommended answer

The problem seems to be that combn() does not work properly with factors. The code also needs named columns, as in:

name_tbl %>%
  spark_apply(
    function(e) data.frame(combn(as.character(e$name), 2)),
    names = c("1", "2", "3")
  )

# Source:   table<sparklyr_tmp_626bc0dd927> [?? x 3]
# Database: spark_connection
    `1`   `2`   `3`
  <chr> <chr> <chr>
1 Alice Alice   Bob
2   Bob   Cat   Cat
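
The output above has one column per combination, so the names vector has to grow with the number of combinations (19900 entries for 200 inputs and m = 2). As a rough sketch only, the same factor-to-character fix can be tried in plain R with the result transposed, so that each combination is a row and there are always m columns. The data frame e, the helper combn_long and the column names name_1, name_2 are hypothetical and not part of sparklyr; e just stands in for the data frame that spark_apply() passes to the function.

```r
# Hypothetical stand-in for the data frame that spark_apply() hands to the
# function; stringsAsFactors = TRUE mimics a factor column (an assumption).
e <- data.frame(name = c("Alice", "Bob", "Cat"), stringsAsFactors = TRUE)

# Hypothetical helper: returns one row per combination and m columns.
combn_long <- function(e, m = 2) {
  # as.character() makes combn() see a plain character vector
  out <- as.data.frame(t(combn(as.character(e$name), m)),
                       stringsAsFactors = FALSE)
  names(out) <- paste0("name_", seq_len(m))
  out
}

pairs_df <- combn_long(e, 2)
print(pairs_df)

# Number of combinations for 200 inputs and m = 2..5; in the wide layout
# this is also the number of column names that would be needed.
print(choose(200, 2:5))
```

A function of this shape could presumably be passed to spark_apply() in place of the anonymous function above, although that has not been run here. Since spark_apply() runs the function on each partition separately, all names would need to sit in a single partition for every combination to be produced.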
