What is the equivalent of R's list() function in sparklyr?


Problem description

Below is some sample R code. I would like to do the same thing in sparklyr.

custTrans1 <- Pdt_table %>%
  group_by(Main_CustomerID) %>%
  summarise(
    Invoice  = as.vector(list(Invoice_ID)),
    Industry = as.vector(list(Industry))
  )

where Pdt_table is a Spark data frame and Main_CustomerID, Invoice_ID, and Industry are variables.

I would like to create a list of the above variables and convert it to a vector. How can I do this in sparklyr?

Answer

You can use collect_list or collect_set:

library(sparklyr)
library(dplyr)

# sc is an existing spark_connection, e.g. from spark_connect(master = "local")
set.seed(1)
df <- copy_to(
  sc, tibble(group = rep(c("a", "b"), 3), value = runif(6)),
  name = "df"
)

result <- df %>% group_by(group) %>% summarise(values = collect_list(value))
result

# Source:   lazy query [?? x 2]
# Database: spark_connection
  group values    
  <chr> <list>    
1 b     <list [3]>
2 a     <list [3]>
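
collect_set works the same way but keeps only the distinct values within each group; a minimal sketch on the same df:

df %>%
  group_by(group) %>%
  summarise(values = collect_set(value))  # like collect_list, but de-duplicated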

The collect_list() query translates to the following Spark SQL:

result %>% show_query()

<SQL>
SELECT `group`, COLLECT_LIST(`value`) AS `values`
FROM `df`
GROUP BY `group`
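
Note that optimizedPlan() used below is not a sparklyr built-in. A small helper along these lines (a sketch built on sparklyr's spark_dataframe() and invoke()) pulls the optimized logical plan from the underlying Spark Dataset:

# Not part of sparklyr: fetch the optimized logical plan of the
# Spark Dataset that backs a tbl_spark.
optimizedPlan <- function(df) {
  df %>%
    spark_dataframe() %>%         # Java object reference to the underlying Dataset
    invoke("queryExecution") %>%  # its QueryExecution
    invoke("optimizedPlan")       # the optimized logical plan (returned as a jobj)
}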

With that helper, the corresponding execution plan is:

result %>% optimizedPlan()

<jobj[213]>
  org.apache.spark.sql.catalyst.plans.logical.Aggregate
  Aggregate [group#259], [group#259, collect_list(value#260, 0, 0) AS values#345]
+- InMemoryRelation [group#259, value#260], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `df`
      +- Scan ExistingRDD[group#259,value#260]

and the schema (with an array<...> column):

root
 |-- group: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: double (containsNull = true)
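
The tree above is Spark's printSchema() layout. From R, one way to check the column types is sparklyr's sdf_schema(), sketched here; it returns the column names and their Spark types as a list:

sdf_schema(result)  # the `values` column should appear with an array type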

Keep in mind:

  • An operation like this one is very expensive in a distributed system.
  • Depending on the data distribution, it might not be feasible.
  • Complex types are somewhat hard to handle in Spark in general, and sparklyr, with its tidy-data focus, doesn't make things easier. To process the result efficiently you may require a Scala extension.
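
With those caveats in mind, the same pattern applied to the data frame from the question would look roughly like this (a sketch reusing the question's names Pdt_table, Main_CustomerID, Invoice_ID, and Industry):

custTrans1 <- Pdt_table %>%
  group_by(Main_CustomerID) %>%
  summarise(
    Invoice  = collect_list(Invoice_ID),  # one array of invoice IDs per customer
    Industry = collect_list(Industry)     # one array of industries per customer
  )

For small results you can then collect() the table back into R, where the array columns arrive as list columns (support for collecting complex columns depends on the sparklyr version).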
