What is the equivalent of R's list() function in sparklyr?
Question
Below is some sample R code. I would like to do the same in sparklyr.
custTrans1 <- Pdt_table %>%
  group_by(Main_CustomerID) %>%
  summarise(Invoice = as.vector(list(Invoice_ID)), Industry = as.vector(list(Industry)))
where Pdt_table is a Spark data frame, and Main_CustomerID, Invoice_ID and Industry are variables.
I would like to create a list of each of the above variables and convert it to a vector. How can I do this in sparklyr?
Recommended answer
You can use collect_list or collect_set:
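library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")  # assumption: a local Spark installation
set.seed(1)
df <- copy_to(
  sc, tibble(group = rep(c("a", "b"), 3), value = runif(6)),
  name = "df"
)
# Collect each group's values into a single array column.
result <- df %>% group_by(group) %>% summarise(values = collect_list(value))
result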
# Source:   lazy query [?? x 2]
# Database: spark_connection
  group values
  <chr> <list>
1 b     <list [3]>
2 a     <list [3]>
This is translated into the following query:
result %>% show_query()
<SQL>
SELECT `group`, COLLECT_LIST(`value`) AS `values`
FROM `df`
GROUP BY `group`
The corresponding execution plan is:
result %>% optimizedPlan()
<jobj[213]>
org.apache.spark.sql.catalyst.plans.logical.Aggregate
Aggregate [group#259], [group#259, collect_list(value#260, 0, 0) AS values#345]
+- InMemoryRelation [group#259, value#260], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `df`
+- Scan ExistingRDD[group#259,value#260]
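Note that optimizedPlan is not a function exported by sparklyr, so it has to be defined before the call above will run. A minimal sketch of such a helper, assuming it simply asks the underlying Spark Dataset for its optimized logical plan through sparklyr's invoke interface, could look like this:

```r
library(sparklyr)
library(dplyr)

# Hypothetical helper (this definition is an assumption, it is not shown in the answer):
# returns the optimized logical plan of a Spark-backed tbl as a jobj.
optimizedPlan <- function(df) {
  df %>%
    spark_dataframe() %>%          # reference to the underlying Spark Dataset
    invoke("queryExecution") %>%   # Dataset.queryExecution
    invoke("optimizedPlan")        # QueryExecution.optimizedPlan
}
```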
The resulting schema has an array<...> column:
root
 |-- group: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: double (containsNull = true)
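Applied to the table from the question, the same pattern might look like the sketch below. It assumes Pdt_table is already a Spark-backed tbl with the columns named in the question. Using collect_set for Industry is only an illustrative choice to show the variant that drops duplicates; the question itself does not ask for it.

```r
library(sparklyr)
library(dplyr)

# Assumes Pdt_table is a tbl_spark with columns Main_CustomerID, Invoice_ID and Industry.
custTrans1 <- Pdt_table %>%
  group_by(Main_CustomerID) %>%
  summarise(
    Invoice = collect_list(Invoice_ID),  # keeps every value, duplicates included
    Industry = collect_set(Industry)     # keeps distinct values only
  )
```

Neither function guarantees the order of the elements inside the resulting arrays.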
Keep in mind:
- An operation like this is very expensive in a distributed system, since every value in a group has to be moved to one place.
- Depending on the data distribution, it might not be feasible at all (for example, when a single group is too large to fit in one executor's memory).
- Complex types are somewhat hard to handle in Spark in general, and sparklyr, with its tidy-data focus, doesn't make things easier. To process the result efficiently you may need a Scala extension.
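If the array column only needs to be turned back into one row per element, one option that stays within sparklyr is to pass Spark SQL's explode function through mutate. This is a sketch that assumes the result table built earlier in the answer; the names flat and value are made up for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumes `result` is the grouped table from above, with an array column `values`.
flat <- result %>%
  mutate(value = explode(values)) %>%  # explode is not an R function; it is forwarded to Spark SQL
  select(group, value)

flat
```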