sparklyr:使用mutate函数创建新列 [英] sparklyr: create new column with mutate function

查看:112
本文介绍了sparklyr:使用mutate函数创建新列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果使用Sparklyr无法解决此类问题,我感到非常惊讶:

I'm very surprised if this kind of problems cannot be solved with sparklyr:

iris_tbl <- copy_to(sc, aDataFrame)

# date_vector is a character vector of element
# in this format: YYYY-MM-DD (year, month, day)
for (d in date_vector) {
   ...
   aDataFrame %>% mutate(newValue=gsub("-","",d)))
   ...
}

我收到此错误:

Error: org.apache.spark.sql.AnalysisException: Undefined function: 'GSUB'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 2 pos 86
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:787)
    at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:200)
    at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:172)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun

但是这行:

aDataFrame %>% mutate(newValue=toupper("hello"))

一切正常.有帮助吗?

推荐答案

我强烈建议您在继续之前阅读 sparklyr 文档.特别是,您将要阅读有关如何将R转换为SQL的部分( http://spark.rstudio.com/dplyr.html#sql_translation ).简而言之,R函数的一个非常有限的子集可用于 sparklyr 数据帧,并且 gsub 并不是这些函数之一(但 toupper 是).如果您确实需要 gsub ,则必须将数据收集到本地数据帧中,然后对它进行 gsub (您仍然可以使用 mutate ),然后 copy_to 返回火花.

I would strongly recommend you read the sparklyr documentation before proceeding. In particular, you're going to want to read the section on how R is translated to SQL (http://spark.rstudio.com/dplyr.html#sql_translation). In short, a very limited subset of R functions are available for use on sparklyr dataframes, and gsub is not one of those functions (but toupper is). If you really need gsub you're going to have to collect the data in to a local dataframe, then gsub it (you can still use mutate), then copy_to back to spark.

这篇关于sparklyr:使用mutate函数创建新列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆