sparklyr: create new column with mutate function
Question
I would be very surprised if this kind of problem could not be solved with sparklyr:
iris_tbl <- copy_to(sc, aDataFrame)
# date_vector is a character vector whose elements
# are in this format: YYYY-MM-DD (year, month, day)
for (d in date_vector) {
...
aDataFrame %>% mutate(newValue=gsub("-","",d))
...
}
I get this error:
Error: org.apache.spark.sql.AnalysisException: Undefined function: 'GSUB'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 2 pos 86
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:787)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:200)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:172)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun
But this line:
aDataFrame %>% mutate(newValue=toupper("hello"))
works fine. Any help?
Recommended answer
I would strongly recommend reading the sparklyr documentation before proceeding. In particular, read the section on how R is translated to SQL (http://spark.rstudio.com/dplyr.html#sql_translation). In short, only a very limited subset of R functions is available for use on sparklyr dataframes, and gsub is not one of them (but toupper is). If you really need gsub, you will have to collect the data into a local dataframe, apply gsub to it there (you can still use mutate), and then copy_to it back to Spark.
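The collect, gsub, copy_to round trip described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the helper names strip_dashes and round_trip, the column name "date", and the table name "cleaned_dates" are all hypothetical, and the sketch assumes the dates live in a column rather than in the separate date_vector from the question.

```r
library(dplyr)

# Hypothetical helper: remove dashes from a YYYY-MM-DD character column of a
# local (non-Spark) data frame. gsub runs in R here, so no SQL translation
# is involved.
strip_dashes <- function(local_df, column = "date") {
  local_df %>% mutate(newValue = gsub("-", "", .data[[column]]))
}

# Round trip sketch: Spark -> local R -> Spark.
# `sc` is assumed to be an existing sparklyr connection (so sparklyr is
# already loaded) and `spark_tbl` a reference to a Spark table.
# "cleaned_dates" is a made-up table name.
round_trip <- function(sc, spark_tbl, column = "date") {
  local_df <- collect(spark_tbl)              # pull all rows into R memory
  cleaned  <- strip_dashes(local_df, column)  # plain R gsub inside mutate
  copy_to(sc, cleaned, "cleaned_dates", overwrite = TRUE)
}

# Local-only demonstration that does not need a Spark cluster.
demo <- data.frame(date = c("2017-01-31", "2016-12-01"),
                   stringsAsFactors = FALSE)
result <- strip_dashes(demo)
print(result$newValue)
```

Note that collect brings the whole table into local memory, so this only suits data that fits on the driver machine. The error message in the question shows that the untranslated call was forwarded to Spark SQL as GSUB, so a function that Spark SQL does define, such as regexp_replace, may work inside mutate without the round trip; that goes beyond what the answer describes and is worth verifying against your Spark version.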