sparklyr: can I pass format and path options into spark_write_table? or use saveAsTable with spark_write_orc?


Question

Spark 2.0 with Hive

Let's say I am trying to write a Spark dataframe, irisDf, to ORC and save it to the Hive metastore.

In Spark I would do that like this,

irisDf.write.format("orc")
    .mode("overwrite")
    .option("path", "s3://my_bucket/iris/")
    .saveAsTable("my_database.iris")

In sparklyr I can use the spark_write_table function,

data("iris")
iris_spark <- copy_to(sc, iris, name = "iris")
output <- spark_write_table(
   iris_spark
  ,name = 'my_database.iris'
  ,mode = 'overwrite'
)

But this does not allow me to set the path or format options.

I can also use spark_write_orc,

spark_write_orc(
    iris_spark
  , path = "s3://my_bucket/iris/"
  , mode = "overwrite"
)

but there is no saveAsTable option.

Now, I CAN use invoke statements to replicate the Spark code,

sdf <- spark_dataframe(iris_spark)
writer <- invoke(sdf, "write")
writer %>%
  invoke('format', 'orc') %>%
  invoke('mode', 'overwrite') %>%
  invoke('option', 'path', "s3://my_bucket/iris/") %>%
  invoke('saveAsTable', "my_database.iris")

But I am wondering if there is any way to instead pass the format and path options into spark_write_table, or the saveAsTable option into spark_write_orc?

Answer

path can be set using the options argument, which is equivalent to an options call on the native DataFrameWriter:

spark_write_table(
  iris_spark, name = 'my_database.iris', mode = 'overwrite', 
  options = list(path = "s3a://my_bucket/iris/")
)

By default in Spark, this will create a table stored as Parquet at path (partition subdirectories can be specified with the partition_by argument).
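Combining the two, a minimal sketch of a partitioned write (this assumes the connection sc and the iris_spark table from the question, and uses the Species column of iris purely as an illustrative partition key):

```r
library(sparklyr)

# Write the table to S3, partitioned by Species; partition_by creates
# one subdirectory per distinct value under the given path.
spark_write_table(
  iris_spark,
  name = "my_database.iris",
  mode = "overwrite",
  options = list(path = "s3a://my_bucket/iris/"),
  partition_by = "Species"
)
```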

As of today there is no such option for format, but an easy workaround is to set the spark.sessionState.conf.defaultDataSourceName property, either at runtime

spark_session_config(
  sc, "spark.sessionState.conf.defaultDataSourceName", "orc"
)

or when the session is created.
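For the session-creation route, a sketch of setting the same property through spark_config() before connecting (assuming a local Spark install for illustration):

```r
library(sparklyr)

# Build a config with ORC as the default data source, then connect;
# tables written via spark_write_table will now be stored as ORC.
conf <- spark_config()
conf[["spark.sessionState.conf.defaultDataSourceName"]] <- "orc"
sc <- spark_connect(master = "local", config = conf)
```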
