Where is the reference for options for writing or reading per format?


Question

I use Spark 1.6.1.

We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can use

df.write().orc(<path>)

we would rather do something like

df.write().options(Map("format" -> "orc", "path" -> "/some_path"))

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library. Where can we find a reference to the options that can be passed into the DataFrameWriter? I found nothing in the docs here:

https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameWriter.html#options(java.util.Map)
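For illustration, a minimal sketch (Scala, against the public Spark 1.6 DataFrameWriter API) of the pattern described above; the helper name writeOut and the way the format, path, and extra options are supplied are assumptions for the example, not part of the question:

import org.apache.spark.sql.DataFrame

// Illustrative helper: format, path and any format-specific options are
// supplied by the calling application instead of being hard-coded.
def writeOut(df: DataFrame,
             fmt: String,                           // e.g. "orc", "parquet", "json"
             path: String,
             extra: Map[String, String] = Map.empty): Unit = {
  df.write
    .format(fmt)      // choose the data source by name
    .options(extra)   // format-specific options (see the answer below)
    .save(path)       // the path can simply be passed to save()
}

With something like this, switching the output from ORC to Parquet is just a change in the value passed for the format.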

Answer

Where can we find a reference to the options that can be passed into the DataFrameWriter?

The most definitive and authoritative answer is the sources themselves:

  • CSVOptions
  • JDBCOptions
  • JSONOptions
  • ParquetOptions
  • TextOptions
  • OrcOptions
  • ...
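As a hedged illustration of how keys found in one of those classes (or the format's documentation) are passed through, here is a JDBC read; the connection URL, table name, and driver below are placeholders, and sqlContext is assumed to be an existing SQLContext or HiveContext:

// "url", "dbtable" and "driver" are documented JDBC data source options in Spark 1.6.
val jdbcDF = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url"     -> "jdbc:postgresql://dbhost:5432/mydb",   // placeholder connection URL
    "dbtable" -> "public.some_table",                     // placeholder table name
    "driver"  -> "org.postgresql.Driver"))                // JDBC driver class
  .load()

The same pattern applies to the other formats: look up the option keys in the corresponding *Options class (or the format's documentation) and pass them through option/options.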

You may find some descriptions in the docs, but there is no single reference page (one that could perhaps be auto-generated from the sources to stay up to date).

The reason is that the options are kept separate from the format implementations on purpose, to allow exactly the per-use-case flexibility you want (as you duly noted):

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library.


Your question seems similar to How to know the file formats supported by Databricks?, where I said:

Where can I get the list of options supported for each file format?

That's not possible, as there is no common API to follow (like the one in Spark MLlib) for defining options. Every format does this on its own... unfortunately, your best bet is to read the documentation or (more authoritatively) the source code.
