How to know the file formats supported by Databricks?

Question

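I need to load various files (of different types) into Spark DataFrames. Does Databricks support all of these file formats? If so, where can I get the list of options supported for each file format?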

delimited
csv
parquet
avro
excel
json

Thanks

Solution

I don't know exactly what Databricks offers out of the box (pre-installed), but you can do some reverse-engineering using the org.apache.spark.sql.execution.datasources.DataSource object, which is (quoting the scaladoc):

The main class responsible for representing a pluggable Data Source in Spark SQL

All data sources usually register themselves using the DataSourceRegister interface (and use shortName to provide their alias):

Data sources should implement this trait so that they can register an alias to their data source.

Reading further in the scaladoc of DataSourceRegister, you'll find that:

This allows users to give the data source alias as the format type over the fully qualified class name.

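So, YMMV (your mileage may vary): which formats are available depends on which DataSourceRegister implementations are on the classpath.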
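The alias-over-class-name idea from the scaladoc can be illustrated without Spark. The following is a minimal sketch, not Spark's code: the trait FormatProvider, the two providers and FormatRegistry.resolve are hypothetical stand-ins for DataSourceRegister and its implementations, and a fixed list replaces the ServiceLoader discovery that Spark really uses. The real lookup handles more cases than this.

```scala
// Hypothetical stand-in for Spark's DataSourceRegister: a provider exposes a
// short alias in addition to its fully qualified class name.
trait FormatProvider {
  def shortName: String
  def className: String = getClass.getName
}

// Hypothetical providers. Spark discovers real ones with java.util.ServiceLoader;
// this sketch uses a fixed list instead.
final class CsvProvider extends FormatProvider { val shortName = "csv" }
final class JsonProvider extends FormatProvider { val shortName = "json" }

object FormatRegistry {
  val providers: List[FormatProvider] = List(new CsvProvider, new JsonProvider)

  // Resolve a user-supplied format name: first by alias (case-insensitively),
  // then by the provider's fully qualified class name.
  def resolve(format: String): Option[FormatProvider] =
    providers
      .find(_.shortName.equalsIgnoreCase(format))
      .orElse(providers.find(_.className == format))
}
```

With this sketch both FormatRegistry.resolve("csv") and a lookup by the CsvProvider class name find the same provider, while an unregistered alias such as "avro" yields None.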

Unless you find an authoritative answer on Databricks, you may want to (follow DataSource.lookupDataSource and) use Java's ServiceLoader.load method to find all registered implementations of the DataSourceRegister interface.

# Start a Spark application with an external module (Kafka) that registers its own DataSource
$ ./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0-SNAPSHOT

scala> import java.util.ServiceLoader
scala> import org.apache.spark.sql.sources.DataSourceRegister
scala> val formats = ServiceLoader.load(classOf[DataSourceRegister])
scala> import scala.collection.JavaConverters._
scala> formats.asScala.map(_.shortName).foreach(println)
orc
hive
libsvm
csv
jdbc
json
parquet
text
console
socket
kafka
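With the short names printed, you could check which formats from the question have a registered alias. A small sketch, assuming registered is the list printed above (it will differ on your cluster); the comparison is case-insensitive, mirroring how DataSource.lookupDataSource matches aliases, and FormatCheck and missing are hypothetical names.

```scala
object FormatCheck {
  // Short names printed by the ServiceLoader listing above (cluster-specific).
  val registered: Seq[String] = Seq(
    "orc", "hive", "libsvm", "csv", "jdbc", "json",
    "parquet", "text", "console", "socket", "kafka")

  // Formats requested in the question.
  val requested: Seq[String] =
    Seq("delimited", "csv", "parquet", "avro", "excel", "json")

  // Requested formats that have no matching registered alias (case-insensitive).
  def missing(requested: Seq[String], registered: Seq[String]): Seq[String] = {
    val known = registered.map(_.toLowerCase).toSet
    requested.filterNot(f => known.contains(f.toLowerCase))
  }
}
```

For the listing above, FormatCheck.missing(FormatCheck.requested, FormatCheck.registered) gives delimited, avro and excel. A missing alias only means nothing is registered under that name on this classpath: delimited text is typically read with the csv source and a different separator, and an extra package (as with Kafka above) can register further aliases.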


Where can I get the list of options supported for each file format?

That's not possible, as there is no common API (like the one in Spark MLlib) for defining options. Unfortunately, every format handles this on its own, so your best bet is to read the documentation or (more authoritatively) the source code.
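Because each format interprets its options itself, the caller only passes string key/value pairs and the format decides what they mean. A Spark-free sketch of that idea: CsvLikeOptions and its keys sep and header are hypothetical here (loosely modelled on a CSV-style source), and in this sketch unrecognised keys are simply ignored.

```scala
// Hypothetical option parser for one format: it reads only the keys it knows
// about from a generic String -> String map and applies its own defaults.
final case class CsvLikeOptions(sep: String, header: Boolean)

object CsvLikeOptions {
  def fromMap(options: Map[String, String]): CsvLikeOptions = {
    // Normalise keys so lookups are case-insensitive.
    val opts = options.map { case (k, v) => k.toLowerCase -> v }
    CsvLikeOptions(
      sep = opts.getOrElse("sep", ","),
      header = opts.get("header").exists(_.equalsIgnoreCase("true"))
    )
  }
}
```

In Spark the caller's side looks like spark.read.format("csv").option("header", "true").option("sep", "|").load(path), where path is a placeholder; which keys are valid is documented per format.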

