Create External Table in Azure Databricks


Problem Description

I am new to Azure Databricks and am trying to create an external table pointing to an Azure Data Lake Storage (ADLS) Gen2 location.

From a Databricks notebook I have tried to set the Spark configuration for ADLS access, but I am still unable to execute the DDL I created.

Note: One solution that works for me is mounting the ADLS account to the cluster and then using the mount location in the external table's DDL. But I need to check whether it is possible to create an external table DDL with an ADLS path, without a mount location.
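(For reference, a minimal sketch of that mount-based workaround, run from a Databricks notebook where spark is predefined; the mount point /mnt/adls is a hypothetical placeholder, and the path mirrors the DDL shown further below:)

# Sketch only: assumes the ADLS account is already mounted at /mnt/adls
# (placeholder; the answer below shows the mount itself). The table
# location uses the DBFS mount path instead of the abfss:// URI.
spark.sql("""
    CREATE EXTERNAL TABLE test (id STRING, name STRING)
    PARTITIONED BY (pt_batch_id BIGINT, pt_file_id INT)
    STORED AS PARQUET
    LOCATION '/mnt/adls/dev/data/employee'
""")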

# Using service principal credentials; client_id, client_secret, and
# tenant_id are placeholders for the actual values
spark.conf.set("dfs.azure.account.auth.type", "OAuth")
spark.conf.set("dfs.azure.account.oauth.provider.type", "ClientCredential")
spark.conf.set("dfs.azure.account.oauth2.client.id", "client_id")
spark.conf.set("dfs.azure.account.oauth2.client.secret", "client_secret")
spark.conf.set("dfs.azure.account.oauth2.client.endpoint",
               "https://login.microsoftonline.com/tenant_id/oauth2/token")

DDL

CREATE EXTERNAL TABLE test (
  id STRING,
  name STRING
)
PARTITIONED BY (pt_batch_id BIGINT, pt_file_id INT)
STORED AS PARQUET
LOCATION 'abfss://container@account_name.dfs.core.windows.net/dev/data/employee'

Error received

Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException Configuration property account_name.dfs.core.windows.net not found.);

I need help in knowing whether it is possible to refer to the ADLS location directly in the DDL.

Thanks.

Recommended Answer

You can perform this operation once Azure Data Lake Storage is configured.

You should create a mount point using the method described below if you want all users in the Databricks workspace to have access to the mounted Azure Data Lake Storage Gen2 account. The service principal that you use to access the Azure Data Lake Storage Gen2 account should be granted access only to that account; it should not be granted access to other resources in Azure.
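A minimal sketch of creating such a mount point with a service principal, run from a Databricks notebook where dbutils is predefined; the container, account, credential values, and the /mnt/adls mount point are all placeholders:

# Mount an ADLS Gen2 filesystem to DBFS using service principal (OAuth 2.0)
# credentials. In a real workspace the client secret should come from a
# secret scope (dbutils.secrets.get) rather than a literal string.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "client_id",
    "fs.azure.account.oauth2.client.secret": "client_secret",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/tenant_id/oauth2/token",
}
dbutils.fs.mount(
    source="abfss://container@account_name.dfs.core.windows.net/",
    mount_point="/mnt/adls",
    extra_configs=configs,
)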

Once a mount point is created through a cluster, users of that cluster can access it immediately. To use the mount point on another running cluster, users must run dbutils.fs.refreshMounts() on that cluster to make the newly created mount point available.
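That is, on any other cluster that was already running when the mount was created:

# Refresh the mount table so this cluster sees newly created mount points.
dbutils.fs.refreshMounts()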

There are three primary ways of accessing Azure Data Lake Storage Gen2 from a Databricks cluster:

  1. Mount an Azure Data Lake Storage Gen2 filesystem to DBFS using a service principal with delegated permissions and OAuth 2.0.
  2. Use a service principal directly (a sketch follows this list).
  3. Use an Azure Data Lake Storage Gen2 storage account access key directly.
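Option 2 is the one the question asks about: once the account-qualified OAuth settings are in place (as in the sketch in the question section above), the DDL can reference the abfss:// path directly, with no mount point. A sketch with all names kept as placeholders; note that on some runtimes these settings must go in the cluster's Spark configuration rather than the notebook session for SQL DDL to see them. Option 3 is shown for completeness:

# Option 2: direct access with a service principal. With the fs.azure.*
# account-qualified OAuth settings set, the abfss:// location resolves
# without any mount.
spark.sql("""
    CREATE EXTERNAL TABLE test (id STRING, name STRING)
    PARTITIONED BY (pt_batch_id BIGINT, pt_file_id INT)
    STORED AS PARQUET
    LOCATION 'abfss://container@account_name.dfs.core.windows.net/dev/data/employee'
""")

# Option 3: direct access with the storage account access key (placeholder
# value) instead of OAuth.
spark.conf.set("fs.azure.account.key.account_name.dfs.core.windows.net",
               "storage_account_access_key")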

For more details, refer to "Azure Data Lake Storage Gen2".

Hope this helps.

