Azure Databricks external Hive Metastore

Problem description

I checked the [documentation][1] about the usage of an Azure Databricks external Hive metastore (Azure SQL Database).

I was able to download the jars and place them into /dbfs/hive_metastore_jar.
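For reference, a minimal sketch of that copy step from a Scala notebook (the local source path file:/tmp/hive_jars is an assumption; only the DBFS target comes from the question):

// Copy downloaded Hive metastore jars into DBFS so clusters can reuse them.
dbutils.fs.mkdirs("dbfs:/hive_metastore_jar")
dbutils.fs.cp("file:/tmp/hive_jars", "dbfs:/hive_metastore_jar", recurse = true)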

My next step is to run a cluster with an init file:

# Hive-specific configuration options.
# spark.hadoop prefix is added to make sure these Hive specific options propagate to the metastore client.
# JDBC connect string for a JDBC metastore
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<host>.database.windows.net:1433;database=<database> #should I add more parameters?

# Username to use against metastore database
spark.hadoop.javax.jdo.option.ConnectionUserName admin

# Password to use against metastore database
spark.hadoop.javax.jdo.option.ConnectionPassword p@ssword

# Driver class name for a JDBC metastore
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver

# Spark specific configuration options
spark.sql.hive.metastore.version 2.7.3 #I am not sure about this
# Skip this one if <hive-version> is 0.13.x.
spark.sql.hive.metastore.jars /dbfs/hive_metastore_jar 

I uploaded the init file to DBFS and launched the cluster. It failed to read the init file; something is wrong.

  [1]: https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore

Recommended answer

I have solved this for now. The problems I faced:

  1. I did not copy the Hive jars to the local cluster. This is important: spark.sql.hive.metastore.jars cannot point to DBFS; it has to point to a local copy of the Hive jars, and the init script is what copies them over.
  2. The connection was fine. I also used the Azure template with a VNet, which is preferable; I then allowed traffic to Azure SQL from the VNet that hosts Databricks.
  3. The last issue: I had to create the Hive schema before starting Databricks. I copied the Hive 1.2 DDL from Git, ran it against the Azure SQL database, and then I was good to go.

There is a useful notebook with the steps to download the jars. It downloads the jars to a tmp directory, and then we should copy them to our own folder. Finally, during cluster creation we should reference an init script that holds all the parameters; it includes the step of copying the jars from DBFS to the local file system of the cluster.

// This example is for an init script named `external-metastore_hive121.sh`.
dbutils.fs.put(
    "dbfs:/databricks/scripts/external-metastore_hive121.sh",
    """#!/bin/sh
      |# A temporary workaround to make sure /dbfs is available.
      |sleep 10
      |# Copy metastore jars from DBFS to the local FileSystem of every node.
      |# Create the target directory first so cp does not fail on a fresh node.
      |mkdir -p /databricks/hive_1_2_1_metastore_jars
      |cp -r /dbfs/metastore_jars/hive-v1_2/* /databricks/hive_1_2_1_metastore_jars
      |# Loads environment variables to determine the correct JDBC driver to use.
      |source /etc/environment
      |# Quoting the label (i.e. EOF) with single quotes to disable variable interpolation.
      |cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
      |[driver] {
      |    # Hive specific configuration options.
      |    # spark.hadoop prefix is added to make sure these Hive specific options will propagate to the metastore client.
      |    # JDBC connect string for a JDBC metastore
      |    "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:sqlserver://host--name.database.windows.net:1433;database=tcdatabricksmetastore_dev;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net"
      |
      |    # Username to use against metastore database
      |    "spark.hadoop.javax.jdo.option.ConnectionUserName" = "admin"
      |
      |    # Password to use against metastore database
      |    "spark.hadoop.javax.jdo.option.ConnectionPassword" = "P@ssword"
      |
      |    # Driver class name for a JDBC metastore
      |    "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
      |
      |    # Spark specific configuration options
      |    "spark.sql.hive.metastore.version" = "1.2.1"
      |    # Skip this one if ${hive-version} is 0.13.x.
      |    "spark.sql.hive.metastore.jars" = "/databricks/hive_1_2_1_metastore_jars/*"
      |}
      |EOF
      |""".stripMargin,
    overwrite = true)

This command creates a file in DBFS, and we will reference it during cluster creation.

According to the documentation, we should use the config:

datanucleus.autoCreateSchema true
datanucleus.fixedDatastore false 

These options are supposed to create the Hive schema (DDL) automatically. That did not work for me, which is why I took the DDL from Git and created the schema and tables myself.
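For illustration, here is a hedged sketch of how the Hive 1.2 schema DDL could be applied to the Azure SQL database over JDBC from a Scala notebook (not the exact script used above; the file name comes from the Hive source tree under metastore/scripts/upgrade/mssql/, and the DBFS path, user name and placeholders are assumptions):

import java.sql.DriverManager
import scala.io.Source

// Reuse the same host, database, user and password as in the metastore config above.
val url  = "jdbc:sqlserver://<host>.database.windows.net:1433;database=<database>;encrypt=true"
val conn = DriverManager.getConnection(url, "admin", "<password>")
val stmt = conn.createStatement()

// hive-schema-1.2.0.mssql.sql ships with the Hive sources; it is assumed here to
// have been uploaded to DBFS. If the script separates batches with GO lines,
// those cannot be sent through JDBC, so each batch is executed separately;
// a script without GO simply runs as a single batch.
val ddl = Source.fromFile("/dbfs/metastore_jars/hive-schema-1.2.0.mssql.sql").mkString
ddl.split("(?im)^\\s*GO\\s*$")
   .map(_.trim)
   .filter(_.nonEmpty)
   .foreach(batch => stmt.execute(batch))

stmt.close()
conn.close()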

You can test that everything works with:

%sql show databases
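If the metastore is wired up correctly, a short Scala smoke test (the database and table names below are arbitrary) should also make the new entries visible in the DBS and TBLS tables of the Azure SQL database:

// Create a throwaway database and table, then list what the external metastore knows about.
spark.sql("CREATE DATABASE IF NOT EXISTS metastore_smoke_test")
spark.sql("CREATE TABLE IF NOT EXISTS metastore_smoke_test.t (id INT)")
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES IN metastore_smoke_test").show()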
