Databricks Connect: can't connect to remote cluster on Azure, command 'databricks-connect test' stops

Problem description

I'm trying to set up Databricks Connect so that I can work with a remote Databricks cluster already running in a workspace on Azure. When I run the command 'databricks-connect test', it never finishes.

I followed the official documentation.

I've installed the most recent Anaconda (with Python 3.7) and created a local environment:

    conda create --name dbconnect python=3.5
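
For completeness (this activation step is implied rather than shown above), the new environment has to be activated before anything is installed into it; with a recent conda that is:

    conda activate dbconnect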

I've installed 'databricks-connect' in version 5.1, which matches the configuration of my cluster on Azure Databricks:

    pip install -U databricks-connect==5.1.*
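
One caveat worth adding here: the official setup instructions say any standalone PySpark installation must be removed first, because databricks-connect ships its own copy of PySpark and the two conflict. In a fresh environment like the one above there is nothing to remove, but otherwise:

    pip uninstall pyspark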

I've already run 'databricks-connect configure' as follows:

    (base) C:\>databricks-connect configure
    The current configuration is:
    * Databricks Host: ******.azuredatabricks.net
    * Databricks Token: ************************************
    * Cluster ID: ****-******-*******
    * Org ID: ****************
    * Port: 8787
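
As an aside (documented behaviour, not part of the original question): the values entered here are persisted as JSON in a '.databricks-connect' file in the user's home directory, so the same configuration can also be written by hand, for example:

    {
      "host": "https://******.azuredatabricks.net",
      "token": "************************************",
      "cluster_id": "****-******-*******",
      "org_id": "****************",
      "port": "8787"
    }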

After the above steps, I run the 'test' command for Databricks Connect:

    databricks-connect test

As a result, the procedure starts and then stops after the warning about MetricsSystem, as visible below:

    (dbconnect) C:\>databricks-connect test
    * PySpark is installed at c:\users\miltad\appdata\local\continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark
    * Checking java version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
    * Testing scala command
    19/05/31 08:14:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    19/05/31 08:14:34 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set. 

I expect the process to move on to the next steps, as it does in the official documentation:

    * Testing scala command
    18/12/10 16:38:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    18/12/10 16:38:50 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
    18/12/10 16:39:53 WARN SparkServiceRPCClient: Now tracking server state for 5abb7c7e-df8e-4290-947c-c9a38601024e, invalidating prev state
    18/12/10 16:39:59 WARN SparkServiceRPCClient: Syncing 129 files (176036 bytes) took 3003 ms
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
          /_/

    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
    Type in expressions to have them evaluated.
    Type :help for more information.

So my process stops after 'WARN MetricsSystem: Using default name SparkStatusTracker'.

What am I doing wrong? Should I configure something more?

Recommended answer

It looks like this feature isn't officially supported on runtime 5.3 or below. If there are constraints on updating the runtime, I would make sure the Spark conf is set as follows:

    spark.databricks.service.server.enabled true

However, things might still be flaky on the older runtimes, so I would recommend doing this with runtime 5.5 or 6.1 or above.
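
In the Azure Databricks UI, that setting goes into the cluster's Spark config (under the cluster's Advanced Options > Spark field), one 'key value' pair per line, and the cluster has to be restarted for it to take effect. Once 'databricks-connect test' passes, a minimal sanity check from the local environment could look like the sketch below (plain PySpark against the configured cluster; nothing here is specific to this question):

    from pyspark.sql import SparkSession

    # With databricks-connect installed, this session is transparently
    # backed by the remote cluster set up via 'databricks-connect configure'.
    spark = SparkSession.builder.getOrCreate()

    # A trivial job that forces a round trip to the remote cluster.
    print(spark.range(100).count())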
