Databricks (Spark): .egg dependencies not installed automatically?
Problem description
I have a locally created .egg package that depends on boto==2.38.0. I used setuptools to create the build distribution. Everything works in my own local environment, since it fetches boto correctly from PyPI. However, on Databricks the dependencies are not fetched automatically when I attach the library to the cluster.
I have struggled for a few days now trying to get the dependency installed automatically when the library is loaded on Databricks. I use setuptools; `install_requires=['boto==2.38.0']` is the relevant field.
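For reference, a minimal `setup.py` along these lines (the package name below is a placeholder; only the `install_requires` field matters for this question) would look like:

```python
# Minimal setup.py sketch; "mypackage" is a placeholder name.
# pip honors install_requires when installing the package, but
# attaching the built .egg to a Databricks cluster does not.
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["boto==2.38.0"],
)
```

The egg itself would then be built with `python setup.py bdist_egg`.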
When I install boto directly from PyPI on the Databricks cluster (so not relying on the install_requires field) and then call my own .egg, it does recognize that boto is a package, but it does not recognize any of boto's submodules (presumably because they are not imported in my own .egg's namespace?). So I cannot get my .egg to work. If this problem persists without any solution, I'd think it is a really big problem for Databricks users right now. Surely there should be a solution...
Thanks!
Recommended answer
In general, your application's dependencies will not work properly if they are diverse and don't have uniform language support. The Databricks docs explain that:
Databricks will install the correct version if the library supports both Python 2 and 3. If the library does not support Python 3 then library attachment will fail with an error.
In this case, Databricks will not automatically fetch dependencies when you attach a library to the cluster.
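As a practical workaround (my suggestion, not from the docs quoted above), the dependency can be installed on the cluster explicitly instead of relying on `install_requires`: attach `boto==2.38.0` as its own PyPI library in the cluster UI, or install it directly on the driver and workers, roughly:

```shell
# Install the dependency explicitly instead of relying on the egg's
# metadata; on Databricks this would typically be done by attaching
# boto as a separate PyPI library to the cluster.
pip install boto==2.38.0
```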