Databricks (Spark): .egg dependencies not installed automatically?


Problem Description

I have a locally created .egg package that depends on boto==2.38.0. I used setuptools to create the build distribution. Everything works in my own local environment, because pip fetches boto correctly. However, on Databricks it does not automatically fetch the dependencies when I attach the library to the cluster.

I have really struggled for a few days now trying to get the dependency installed automatically when the library is loaded on Databricks. I use setuptools; install_requires=['boto==2.38.0'] is the relevant field.
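For reference, a minimal setup.py along these lines might look as follows; the package name and version are illustrative placeholders, and only the boto==2.38.0 pin comes from the question:

    # Minimal setup.py sketch; only the boto pin is taken from the question.
    from setuptools import setup, find_packages

    setup(
        name="my_egg_package",               # placeholder name
        version="0.1.0",
        packages=find_packages(),
        install_requires=["boto==2.38.0"],   # the relevant field
    )

Building with python setup.py bdist_egg produces the .egg that is uploaded to Databricks; locally, pip resolves install_requires at install time, but attaching the egg to a cluster does not.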

When I install boto directly from PyPI on the Databricks server (so not relying on the install_requires field working properly) and then call my own .egg, it does recognize that boto is a package, but it does not recognize any of its modules (since it is not imported into my own .egg's namespace?). So I cannot get my .egg to work. If this problem persists without any solution, I would consider it a really big problem for Databricks users right now. There should, of course, be a solution...
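As a quick way to see this symptom, one could check in a notebook cell which boto installation is picked up and whether its submodules import; this is an illustrative sketch, not part of the original question:

    import importlib

    import boto
    print(boto.__file__)  # shows which boto installation is actually used

    # boto 2.x submodules that should import if the package is complete
    for mod in ["boto.s3", "boto.s3.connection", "boto.ec2"]:
        try:
            importlib.import_module(mod)
            print(mod, "OK")
        except ImportError as exc:
            print(mod, "FAILED:", exc)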

Thanks!

Recommended Answer

Your application's dependencies will not, in general, work properly if they are diverse and do not have uniform language support. The Databricks docs explain:

Databricks will install the correct version if the library supports both Python 2 and 3. If the library does not support Python 3 then library attachment will fail with an error.

In this case, the dependencies will not be fetched automatically when you attach the library to the cluster.
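One practical workaround, not spelled out in the answer itself, is to install the dependency explicitly instead of relying on install_requires being resolved at attach time. A sketch, assuming a Databricks Runtime where dbutils.library is available (newer runtimes would use the %pip magic instead):

    # Install the egg's dependency at notebook scope before importing the .egg's code.
    dbutils.library.installPyPI("boto", version="2.38.0")
    dbutils.library.restartPython()

    # In a subsequent cell, boto's submodules should then resolve, e.g.:
    #   import boto.s3.connection

Alternatively, attaching boto==2.38.0 as a separate PyPI library to the cluster has the same effect.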
