Import a GitHub repo into Databricks community edition
I am trying to import some data from a public GitHub repo so that I can use it from my Databricks notebooks.
So far I have tried to connect my Databricks account to GitHub as described here, but without success, since GitHub support seems to require a non-community license. I get a message saying so when I try to set the GitHub token, which the GitHub integration requires.
The same question has been asked before on the official Databricks forum.
What is the best way to import and store a GitHub repo on Databricks Community Edition?
I managed to solve this using shell commands from the notebook itself. To retrieve the repository for the first time, I ran git clone via HTTPS:
%sh git clone https://github.com/SomeDataRepo/TheData.git --depth 1 --branch=master /dbfs/FileStore/TheData/
Why not SSH? SSH requires setting up SSH keys, which was not necessary in my case.
Finally, every time I need a fresh version of the data, I run git pull before executing my program:
%sh git -C /dbfs/FileStore/TheData/ pull
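The two steps above could also be folded into a single cell that clones on the first run and pulls on later runs. This is only a minimal sketch: the function name sync_repo and its arguments are hypothetical, and it assumes git is available on the driver and that the target path is writable, as in the commands above.

```shell
# Hypothetical helper: clone the repo if it is not there yet, otherwise pull.
sync_repo() {
    repo_url="$1"          # HTTPS URL of the public repo
    target="$2"            # local directory that holds the clone
    branch="${3:-master}"  # branch to check out on the first clone

    if [ -d "$target/.git" ]; then
        # A clone already exists: just fetch and merge the latest commits.
        git -C "$target" pull
    else
        # First run: shallow clone of a single branch, as in the original command.
        git clone "$repo_url" --depth 1 --branch "$branch" "$target"
    fi
}
```

In a notebook this would go in a %sh cell, followed in the same cell by a call such as sync_repo https://github.com/SomeDataRepo/TheData.git /dbfs/FileStore/TheData master, since shell functions do not persist between cells.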