Sparklyr fails to download Spark from Apache in a Dockerfile


Problem description

I am trying to create a Dockerfile that builds an image from rocker/tidyverse and includes Spark via sparklyr. Previously, in this post: Unable to install spark with sparklyr in Dockerfile, I was trying to figure out why Spark wouldn't download from my Dockerfile. After playing with it for the past 5 days, I think I have found the reason, but I have no idea how to fix it.

Here is my Dockerfile:

# start with the most up-to-date tidyverse image as the base image
FROM rocker/tidyverse:latest

# install openjdk 8 (Java)
RUN apt-get update \
  && apt-get install -y openjdk-8-jdk

# Install devtools
RUN Rscript -e 'install.packages("devtools")'

# Install sparklyr
RUN Rscript -e 'devtools::install_version("sparklyr", version = "1.5.2", dependencies = TRUE)'

# Install spark
RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'

RUN mv /root/spark /opt/ && \
    chown -R rstudio:rstudio /opt/spark/ && \
    ln -s /opt/spark/ /home/rstudio/

RUN apt-get install -y unixodbc unixodbc-dev --install-suggests
RUN apt-get install -y odbc-postgresql

RUN install2.r --error --deps TRUE DBI
RUN install2.r --error --deps TRUE RPostgres
RUN install2.r --error --deps TRUE dbplyr

It has no problem downloading everything up until this line:

RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'

It then gives me this error:

Step 5/11 : RUN Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'
 ---> Running in 739775db8f12
Error in download.file(installInfo$packageRemotePath, destfile = installInfo$packageLocalPath,  : 
  download from 'https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz' failed
Calls: <Anonymous>
Execution halted
ERROR: Service 'rocker_sparklyr' failed to build : The command '/bin/sh -c Rscript -e 'sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'' returned a non-zero code: 1

After doing some research I thought that it was a timeout error, in which case I ran beforehand:

RUN Rscript -e 'options(timeout = 600)'

This did not increase the time it took to error out again. I installed everything onto my personal machine through RStudio and it installed with no problems. I think the problem is specific to Docker, in that the container isn't able to download from https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
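One thing worth noting (an assumption on my part, not something verified in the original post): each RUN Rscript -e '...' line starts a fresh R session, so an option set in one layer does not carry over to the install in the next layer. If a timeout really were the cause, the option would have to be set in the same session as the install, along the lines of this sketch:

# Sketch only: set the download timeout and run the install in one R session,
# since options() does not persist across separate Rscript invocations
RUN Rscript -e 'options(timeout = 600); sparklyr::spark_install(version = "3.0.0", hadoop_version = "3.2")'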

I have found very little documentation on this problem and am relying heavily on this post to figure it out. Thank you in advance to anyone with this knowledge for reaching out.

Answer

Download the release yourself, then use this function to install it:

sparklyr::spark_install_tar(tarfile = "~/spark/spark-3.0.1-bin-hadoop3.2.tgz")
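In a Dockerfile, that means fetching the archive with a plain HTTP client and then pointing spark_install_tar() at the local file. A minimal sketch (the download path and tarball URL here are assumptions, adjust to the version you need; curl ships with rocker/tidyverse, otherwise apt-get install it first):

# Download the Spark tarball directly, rather than letting R's
# download.file() fetch it inside spark_install()
RUN mkdir -p /root/spark-dl && \
    curl -fL -o /root/spark-dl/spark-3.0.1-bin-hadoop3.2.tgz \
      https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz

# Install Spark from the local archive instead of over the network
RUN Rscript -e 'sparklyr::spark_install_tar(tarfile = "/root/spark-dl/spark-3.0.1-bin-hadoop3.2.tgz")'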
