Executing commands on remote Spark (EC2) using the local R (SparkR) interface hangs


Problem description



Hi,
I am trying to run a few Spark commands using SparkR (from a local R GUI). To set up the Spark cluster on EC2 I used most of the commands from https://edgarsdatalab.com/2016/08/25/setup-a-spark-2-0-cluster-r-on-aws/, with small modifications to install the latest versions. All I am trying to do is interact with remote Spark (on EC2 Ubuntu) from my local R GUI using the SparkR package.

**Here is my setup (step by step):**

1. My PC runs Windows 8.1 with R 3.3.3 and the SparkR package.
2. I created an AWS EC2 instance (free-tier account) using an existing Ubuntu image from Amazon.
3. Installed PuTTY on my local PC. Used a PuTTY terminal to connect to Ubuntu 16 (on EC2) and used it for steps 4 to 10 below.
4. Installed Java and then spark-2.1.1-bin-hadoop2.7 on EC2 (see the sketch below).
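
For reference, step 4 on a stock Ubuntu image usually amounts to something like the following. This is a sketch only; the download URL and the ~/server layout are assumptions consistent with the paths used below, not commands quoted from the post:

sudo apt-get update && sudo apt-get install -y default-jre

mkdir -p ~/server && cd ~/server

wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz

tar xzf spark-2.1.1-bin-hadoop2.7.tgz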
5. Added the following to .bashrc (/home/ubuntu):

export SPARK_HOME=~/server/spark-2.1.1-bin-hadoop2.7

PATH=$PATH:$SPARK_HOME/bin

export PATH


6. Loaded the modified file:

. .bashrc
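
A quick sanity check after step 6, run in the same shell (not from the original post, just a way to confirm the variables took effect):

echo $SPARK_HOME

spark-submit --version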

7. Installed R on EC2 Ubuntu.
8. I created another EC2 instance (with Ubuntu) and followed steps 4 to 6 above to set up the Spark worker node.
9. On the first EC2 instance (call it the master instance), I started the Spark master using start-master.sh and got the master's URL from the Spark web UI.
10. On the second EC2 instance (call it the slave instance), I started the Spark slave using start-slave.sh, passing the Spark master's URL (roughly as sketched below).
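
The start commands in steps 9 and 10 would look roughly like this, assuming the install path from step 5 (xx.yy.zz.aa stands in for the master's address, as in step 12):

~/server/spark-2.1.1-bin-hadoop2.7/sbin/start-master.sh                           # on the master instance

~/server/spark-2.1.1-bin-hadoop2.7/sbin/start-slave.sh spark://xx.yy.zz.aa:7077   # on the slave instance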
11. Then I launched the R GUI on my local PC.
12. Ran the following from R to connect and execute commands in Spark (xx.yy.zz.aa below is the Spark master's public IP address):

library(SparkR)

sparkR.session(master = "spark://xx.yy.zz.aa:7077", sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7", enableHiveSupport=FALSE)

ds <- createDataFrame(mtcars) ## R becomes unresponsive

13. After waiting long enough, I killed the process from the Spark web UI and got the following error (see screenshot):
[Screenshot]

Please help. What am I doing wrong, and how can I fix it? All I want is to drive the remote Spark cluster from my local PC through the R interface.

Thanks,
SG

What I have tried:

- In sparkR.session(), I tried passing both the public and the private address of the first EC2 instance (the master).
- I also tried installing R on both EC2 instances; even uninstalling R from both didn't help.
- I also tried launching the Spark master and slave on the same EC2 Ubuntu instance (the first one).
- I ran R inside an EC2 Ubuntu instance that had both master and slave running on the same machine. Nothing worked.
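
For reference, sparkR.session() also accepts a sparkConfig list, which can pin the driver's callback address explicitly. The sketch below is a guess, not a confirmed fix; it assumes the hang comes from the executors being unable to reach back to the driver on the local PC. aa.bb.cc.dd is a hypothetical placeholder for the PC's routable IP, and the chosen port would still need to be open in the PC's firewall and reachable from the EC2 instances:

library(SparkR)

sparkR.session(master = "spark://xx.yy.zz.aa:7077",
               sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7",
               enableHiveSupport = FALSE,
               sparkConfig = list(spark.driver.host = "aa.bb.cc.dd",  # hypothetical routable address of the local PC
                                  spark.driver.port = "5001"))        # fixed so a firewall rule can allow it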


