Executing commands on remote Spark (EC2) using local R (SparkR) interface hangs
Problem Description
Hi,
I am trying to run a few Spark commands using SparkR (from the local R GUI). To set up the Spark cluster on EC2 I used most of the commands from https://edgarsdatalab.com/2016/08/25/setup-a-spark-2-0-cluster-r-on-aws/ with minor modifications to install the latest versions. All I am trying to do is interact with remote Spark (on EC2 Ubuntu) from my local R GUI using the SparkR package.
**Here is my setup (step by step):**
1. I have Windows 8.1 on my PC with R 3.3.3 and the SparkR package.
2. I created an AWS EC2 instance (free-tier account) and used an existing Ubuntu image from Amazon.
3. Installed PuTTY on my local PC. Used a PuTTY terminal to connect to Ubuntu 16 (on EC2) and used it for steps 4 to 10 below.
4. Installed Java and then spark-2.1.1-bin-hadoop2.7 on EC2.
5. Added the following to .bashrc (/home/ubuntu):
export SPARK_HOME=~/server/spark-2.1.1-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH
6. Loaded the modified file:
. .bashrc
7. Installed R on EC2-Ubuntu
8. I created another instance on EC2 (with Ubuntu) and followed steps 4 to 6 above to set up the Spark worker node.
9. On the first EC2 instance (call it the master instance), I started the Spark master using start-master.sh and got the master's URL from the Spark web UI.
10. On the second EC2 instance (call it the slave instance), I started the Spark slave using start-slave.sh, passing the Spark master's URL.
11. Then launched R (GUI) on my local PC.
12. Ran the following from R to connect and execute commands in Spark (in the following, xx.yy.zz.aa is the Spark master's public IP address):
library(SparkR)
sparkR.session(master = "spark://xx.yy.zz.aa:7077", sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7", enableHiveSupport=FALSE)
ds <- createDataFrame(mtcars) ## R becomes unresponsive
13. After waiting long enough, I killed the job from the Spark web UI and got the following error (see screenshot):
[Screenshot]
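For reference, steps 9 and 10 above boil down to commands like the following (a sketch only; the scripts live under $SPARK_HOME/sbin, and spark://xx.yy.zz.aa:7077 is the same placeholder master URL used in step 12):

```shell
# On the master instance (step 9) -- assumes SPARK_HOME is set as in step 5:
$SPARK_HOME/sbin/start-master.sh
# The master prints its URL (spark://<hostname>:7077) in its log and shows it
# on the web UI at http://<master-address>:8080.

# On the worker instance (step 10), register the worker against that URL:
$SPARK_HOME/sbin/start-slave.sh spark://xx.yy.zz.aa:7077
```

Note that on EC2 a standalone master typically binds to and advertises the instance's private hostname, so the URL shown on the web UI may differ from the public IP, and ports 7077/8080 must be open in the security group for outside connections to reach it.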
Please help. What am I doing wrong? How can I fix this? All I want to do is use remote Spark from my local PC through the R interface (local PC).
Thanks,
SG
What I have tried:
- In sparkR.session(), I tried passing both the public and the private address of the first EC2 instance (the master).
- I also tried installing R on both EC2 instances; even uninstalling R from both didn't help.
- Also tried launching the Spark master and slave on the same EC2 Ubuntu instance (the first EC2).
- Ran R inside the EC2 Ubuntu instance that had both master and slave running on the same EC2. Nothing worked.
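One thing the attempts above do not vary is the driver-side network configuration. In Spark standalone mode the executors must open connections back to the driver (here, the local R session), which a driver behind a home router or firewall usually cannot accept. A sketch of the kind of settings worth experimenting with is below; spark.driver.host, spark.driver.port, and spark.blockManager.port are real Spark properties, but the values and the placeholder address are assumptions for this setup, not a confirmed fix:

```r
library(SparkR)

# Sketch: pass extra Spark properties via sparkConfig so the EC2 executors
# can route connections back to the driver running on the local PC.
# <reachable-ip-of-local-pc> is a placeholder -- it must be an address the
# EC2 instances can reach, and the chosen ports must be open locally.
sparkR.session(
  master = "spark://xx.yy.zz.aa:7077",
  sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7",
  enableHiveSupport = FALSE,
  sparkConfig = list(
    spark.driver.host = "<reachable-ip-of-local-pc>",  # placeholder
    spark.driver.port = "5001",
    spark.blockManager.port = "5002"
  )
)
```

If the driver is not reachable from the cluster at all (e.g. behind NAT with no port forwarding), createDataFrame() will hang exactly as described, because the executors keep retrying the connection back to the driver.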