Executing commands on remote Spark (EC2) using local R (SparkR) interface hangs
Problem Description
Hi,
I am trying to run a few Spark commands using SparkR (from the local R GUI). To set up the Spark cluster on EC2 I used most of the commands from https://edgarsdatalab.com/2016/08/25/setup-a-spark-2-0-cluster-r-on-aws/ with minor modifications to install the latest versions. All I am trying to do is interact with remote Spark (on EC2 Ubuntu) from my local R GUI using the SparkR package.
**Here is my setup (step by step):**
1. I have Windows 8.1 on my PC with R 3.3.3 and the SparkR package.
2. I created an AWS EC2 instance (free-tier account) and used an existing Ubuntu image from Amazon.
3. Installed PuTTY on my local PC. Used a PuTTY terminal to connect to Ubuntu 16 (on EC2) and used it for steps 4 to 10 below.
4. Installed Java and then spark-2.1.1-bin-hadoop2.7 on EC2.
5. Added the following to .bashrc (/home/ubuntu):
export SPARK_HOME=~/server/spark-2.1.1-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH
6. Loaded the modified file:
. .bashrc
7. Installed R on EC2-Ubuntu
8. I created another instance on EC2 (with Ubuntu) and followed steps 4 to 6 above to set up the Spark worker node.
9. On the first EC2 instance (call it the master instance), I started the Spark master using start-master.sh and got the master's URL from the Spark web UI.
10. On the second EC2 instance (call it the slave instance), I started the Spark slave using start-slave.sh, passing the Spark master's URL.
11. Then launched R (GUI) on my local PC.
12. Ran the following from R to connect and execute commands in Spark (in the following, xx.yy.zz.aa is the Spark master's public IP address):
library(SparkR)
sparkR.session(master = "spark://xx.yy.zz.aa:7077", sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7", enableHiveSupport=FALSE)
ds <- createDataFrame(mtcars) ## R becomes unresponsive
13. After waiting long enough, I killed the job from the Spark web UI and got the following error (see screenshot):
[Screenshot]
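For reference, steps 9 and 10 above boil down to commands like the following (a sketch only; the scripts live under $SPARK_HOME/sbin, and spark://xx.yy.zz.aa:7077 is the same placeholder master URL used in step 12):

```shell
# On the master instance (step 9) -- assumes SPARK_HOME is set as in step 5:
$SPARK_HOME/sbin/start-master.sh
# The master prints its URL (spark://<hostname>:7077) in its log and shows it
# on the web UI at http://<master-address>:8080.

# On the worker instance (step 10), register the worker against that URL:
$SPARK_HOME/sbin/start-slave.sh spark://xx.yy.zz.aa:7077
```

Note that on EC2 a standalone master typically binds to and advertises the instance's private hostname, so the URL shown on the web UI may differ from the public IP, and ports 7077/8080 must be open in the security group for outside connections to reach it.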
Please help. What am I doing wrong? How can I fix this? All I want to do is use remote Spark from my local PC through the R interface (local PC).
Thanks,
SG
What I have tried:
- In sparkR.session(), I tried passing both the public and the private address of the first EC2 instance (the master).
- I also tried installing R on both EC2 instances; even uninstalling R from both didn't help.
- Also tried launching the Spark master and slave on the same EC2 Ubuntu instance (the first EC2).
- Ran R inside the EC2 Ubuntu instance that had both master and slave running on the same EC2. Nothing worked.
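One thing the attempts above do not vary is the driver-side network configuration. In Spark standalone mode the executors must open connections back to the driver (here, the local R session), which a driver behind a home router or firewall usually cannot accept. A sketch of the kind of settings worth experimenting with is below; spark.driver.host, spark.driver.port, and spark.blockManager.port are real Spark properties, but the values and the placeholder address are assumptions for this setup, not a confirmed fix:

```r
library(SparkR)

# Sketch: pass extra Spark properties via sparkConfig so the EC2 executors
# can route connections back to the driver running on the local PC.
# <reachable-ip-of-local-pc> is a placeholder -- it must be an address the
# EC2 instances can reach, and the chosen ports must be open locally.
sparkR.session(
  master = "spark://xx.yy.zz.aa:7077",
  sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7",
  enableHiveSupport = FALSE,
  sparkConfig = list(
    spark.driver.host = "<reachable-ip-of-local-pc>",  # placeholder
    spark.driver.port = "5001",
    spark.blockManager.port = "5002"
  )
)
```

If the driver is not reachable from the cluster at all (e.g. behind NAT with no port forwarding), createDataFrame() will hang exactly as described, because the executors keep retrying the connection back to the driver.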