Failed to remotely execute R script which loads library "rhdfs"


Problem description

I'm working on a project using R-Hadoop, and ran into the following problem.

I'm using JSch in Java to SSH into a remote Hadoop pseudo-cluster; here is the part of the Java code that creates the connection.

/* Create a connection instance */
Connection conn = new Connection(hostname);
/* Now connect */
conn.connect();
/* Authenticate */
boolean isAuthenticated = conn.authenticateWithPassword(username, password);
if (!isAuthenticated)
    throw new IOException("Authentication failed.");
/* Create a session */
Session sess = conn.openSession();
//sess.execCommand("uname -a && date && uptime && who");
sess.execCommand("Rscript -e 'args1 <- \"Dell\"; args2 <- 1; source(\"/usr/local/R/mytest.R\")'");
//sess.execCommand("ls");
sess.waitForCondition(ChannelCondition.TIMEOUT, 50);

I tried several simple R scripts, and my code worked fine. But when it comes to R-Hadoop, the R script stops running. However, if I run Rscript -e 'args1 <- "Dell"; args2 <- 1; source("/usr/local/R/mytest.R")' directly on the remote server, everything works fine.
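One general shell fact is worth knowing here (it is not stated in the question, but it explains this symptom): commands sent over SSH's exec channel run in a non-interactive shell, which does not source ~/.bashrc, so variables exported only there are invisible to the remote command. A rough local analogy, using env -i to mimic a session that never loaded your profile:

```shell
# Variables exported in the current shell are inherited by child processes:
export DEMO_VAR="set-in-parent"
bash -c 'echo "inherited: [$DEMO_VAR]"'        # → inherited: [set-in-parent]

# ...but a process started with a scrubbed environment never sees them,
# much like HADOOP_CMD exported only in ~/.bashrc is absent over ssh exec:
env -i bash -c 'echo "scrubbed: [$DEMO_VAR]"'  # → scrubbed: []
```

This is only an analogy (env -i scrubs everything, while a non-interactive SSH shell merely skips the rc files), but the effect on variables set in ~/.bashrc is the same.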

Here is what I got after taking Hong Ooi's suggestion: instead of using Rscript, I used the following command:

sess.execCommand("R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

And in whathappened.txt, I got the following error:

> args=(commandArgs(TRUE))
> for(i in 1:length(args)){
+      eval(parse(text=args[[i]]))
+ }
> source("/usr/local/R/main.R")
> main(args1,args2)
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
  error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for 'rhdfs'
Execution halted

Well, now the problem is much clearer. Unfortunately, I'm pretty new to Linux and have no idea how to solve this.

Answer

Well, I just found another solution myself:

Instead of trying to manage the environment from outside the Hadoop cluster, you can set the environment variables inside the R script itself, like:

Sys.setenv(HADOOP_HOME="put your HADOOP_HOME path here")
Sys.setenv(HADOOP_CMD="put your HADOOP_CMD path here")

library(rmr2)
library(rhdfs)
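Since rhdfs reads HADOOP_CMD in its .onLoad hook, it can help to verify the variable is actually set before the library() calls fire. A minimal sketch with placeholder paths (assumptions about a typical layout, not the author's actual values):

```r
# Placeholder paths -- substitute your actual Hadoop installation:
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop")
Sys.setenv(HADOOP_CMD  = "/usr/local/hadoop/bin/hadoop")

# rhdfs's .onLoad aborts when HADOOP_CMD is empty, so sanity-check it first:
stopifnot(nzchar(Sys.getenv("HADOOP_CMD")))

# library(rmr2)   # safe to load once the variables are set
# library(rhdfs)
```

Because Sys.setenv runs inside the R process, this works no matter how the script is launched (interactive shell, Rscript, R CMD BATCH, or a remote exec channel).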

