How to load csv file into SparkR on RStudio?


Problem description


How do you load a csv file into SparkR on RStudio? Below are the steps I had to perform to run SparkR on RStudio. I used read.df to read the .csv, but I'm not sure how else to write this, or whether this step counts as creating RDDs.

#Set sys environment variables
Sys.setenv(SPARK_HOME = "C:/Users/Desktop/spark/spark-1.4.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

#Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')

#Load libraries
library(SparkR)
library(magrittr)

sc <- sparkR.init(master="local")
sc <- sparkR.init()
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)

data <- read.df(sqlContext, "C:/Users/Desktop/DataSets/hello_world.csv", "com.databricks.spark.csv", header="true")

I am getting this error:

Error in writeJobj(con, object) : invalid jobj 1

Solution

As far as I can tell, you're using the wrong version of spark-csv. The pre-built Spark binaries are compiled against Scala 2.10, but you're using the spark-csv build for Scala 2.11 (the `_2.11` suffix in the artifact name marks the Scala binary version). Try this instead:

sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.10:1.2.0")
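Putting the fix into the question's script, a minimal end-to-end session might look like the following. This is only a sketch: the local paths are the ones from the question, and it assumes a Spark 1.4.1 pre-built binary, for which the Scala 2.10 build of spark-csv is the matching one.

```r
# Point R at the local Spark installation and put SparkR on the library path
Sys.setenv(SPARK_HOME = "C:/Users/Desktop/spark/spark-1.4.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# Initialize the context once, requesting the Scala 2.10 build of spark-csv
sc <- sparkR.init(master = "local",
                  sparkPackages = "com.databricks:spark-csv_2.10:1.2.0")
sqlContext <- sparkRSQL.init(sc)

# Load the CSV as a Spark DataFrame (not an RDD);
# header = "true" uses the first row as column names
data <- read.df(sqlContext, "C:/Users/Desktop/DataSets/hello_world.csv",
                source = "com.databricks.spark.csv", header = "true")
head(data)
```

Note there is a single sparkR.init call here. Calling it three times in a row, as the question's script does, is unnecessary: only one context's configuration takes effect, so the sparkPackages argument must be passed on the call that actually creates the context.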
