How to load a CSV file into SparkR on RStudio?
Problem description
How do you load a CSV file into SparkR on RStudio? Below are the steps I performed to run SparkR in RStudio. I used read.df to read the .csv file, and I am not sure how else to write this. I am also not sure whether this step counts as creating RDDs.
#Set sys environment variables
Sys.setenv(SPARK_HOME = "C:/Users/Desktop/spark/spark-1.4.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
#Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
#Load libraries
library(SparkR)
library(magrittr)
sc <- sparkR.init(master="local")
sc <- sparkR.init()
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext <- sparkRSQL.init(sc)
data <- read.df(sqlContext, "C:/Users/Desktop/DataSets/hello_world.csv", "com.databricks.spark.csv", header="true")
I am getting this error:
Error in writeJobj(con, object) : invalid jobj 1
Solution
As far as I can tell, you're using the wrong version of spark-csv. Pre-built versions of Spark use Scala 2.10, but you're using the spark-csv build for Scala 2.11. Try this instead:
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.10:1.2.0")
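For reference, here is a minimal sketch of how the whole sequence might look with the Scala 2.10 build. The helper names spark_csv_coordinate and load_csv_with_sparkr are hypothetical, not part of SparkR or spark-csv. The sketch assumes SPARK_HOME and .libPaths() are already set as in the question. It also calls sparkR.init only once, on the assumption that the repeated calls in the question were experiments.

```r
# Hypothetical helper (not part of SparkR): builds the Maven coordinate for
# spark-csv so that the Scala suffix is stated explicitly.
spark_csv_coordinate <- function(scala_version = "2.10", csv_version = "1.2.0") {
  paste0("com.databricks:spark-csv_", scala_version, ":", csv_version)
}

# Hypothetical wrapper: starts a local SparkR 1.4.x session with spark-csv
# loaded and reads one CSV file into a SparkR DataFrame.
# Assumes SPARK_HOME and .libPaths() are already configured as in the question.
load_csv_with_sparkr <- function(csv_path, scala_version = "2.10") {
  library(SparkR)
  # Single init call; the package coordinate must match Spark's Scala version.
  sc <- sparkR.init(master = "local",
                    sparkPackages = spark_csv_coordinate(scala_version))
  sqlContext <- sparkRSQL.init(sc)
  read.df(sqlContext, csv_path,
          source = "com.databricks.spark.csv", header = "true")
}
```

With that in place, something like data <- load_csv_with_sparkr("C:/Users/Desktop/DataSets/hello_world.csv") followed by head(data) should show the first rows, provided the session was not already initialized with a different package.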