Loading com.databricks.spark.csv via RStudio
Problem description
I have installed Spark-1.4.0. I have also installed its R package SparkR, and I am able to use it via the Spark shell and via RStudio; however, there is one difference I cannot resolve.
When launching the SparkR shell with
./bin/sparkR --master local[7] --packages com.databricks:spark-csv_2.10:1.0.3
I can read a .csv file as follows:
flights <- read.df(sqlContext, "data/nycflights13.csv", "com.databricks.spark.csv", header="true")
Unfortunately, when I start SparkR via RStudio (with my SPARK_HOME set correctly), I get the following error message:
15/06/16 16:18:58 ERROR RBackendHandler: load on 1 failed
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
I know I should somehow load com.databricks:spark-csv_2.10:1.0.3, but I have no idea how to do this. Could someone help me?
Answer
This is the right syntax (after hours of trying). Note: the first line is the crucial one, and pay close attention to the double quotes.
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
library(SparkR)
library(magrittr)
# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)
# The SparkSQL context should already be created for you as sqlContext
sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1
# Load the flights CSV file using `read.df`. Note that we use the CSV reader Spark package here.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header="true")
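The key is that `SPARKR_SUBMIT_ARGS` is just an environment variable read when the SparkR backend is launched. As an alternative to calling `Sys.setenv()` inside R, a minimal sketch of setting it in the shell before starting R or RStudio (assuming RStudio inherits your shell environment, which depends on how RStudio is launched on your platform):

```shell
# Export the same submit arguments the answer sets via Sys.setenv().
# Single quotes preserve the inner double quotes, which SparkR expects.
export SPARKR_SUBMIT_ARGS='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"'

# Show what SparkR will see when sparkR.init() starts the backend.
echo "$SPARKR_SUBMIT_ARGS"
```

Either way, the trailing `"sparkr-shell"` token must stay at the end: it tells spark-submit to launch the SparkR backend rather than a regular application.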