Spark 2.0.0: SparkR CSV Import
Problem Description
I am trying to read a CSV file into SparkR (running Spark 2.0.0) and experiment with the newly added features.
I am using RStudio here.
读取"源文件时出现错误.
I am getting an error while "reading" the source file.
My code:
Sys.setenv(SPARK_HOME = "C:/spark-2.0.0-bin-hadoop2.6")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", appName = "SparkR")
df <- loadDF("F:/file.csv", "csv", header = "true")
I get an error at the loadDF function.
The error:
loadDF("F:/file.csv", "csv", header = "true")
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263) at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39) at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38) at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46) at org.apache.spark.sql.hive.HiveSharedSt
Am I missing some specification here? Any pointers on how to proceed would be appreciated.
Answer
I had the same problem, and I also hit a similar failure with this simple code:
createDataFrame(iris)
Maybe something is wrong with the installation?
UPD. Yes! I found a solution.
It is based on the following solution to a related question: Apache Spark MLlib with DataFrame API fails at createDataFrame() or read().csv(...)
For R, just start the session with this code:
sparkR.session(sparkConfig = list(spark.sql.warehouse.dir="/file:C:/temp"))
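Putting the question's code and this workaround together, a minimal end-to-end sketch might look like the following. It assumes, as in the question, that Spark 2.0.0 is unpacked at C:/spark-2.0.0-bin-hadoop2.6 and the CSV file lives at F:/file.csv; the warehouse path value is taken verbatim from the workaround above, and any writable local directory may serve the same purpose:

```r
# Point SparkR at the local Spark installation
# (the path is an assumption from the question)
Sys.setenv(SPARK_HOME = "C:/spark-2.0.0-bin-hadoop2.6")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Setting spark.sql.warehouse.dir explicitly works around the
# Hive metadata client InvocationTargetException on Windows;
# the path below is the value from the workaround above
sparkR.session(master = "local[*]", appName = "SparkR",
               sparkConfig = list(spark.sql.warehouse.dir = "/file:C:/temp"))

# With the warehouse directory set, the CSV load should succeed
df <- loadDF("F:/file.csv", "csv", header = "true")
head(df)
```

The key difference from the question's code is that the session is created with an explicit sparkConfig list rather than the defaults, which avoids the failing Hive client initialization.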