Loading data from on-premises HDFS to local SparkR
This article describes how to load data from an on-premises HDFS cluster into a local SparkR session.
Problem description
I'm trying to load data from an on-premises HDFS cluster into RStudio with SparkR.
When I do this:
df_hadoop <- read.df(sqlContext, "hdfs://xxx.xx.xxx.xxx:xxxx/user/lam/lamr_2014_09.csv",
source = "com.databricks.spark.csv")
and then this:
str(df_hadoop)
I get this:
Formal class 'DataFrame' [package "SparkR"] with 2 slots
..@ env: <environment: 0x000000000xxxxxxx>
..@ sdf:Class 'jobj' <environment: 0x000000000xxxxxx>
However, this is not the df I'm looking for, because the csv I'm trying to load from HDFS has 13 fields.
I have a schema with the 13 fields of the csv, but where or how do I pass it to SparkR?
Solution
If you try the following:
df <- createDataFrame(sqlContext,
data.frame(a=c(1,2,3),
b=c(2,3,4),
c=c(3,4,5)))
str(df)
you get the same kind of output:
Formal class 'DataFrame' [package "SparkR"] with 2 slots
..@ env:<environment: 0x139235d18>
..@ sdf:Class 'jobj' <environment: 0x139230e68>
str() shows you the internal representation of df, which is a pointer to a Spark DataFrame rather than a local data.frame. To inspect the contents, instead just use
df
or
show(df)
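To address the original question about the 13-field schema: read.df accepts a schema argument built with structType and structField, so you can declare the column names and types up front instead of letting spark-csv infer them. A minimal sketch, assuming the SparkR 1.x API from the question; the field names and types below are placeholders, since the actual csv layout isn't shown:

```r
library(SparkR)

# Define a schema matching the csv's columns. The names and types here
# are hypothetical placeholders -- replace them with the real layout,
# one structField per column, 13 in total.
customSchema <- structType(
  structField("field01", "string"),
  structField("field02", "double"),
  structField("field03", "integer")
  # ... remaining structFields elided
)

# Pass the schema to read.df so the columns are named and typed on load
df_hadoop <- read.df(sqlContext,
                     "hdfs://xxx.xx.xxx.xxx:xxxx/user/lam/lamr_2014_09.csv",
                     source = "com.databricks.spark.csv",
                     schema = customSchema)

# head() collects the first rows into a local data.frame for inspection,
# and printSchema() confirms the declared column types were applied
head(df_hadoop)
printSchema(df_hadoop)
```

With a schema supplied, str(df_hadoop) will still show the two-slot pointer object, but head() and printSchema() will reflect the 13 declared fields.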