Loading data from on-premises HDFS to local SparkR


Problem description





I'm trying to load data from an on-premises hdfs to R-Studio with SparkR.

When I do this:

 df_hadoop <- read.df(sqlContext, "hdfs://xxx.xx.xxx.xxx:xxxx/user/lam/lamr_2014_09.csv",
              source = "com.databricks.spark.csv")

and then this:

str(df_hadoop)

I get this:

Formal class 'DataFrame' [package "SparkR"] with 2 slots 
..@ env: <environment: 0x000000000xxxxxxx>  
..@ sdf:Class 'jobj' <environment: 0x000000000xxxxxx>  

However, this is not the df I'm looking for, because there are 13 fields in the csv I'm trying to load from hdfs.

I have a schema with the 13 fields of the csv, but where or how do I tell it to SparkR?
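For reference, SparkR's read.df accepts a schema argument built with structType() and structField(). A minimal sketch (the field names below are hypothetical placeholders, since the real 13-field schema isn't shown here):

```r
# Sketch: passing an explicit schema to read.df (SparkR 1.x API with spark-csv).
# Field names and types are placeholders for the actual 13-field schema.
schema <- structType(structField("field1", "string"),
                     structField("field2", "integer"),
                     structField("field3", "double"))

df_hadoop <- read.df(sqlContext, "hdfs://xxx.xx.xxx.xxx:xxxx/user/lam/lamr_2014_09.csv",
                     source = "com.databricks.spark.csv",
                     schema = schema)
```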

Solution

If you try the following:

df <- createDataFrame(sqlContext,
                      data.frame(a=c(1,2,3),
                                 b=c(2,3,4),
                                 c=c(3,4,5)))

str(df)

you also get:

Formal class 'DataFrame' [package "SparkR"] with 2 slots
  ..@ env:<environment: 0x139235d18> 
  ..@ sdf:Class 'jobj' <environment: 0x139230e68> 

str() shows you the internal representation of df, which is a pointer to a Spark DataFrame rather than a local data.frame. Instead, just use

df

or

show(df)
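If you actually need the columns in local R (e.g. to inspect all 13 fields in R-Studio), the DataFrame can be pulled to the driver; a minimal sketch:

```r
# printSchema() lists the column names and types of the Spark DataFrame.
printSchema(df)

# head() fetches the first rows as a local data.frame;
# collect() pulls the entire DataFrame into local R memory (use with care on large data).
head(df)
local_df <- collect(df)
str(local_df)   # now a regular data.frame with one column per field
```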

