Reading csv data into SparkR after writing it out from a DataFrame
Problem Description
I followed the example in this post to write out a DataFrame as a csv to an AWS S3 bucket. The result was not a single file but rather a folder containing many .csv files. I'm now having trouble reading this folder back in as a DataFrame in SparkR. Below is what I've tried, but neither attempt results in the same DataFrame that I wrote out.

```r
write.df(df, 's3a://bucket/df', source = "csv")  # Creates a folder named df in the S3 bucket

df_in1 <- read.df("s3a://bucket/df", source = "csv")
df_in2 <- read.df("s3a://bucket/df/*.csv", source = "csv")
# Neither df_in1 nor df_in2 results in a DataFrame that is the same as df
```
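One likely cause (an assumption, since the question does not state the Spark version): on Spark 1.x, SparkR has no built-in `csv` source, so both the write and the read need the external spark-csv package, named as `com.databricks.spark.csv`. A minimal round-trip sketch under that assumption, using the question's placeholder bucket path:

```r
# Sketch only: assumes Spark 1.x with SparkR launched with the spark-csv package, e.g.
#   ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3
# "s3a://bucket/df" is a placeholder path from the question, not a real bucket.

# Write the DataFrame out; Spark produces a folder of part files, one per partition.
write.df(df, "s3a://bucket/df", source = "com.databricks.spark.csv", header = "true")

# Read the whole folder back in; the reader picks up every part file inside it,
# so there is no need to glob for *.csv.
df_in <- read.df(sqlContext, "s3a://bucket/df",
                 source = "com.databricks.spark.csv", header = "true")
```

Note that in the 1.x API `read.df` takes `sqlContext` as its first argument; the context-free call shown in the question is the later Spark 2.x style.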
Solution
```r
# Spark 1.4 is used in this example
#
# Download the nyc flights dataset as a CSV from
# https://s3-us-west-2.amazonaws.com/sparkr-data/nycflights13.csv
#
# Launch SparkR using
# ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

# The SparkSQL context should already be created for you as sqlContext
sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1

# Load the flights CSV file using `read.df`. Note that we use the spark-csv reader package here.
flights <- read.df(sqlContext, "./nycflights13.csv", "com.databricks.spark.csv", header = "true")

# Print the first few rows
head(flights)
```
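Applied to the S3 folder from the question (a placeholder path, not a real bucket), the same reader should handle the directory of part files directly, assuming the Spark 1.4 + spark-csv setup shown above:

```r
# Assumes SparkR was launched with the spark-csv package and that the S3
# credentials for the s3a:// filesystem are configured; the path is a placeholder.
df_in <- read.df(sqlContext, "s3a://bucket/df",
                 "com.databricks.spark.csv", header = "true")

# Inspect the first few rows to confirm the round trip preserved the data.
head(df_in)
```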
Hope this example helps.