How to write dataframe (obtained from hive table) into hadoop SequenceFile and RCFile?

Question

I am able to write it into

  • ORC
  • PARQUET

directly, and into

  • TEXTFILE
  • AVRO

using additional dependencies from Databricks:

    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.10</artifactId>
        <version>1.5.0</version>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-avro_2.10</artifactId>
        <version>2.0.1</version>
    </dependency>

Sample code:

    SparkContext sc = new SparkContext(conf);
    HiveContext hc = new HiveContext(sc);

    // Load the Hive table as a DataFrame; repartition to 1 so the
    // output lands in a single file.
    DataFrame df = hc.table(hiveTableName);
    df.printSchema();
    DataFrameWriter writer = df.repartition(1).write();

    // Dispatch on the requested output format.
    if ("ORC".equalsIgnoreCase(hdfsFileFormat)) {
        writer.orc(outputHdfsFile);

    } else if ("PARQUET".equalsIgnoreCase(hdfsFileFormat)) {
        writer.parquet(outputHdfsFile);

    } else if ("TEXTFILE".equalsIgnoreCase(hdfsFileFormat)) {
        writer.format("com.databricks.spark.csv").option("header", "true").save(outputHdfsFile);

    } else if ("AVRO".equalsIgnoreCase(hdfsFileFormat)) {
        writer.format("com.databricks.spark.avro").save(outputHdfsFile);
    }

Is there any way to write the dataframe into a hadoop SequenceFile or RCFile?

Answer

You can use void saveAsObjectFile(String path) to save an RDD as a SequenceFile of serialized objects. So in your case you have to retrieve the RDD from the DataFrame first:

    JavaRDD<Row> rdd = df.javaRDD();  // javaRDD() is a method, not a field
    rdd.saveAsObjectFile(outputHdfsFile);
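
Note that saveAsObjectFile stores Java-serialized objects, so the resulting SequenceFile can be read back with SparkContext.objectFile but is not directly usable from Hive. If you need a plain SequenceFile of text records instead, a minimal sketch (not from the original answer) is to map each Row to a Writable pair and use saveAsHadoopFile; the NullWritable key and the comma delimiter below are arbitrary choices, and outputHdfsFile is the same output path as above:

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    // Turn each Row into a <NullWritable, Text> pair; the Text value is the
    // row rendered as a comma-separated line (an arbitrary encoding choice).
    JavaPairRDD<NullWritable, Text> pairs = df.javaRDD().mapToPair(
            row -> new Tuple2<>(NullWritable.get(), new Text(row.mkString(","))));

    // Write a standard Hadoop SequenceFile via the old mapred API output format.
    pairs.saveAsHadoopFile(outputHdfsFile,
            NullWritable.class, Text.class, SequenceFileOutputFormat.class);

For RCFile there is no DataFrameWriter shortcut in Spark 1.x; one hedged option is to let Hive do the writing by materializing the DataFrame into a table declared STORED AS RCFILE. The table name here is hypothetical:

    // Register the DataFrame as a temporary table, then have Hive create an
    // RCFile-backed table from it. "rcfile_output_table" is a made-up name.
    df.registerTempTable("df_tmp");
    hc.sql("CREATE TABLE rcfile_output_table STORED AS RCFILE AS SELECT * FROM df_tmp");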
