Create Parquet files in Java


Question

Is there a way to create Parquet files from Java?

I have data in memory (Java classes) and I want to write it to a Parquet file so that I can later read it from Apache Drill.

Is there a simple way to do this, like inserting data into a SQL table?

Got it.

Thanks for the help.

Combining the answers and this link, I was able to create a Parquet file and read it back with Drill.

Recommended answer

ParquetWriter's constructors are deprecated (as of 1.8.1), but ParquetWriter itself is not. You can still create a ParquetWriter by extending the abstract Builder class nested inside it.

Here is an example from the Parquet creators themselves, ExampleParquetWriter:

  public static class Builder extends ParquetWriter.Builder<Group, Builder> {
    // Schema of the Group records to be written.
    private MessageType type = null;
    // Extra key/value metadata passed on to the write support.
    private Map<String, String> extraMetaData = new HashMap<String, String>();

    private Builder(Path file) {
      super(file);
    }

    public Builder withType(MessageType type) {
      this.type = type;
      return this;
    }

    public Builder withExtraMetaData(Map<String, String> extraMetaData) {
      this.extraMetaData = extraMetaData;
      return this;
    }

    // Lets the setters inherited from ParquetWriter.Builder return this concrete type.
    @Override
    protected Builder self() {
      return this;
    }

    // Supplies the WriteSupport that turns Group records into Parquet data.
    @Override
    protected WriteSupport<Group> getWriteSupport(Configuration conf) {
      return new GroupWriteSupport(type, extraMetaData);
    }

  }
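
The self() override in the builder above can look odd on first reading. The following is a minimal sketch of the self-typed builder pattern it relies on, written against the plain JDK only. BaseBuilder, ReportBuilder, withOption, and withTitle are hypothetical names made up for illustration; they are not part of Parquet and only mirror the shape of ParquetWriter.Builder<T, SELF>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an abstract builder such as ParquetWriter.Builder<T, SELF>.
abstract class BaseBuilder<T, SELF extends BaseBuilder<T, SELF>> {
    protected final List<String> options = new ArrayList<>();

    // Subclasses return "this" so inherited setters keep the concrete builder type.
    protected abstract SELF self();

    // Subclasses decide how the collected settings become the final product.
    protected abstract T create();

    public SELF withOption(String option) {
        options.add(option);
        return self(); // typed as the concrete builder, not as BaseBuilder
    }

    public T build() {
        return create();
    }
}

// Hypothetical concrete builder that adds a setter of its own.
class ReportBuilder extends BaseBuilder<String, ReportBuilder> {
    private String title = "untitled";

    public ReportBuilder withTitle(String title) {
        this.title = title;
        return this;
    }

    @Override
    protected ReportBuilder self() {
        return this;
    }

    @Override
    protected String create() {
        return title + " " + options;
    }
}

class SelfTypedBuilderDemo {
    public static void main(String[] args) {
        // withOption(...) is inherited, yet withTitle(...) can still be chained after it.
        String report = new ReportBuilder()
                .withOption("snappy")
                .withTitle("sales")
                .build();
        System.out.println(report); // prints "sales [snappy]"
    }
}
```

Without self(), the inherited withOption(...) would have to return the base type, and the call to withTitle(...) after it would not compile. That is presumably why a subclass of ParquetWriter.Builder has to implement self() alongside getWriteSupport(...).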

If you don't want to use Group and GroupWriteSupport (bundled with Parquet, but intended only as an example of a data-model implementation), you can use the Avro, Protocol Buffers, or Thrift in-memory data models instead. Here is an example of writing Parquet using Avro:

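import org.apache.avro.generic.GenericData;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// fileToWrite (a Hadoop Path), schema (an Avro Schema) and recordsToWrite
// are expected to be defined by the surrounding code.
try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
        .<GenericData.Record>builder(fileToWrite)
        .withSchema(schema)
        .withConf(new Configuration())
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build()) {
    for (GenericData.Record record : recordsToWrite) {
        writer.write(record);
    }
}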

You will need the following dependencies:

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.8.1</version>
</dependency>

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.8.1</version>
</dependency>
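
For readers who want something they can run end to end, here is a rough, self-contained sketch built around the Avro writer call shown earlier. It assumes the two artifacts above plus the Hadoop client classes (Path, Configuration) are on the classpath. The User schema, the user(...) helper, and the temporary output location are hypothetical and only there to make the example complete.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

class AvroParquetSketch {
    // Hypothetical two-field schema; replace it with one that matches your own classes.
    static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

    // Copies in-memory values into an Avro generic record.
    static GenericData.Record user(String name, int age) {
        GenericData.Record record = new GenericData.Record(SCHEMA);
        record.put("name", name);
        record.put("age", age);
        return record;
    }

    // Writes the records to a Parquet file using the builder call from the answer.
    static void write(Path fileToWrite, List<GenericData.Record> recordsToWrite)
            throws IOException {
        try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
                .<GenericData.Record>builder(fileToWrite)
                .withSchema(SCHEMA)
                .withConf(new Configuration())
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {
            for (GenericData.Record record : recordsToWrite) {
                writer.write(record);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Use a fresh temporary directory so the target file does not exist yet.
        java.nio.file.Path dir = Files.createTempDirectory("parquet-sketch");
        Path file = new Path(dir.resolve("users.parquet").toString());

        List<GenericData.Record> records = new ArrayList<>();
        records.add(user("alice", 30));
        records.add(user("bob", 25));

        write(file, records);
        System.out.println("Wrote " + file);
    }
}
```

The resulting file should then be readable from Drill by pointing a query at its path, which is the workflow the question describes.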

Full example
