How can I write NULL values to Parquet using org.apache.parquet.hadoop.ParquetWriter?

Problem description

I have a tool that uses an org.apache.parquet.hadoop.ParquetWriter to convert CSV data files to Parquet data files.

I can write basic primitive types just fine (INT32, DOUBLE, BINARY string).

I need to write NULL values, but I do not know how. I've tried simply adding null to the Group before writing it, and it throws a NullPointerException.

How can I write NULL using org.apache.parquet.hadoop.ParquetWriter? Is there a nullable type?

I believe the code is self-explanatory:

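    import java.util.ArrayList;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.column.ParquetProperties;
    import org.apache.parquet.example.data.Group;
    import org.apache.parquet.example.data.simple.SimpleGroupFactory;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.example.GroupWriteSupport;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;
    import org.apache.parquet.schema.MessageType;
    import org.apache.parquet.schema.PrimitiveType;
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
    import org.apache.parquet.schema.Type;

    ArrayList<Type> fields = new ArrayList<>();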
    fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.INT32, "int32_col", null));
    fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.DOUBLE, "double_col", null));
    fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.BINARY, "string_col", null));
    MessageType schema = new MessageType("input", fields);

    Configuration configuration = new Configuration();
    configuration.setQuietMode(true);
    GroupWriteSupport.setSchema(schema, configuration);
    SimpleGroupFactory f = new SimpleGroupFactory(schema);
    ParquetWriter<Group> writer = new ParquetWriter<Group>(
      new Path("output.parquet"),
      new GroupWriteSupport(),
      CompressionCodecName.SNAPPY,
      ParquetWriter.DEFAULT_BLOCK_SIZE,
      ParquetWriter.DEFAULT_PAGE_SIZE,
      1048576,  // dictionary page size (bytes)
      true,     // enable dictionary encoding
      false,    // validating
      ParquetProperties.WriterVersion.PARQUET_1_0,
      configuration
    );

    // create row 1 with defined values
    Group group1 = f.newGroup();
    Integer int1 = 100;
    Double double1 = 0.5;
    String string1 = "string-value";
    group1.add(0, int1);
    group1.add(1, double1);
    group1.add(2, string1);
    writer.write(group1);

    // create row 2 with NULL values -- does not work!
    Group group2 = f.newGroup();
    Integer int2 = null;
    Double double2 = null;
    String string2 = null;
    group2.add(0, int2); // <-- throws NullPointerException (null Integer unboxed to int)
    group2.add(1, double2); // <-- throws NullPointerException (null Double unboxed to double)
    group2.add(2, string2); // <-- throws NullPointerException (null String converted to Binary)
    writer.write(group2);

    writer.close();
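Why the add calls throw: as far as we can tell from parquet-mr's Group class, its add overloads take primitives such as int and double (and the String overload converts the value to a Binary), so a null wrapper is auto-unboxed or dereferenced before Parquet ever sees it. The sketch below reproduces the unboxing part in plain Java. PrimitiveSink is a hypothetical stand-in for Group, not the real class.

```java
public class UnboxingDemo {
    // Hypothetical stand-in for Group: like Group.add(int, int), this
    // overload takes a primitive, so an Integer argument is unboxed first.
    static class PrimitiveSink {
        int lastValue;
        void add(int fieldIndex, int value) { lastValue = value; }
    }

    // Returns true if passing the given wrapper to add() throws an NPE.
    static boolean throwsOnAdd(Integer boxed) {
        PrimitiveSink sink = new PrimitiveSink();
        try {
            sink.add(0, boxed); // compiler inserts boxed.intValue() here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnAdd(100));  // false: 100 unboxes normally
        System.out.println(throwsOnAdd(null)); // true: null.intValue() throws
    }
}
```

Because the exception happens at the Java call site, no "nullable" add overload exists to reach for. The value has to be left out instead.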

Recommended answer

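The solution turns out to be quite simple: just don't write a value. Because the columns are declared OPTIONAL in the schema (OPTIONAL is Parquet's nullable repetition), any field that is never added to the Group is written as NULL: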

    // create row 1 with defined values
    Group group1 = f.newGroup();
    Integer int1 = 100;
    Double double1 = 0.5;
    String string1 = "string-value";
    group1.add(0, int1);
    group1.add(1, double1);
    group1.add(2, string1);
    writer.write(group1);

    // create row 2 with NULL values: leave every OPTIONAL field unset
    Group group2 = f.newGroup();
    // do not call add() at all; unset OPTIONAL fields are written as NULL
    writer.write(group2);

    // close() flushes the buffered row group and writes the file footer
    writer.close();

    // Now the Parquet file has 2 rows: one with values, one with NULL values
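In a CSV converter, the same idea becomes: parse each cell and call add only when the cell holds a value. The following is a minimal sketch. It assumes that an empty or blank cell means NULL and that the columns follow the question's three-column schema (int32_col, double_col, string_col). RowWriter is a hypothetical interface that mirrors the add(int, int), add(int, double) and add(int, String) overloads on Group, so the sketch compiles without Parquet on the classpath. In the real tool you would call the same methods on the Group returned by f.newGroup().

```java
import java.util.HashMap;
import java.util.Map;

public class CsvRowFiller {
    // Hypothetical stand-in for org.apache.parquet.example.data.Group,
    // exposing only the add overloads used below.
    interface RowWriter {
        void add(int fieldIndex, int value);
        void add(int fieldIndex, double value);
        void add(int fieldIndex, String value);
    }

    // Assumption: an empty (or blank) CSV cell represents NULL.
    static boolean isNull(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    // Fills one row for the schema (int32_col, double_col, string_col).
    // Null cells are skipped, which leaves the OPTIONAL field unset.
    static void fillRow(RowWriter row, String[] cells) {
        if (!isNull(cells[0])) row.add(0, Integer.parseInt(cells[0].trim()));
        if (!isNull(cells[1])) row.add(1, Double.parseDouble(cells[1].trim()));
        if (!isNull(cells[2])) row.add(2, cells[2]);
    }

    // Test double that records which fields were set, keyed by index.
    static class RecordingRow implements RowWriter {
        final Map<Integer, Object> set = new HashMap<>();
        public void add(int i, int v) { set.put(i, v); }
        public void add(int i, double v) { set.put(i, v); }
        public void add(int i, String v) { set.put(i, v); }
    }

    public static void main(String[] args) {
        RecordingRow full = new RecordingRow();
        fillRow(full, new String[] {"100", "0.5", "string-value"});
        RecordingRow empty = new RecordingRow();
        fillRow(empty, new String[] {"", "", ""});
        System.out.println(full.set);  // {0=100, 1=0.5, 2=string-value}
        System.out.println(empty.set); // {}
    }
}
```

If the cells come from String.split, pass a negative limit (line.split(",", -1)). Otherwise trailing empty cells are dropped instead of being kept as NULLs.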

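Why leaving a field unset produces NULL: Parquet stores an OPTIONAL column as a stream of definition levels plus the non-null values only. For a top-level OPTIONAL column, the maximum definition level is 1. A row whose value is present gets level 1 and contributes a value. An unset value gets level 0 and contributes nothing. The following is a simplified model of that idea for one flat column, not parquet-mr's actual encoder, which also run-length/bit-packs the levels. DefinitionLevels, Column, encode and decode are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefinitionLevels {
    // Simplified column chunk for one top-level OPTIONAL column
    // (max definition level 1). Levels are kept as a plain list here.
    static final class Column {
        final List<Integer> levels = new ArrayList<>();
        final List<Integer> values = new ArrayList<>(); // non-null values only
    }

    // Encode: a present value gets level 1 and is stored;
    // a missing value (unset field) gets level 0 and stores nothing.
    static Column encode(List<Integer> rows) {
        Column col = new Column();
        for (Integer v : rows) {
            if (v == null) {
                col.levels.add(0);
            } else {
                col.levels.add(1);
                col.values.add(v);
            }
        }
        return col;
    }

    // Decode: walk the levels, taking the next stored value for level 1
    // and producing null for level 0.
    static List<Integer> decode(Column col) {
        List<Integer> rows = new ArrayList<>();
        int next = 0;
        for (int level : col.levels) {
            rows.add(level == 1 ? col.values.get(next++) : null);
        }
        return rows;
    }

    public static void main(String[] args) {
        Column col = encode(Arrays.asList(100, null, 7));
        System.out.println(col.levels);  // [1, 0, 1]
        System.out.println(col.values);  // [100, 7]
        System.out.println(decode(col)); // [100, null, 7]
    }
}
```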