How can I code nullable objects in Google Cloud Dataflow?

Question

This post is intended to answer questions like the following:

  • Which built-in Coders support nullable values?
  • How can I encode nullable objects?
  • What about classes with nullable fields?
  • What about collections with null entries?

Answer

Some of the default Coders do not support null values, often for efficiency. For example, DoubleCoder always encodes a double using 8 bytes; adding a bit to reflect whether the double is null would add a (padded) 9th byte to all non-null values.
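To make the size argument concrete, the following plain-Java sketch compares a fixed-width 8-byte encoding of a double with an encoding that prepends a one-byte null marker. It does not use the Dataflow SDK and is not DoubleCoder itself; the class name `DoubleEncodingSize` is hypothetical and the marker values (0 = null, 1 = present) are an assumption for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of the Dataflow SDK.
final class DoubleEncodingSize {
    // Fixed-width encoding: every double occupies exactly 8 bytes,
    // leaving no spare bit pattern that could stand for null.
    static byte[] encodeFixed(double value) {
        return ByteBuffer.allocate(8).putDouble(value).array();
    }

    // Nullable encoding: a one-byte marker (0 = null, 1 = present),
    // followed by the 8-byte value only when it is present.
    static byte[] encodeNullable(Double value) {
        if (value == null) {
            return new byte[] {0};
        }
        return ByteBuffer.allocate(9).put((byte) 1).putDouble(value).array();
    }

    public static void main(String[] args) {
        System.out.println(encodeFixed(3.14).length);    // 8
        System.out.println(encodeNullable(3.14).length); // 9
        System.out.println(encodeNullable(null).length); // 1
    }
}
```

Every non-null value pays one extra byte, which is the overhead the answer says the default DoubleCoder avoids.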

It is possible to encode nullable values using the techniques outlined below.

  1. We generally recommend using AvroCoder to encode classes. AvroCoder has support for nullable fields annotated with the org.apache.avro.reflect.Nullable annotation:

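import com.google.cloud.dataflow.sdk.coders.AvroCoder;
import com.google.cloud.dataflow.sdk.coders.DefaultCoder;
import org.apache.avro.reflect.Nullable; // Avro's @Nullable, not javax.annotation.Nullable

@DefaultCoder(AvroCoder.class)
class MyClass {
  @Nullable String nullableField;
}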

See the TrafficMaxLaneFlow example pipeline for more complete code.

AvroCoder also supports fields whose Avro type is a union that includes null.

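  2. We recommend using NullableCoder to encode nullable objects themselves. It wraps the Coder for non-null values and writes one extra byte per value recording whether that value is null, followed by the wrapped Coder's encoding when the value is present.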

For example, consider the following working code:

// NullableCoder.of takes the Coder for non-null values, not a Class.
PCollection<String> output =
    p.apply(Create.of(null, "test1", null, "test2", null)
        .withCoder(NullableCoder.of(StringUtf8Coder.of())));
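The marker-byte strategy can be sketched in plain Java as follows. This is an illustration, not the SDK's code: `SimpleCoder`, `Utf8Coder`, `NullableWrapper` and `NullableDemo` are hypothetical names, and the real NullableCoder and StringUtf8Coder classes have a richer contract and may lay out their bytes differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal coder interface; NOT the Dataflow SDK's Coder class.
interface SimpleCoder<T> {
    void encode(T value, DataOutputStream out) throws IOException;
    T decode(DataInputStream in) throws IOException;
}

// Encodes a non-null String as a length-prefixed UTF-8 byte array.
final class Utf8Coder implements SimpleCoder<String> {
    public void encode(String value, DataOutputStream out) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public String decode(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Wraps any inner coder: writes a marker byte (0 = null, 1 = present),
// then delegates to the inner coder only for non-null values.
final class NullableWrapper<T> implements SimpleCoder<T> {
    private final SimpleCoder<T> inner;

    NullableWrapper(SimpleCoder<T> inner) {
        this.inner = inner;
    }

    public void encode(T value, DataOutputStream out) throws IOException {
        if (value == null) {
            out.writeByte(0);
        } else {
            out.writeByte(1);
            inner.encode(value, out);
        }
    }

    public T decode(DataInputStream in) throws IOException {
        return in.readByte() == 0 ? null : inner.decode(in);
    }
}

final class NullableDemo {
    // Encodes then decodes a single value, rethrowing I/O errors unchecked.
    static <T> T roundTrip(SimpleCoder<T> coder, T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            coder.encode(value, out);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return coder.decode(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleCoder<String> coder = new NullableWrapper<>(new Utf8Coder());
        for (String s : new String[] {null, "test1", null, "test2", null}) {
            System.out.println(roundTrip(coder, s));
        }
    }
}
```

The inner coder never sees a null, which is why a wrapper like this can be layered over coders that reject nulls on their own.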

  • Nested null fields/objects are supported by many coders, as long as the nested coder supports null fields/objects.

    For example, the SDK should be able to infer a working coder for a List<MyClass> using the default CoderRegistry: it should automatically use a ListCoder wrapping an AvroCoder.

    Similarly, a List<String> with possibly-null entries can be encoded with the Coder:

    Coder<List<String>> coder = ListCoder.of(NullableCoder.of(StringUtf8Coder.of()));
    

  • Finally, in some cases Coders must be deterministic, e.g., the key used for GroupByKey. In AvroCoder, the @Nullable fields are coded deterministically as long as the Coder for the base type is itself deterministic. Similarly, using NullableCoder should not affect whether an object can be encoded deterministically.
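The nesting and determinism points can be illustrated together with a plain-Java sketch of a list encoding whose elements are nullable strings, analogous in spirit to `ListCoder.of(NullableCoder.of(StringUtf8Coder.of()))`. The class name `NullableListCodec` and the byte layout (an int count, then per element a marker byte and length-prefixed UTF-8 bytes) are assumptions for illustration, not the SDK's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of the Dataflow SDK.
final class NullableListCodec {
    // Outer layer (list): element count, then each element in list order.
    static byte[] encode(List<String> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(list.size());
            for (String s : list) {
                encodeElement(s, out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Inner layer (nullable element): marker byte, then UTF-8 bytes if present.
    private static void encodeElement(String s, DataOutputStream out) throws IOException {
        if (s == null) {
            out.writeByte(0);
            return;
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeByte(1);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    static List<String> decode(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            int size = in.readInt();
            List<String> result = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                if (in.readByte() == 0) {
                    result.add(null);
                } else {
                    byte[] utf8 = new byte[in.readInt()];
                    in.readFully(utf8);
                    result.add(new String(utf8, StandardCharsets.UTF_8));
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", null, "b");
        List<String> copy = new ArrayList<>(input);
        System.out.println(decode(encode(input)));                    // [a, null, b]
        // Equal lists produce identical bytes, the property a GroupByKey key coder needs.
        System.out.println(Arrays.equals(encode(input), encode(copy))); // true
    }
}
```

Because each layer writes the same bytes for the same input, the list encoding stays deterministic as long as the element encoding is, mirroring the rule stated for AvroCoder and NullableCoder.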