Convert from PCollection<TableRow> to PCollection<KV<K,V>>
Problem description
I'm trying to extract data from two tables in BigQuery and then join them with CoGroupByKey. However, the output of BigQuery is a PCollection<TableRow>, while CoGroupByKey requires a PCollection<KV<K,V>>.
How can I convert a PCollection<TableRow> to a PCollection<KV<K,V>>?
Recommended answer
CoGroupByKey needs to know which key to co-group by: this is the K in KV<K, V>, and the V is the value associated with that key in a given collection. Co-grouping several collections gives you, for each key, all of the values with that key in each collection.
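To make the co-grouping semantics concrete, here is a minimal plain-Java sketch that models them with ordinary collections. It is not Beam code and does not use CoGroupByKey itself; the names CoGroupSketch, Grouped and coGroup, as well as the sample data, are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of co-grouping two keyed collections (illustration only, not Beam). */
final class CoGroupSketch {

  /** Hypothetical per-key result: the values found under that key in each input. */
  static final class Grouped<A, B> {
    final List<A> left = new ArrayList<>();
    final List<B> right = new ArrayList<>();
  }

  /**
   * Groups both inputs by key. A key present in only one input still appears
   * in the result, with an empty list for the other input.
   */
  static <K extends Comparable<K>, A, B> Map<K, Grouped<A, B>> coGroup(
      List<Map.Entry<K, A>> leftInput, List<Map.Entry<K, B>> rightInput) {
    // TreeMap keeps keys sorted so the printed output is deterministic.
    Map<K, Grouped<A, B>> result = new TreeMap<>();
    for (Map.Entry<K, A> kv : leftInput) {
      result.computeIfAbsent(kv.getKey(), k -> new Grouped<A, B>()).left.add(kv.getValue());
    }
    for (Map.Entry<K, B> kv : rightInput) {
      result.computeIfAbsent(kv.getKey(), k -> new Grouped<A, B>()).right.add(kv.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // Two "collections" already keyed by a user id (the K in KV<K, V>).
    List<Map.Entry<String, String>> users = List.of(
        Map.entry("u1", "Alice"), Map.entry("u2", "Bob"));
    List<Map.Entry<String, Integer>> orders = List.of(
        Map.entry("u1", 10), Map.entry("u1", 25), Map.entry("u3", 7));

    coGroup(users, orders).forEach((key, g) ->
        System.out.println(key + " -> " + g.left + " / " + g.right));
  }
}
```

In this model every key that occurs in either input appears once in the result, paired with the list of values from each input. This mirrors why each input to a co-group must already be in key/value form.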
So you need to convert both of your PCollection<TableRow> inputs to PCollection<KV<YourKey, TableRow>>, where YourKey is the type of the key on which you want to join them; in your case that might be String, Integer, or something else.
The best transform for this conversion is probably WithKeys. For example, the following code sample converts a PCollection<TableRow> to a PCollection<KV<String, TableRow>> keyed by a hypothetical userId field of type String:
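    // Package names below assume the Apache Beam Java SDK.
    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.transforms.WithKeys;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    PCollection<TableRow> rows = ...;
    PCollection<KV<String, TableRow>> rowsKeyedByUser = rows
        .apply(WithKeys.of(new SerializableFunction<TableRow, String>() {
          @Override
          public String apply(TableRow row) {
            // Use the row's userId field as the join key.
            return (String) row.get("userId");
          }
        }));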