Querying a relational database through Google DataFlow Transformer
Problem description
I would like to implement a ParDo transform in my Dataflow pipeline that queries a relational database based on the data provided by each element being processed. I know every attribute of a user-defined transform must be serializable, but to query the database using JDBC I need to create a Connection, which is inherently not serializable.
Is it still possible to do that in the context of a Dataflow pipeline?
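A minimal sketch of the problem, assuming plain Java serialization as a stand-in for how the SDK ships a user-defined function to workers. FakeConnection, QueryFn, and canSerialize are hypothetical names; FakeConnection stands in for java.sql.Connection, whose interface does not extend Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class Main {
    // Hypothetical stand-in for java.sql.Connection: deliberately not Serializable
    static class FakeConnection {}

    // A user-defined function that keeps its connection in an ordinary (non-transient) field
    static class QueryFn implements Serializable {
        FakeConnection conn = new FakeConnection();
    }

    // Returns true if obj can be written with standard Java serialization
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object reachable from obj is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new QueryFn()));     // false
        System.out.println(canSerialize("a plain string"));  // true
    }
}
```

Serialization walks every non-transient field, so a single non-serializable field is enough to make the whole function fail to serialize.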
Recommended answer
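Yes, it is possible. You can mark your Connection field transient so that it is not serialized, and create it once per bundle in the startBundle method. Once all the elements in the bundle have been processed, close the connection in the finishBundle method. Alternatively, with Apache Beam based SDKs you can create the connection once per DoFn instance in a @Setup method and close it in a @Teardown method, which lets it be reused across bundles; that is the approach the following code takes. Note that Beam calls @Teardown on a best-effort basis, so it may not run if a worker fails.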
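import java.sql.*;
import org.apache.beam.sdk.transforms.DoFn;
class MyDoFn extends DoFn<String, String> {
  // Placeholder values (hypothetical): replace with your own JDBC URL and query
  private static final String JDBC_URL = "jdbc:postgresql://localhost/mydb";
  private static final String QUERY = "SELECT name FROM items WHERE id = ?";
  // transient: not serialized with the DoFn, so it must be created on the worker
  private transient Connection jdbc;
  @Setup
  public void setup() throws SQLException {
    jdbc = DriverManager.getConnection(JDBC_URL); // once per DoFn instance
  }
  @ProcessElement
  public void processElement(ProcessContext c) throws SQLException {
    // Query the database using the current element as the parameter
    try (PreparedStatement stmt = jdbc.prepareStatement(QUERY)) {
      stmt.setString(1, c.element());
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) c.output(rs.getString(1));
      }
    }
  }
  @Teardown
  public void tearDown() throws SQLException {
    if (jdbc != null) jdbc.close(); // close the connection
  }
}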
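The serialization behavior behind this answer can be checked without Beam. The sketch below uses plain Java serialization as an assumed stand-in for how the DoFn is shipped to workers; QueryFn, FakeConnection, and roundTrip are hypothetical, and the startBundle, process, and finishBundle methods only mirror the per-bundle lifecycle described above. They are not the SDK's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class Main {
    // Hypothetical stand-in for java.sql.Connection: not Serializable
    static class FakeConnection {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    static class QueryFn implements Serializable {
        // transient: skipped during serialization, so each deserialized copy starts with null
        private transient FakeConnection conn;

        // Mirrors startBundle: open one connection for the whole bundle
        void startBundle() { conn = new FakeConnection(); }

        // Mirrors processElement: would run a query through conn
        String process(String element) {
            if (conn == null || !conn.isOpen()) throw new IllegalStateException("no connection");
            return "result for " + element;
        }

        // Mirrors finishBundle: release the connection
        void finishBundle() { conn.close(); conn = null; }

        boolean hasConnection() { return conn != null; }
    }

    // Serialize then deserialize, roughly what happens when a function is sent to a worker
    static QueryFn roundTrip(QueryFn fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (QueryFn) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        QueryFn original = new QueryFn();
        original.startBundle();                     // connection open on the launching side
        QueryFn worker = roundTrip(original);       // succeeds because conn is transient
        System.out.println(worker.hasConnection()); // false: the transient field was not copied
        worker.startBundle();
        System.out.println(worker.process("a"));    // result for a
        worker.finishBundle();
        System.out.println(worker.hasConnection()); // false
    }
}
```

Because a transient field comes back as null after deserialization, the connection has to be created in a lifecycle method that runs on the worker rather than in the constructor, which only runs where the pipeline is built.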