Apache Beam stream processing of json data
Problem description
I am analyzing Apache Beam stream processing of data. I have worked on Apache Kafka stream processing (producers, consumers, etc.). I now want to compare it with Beam.
I want to stream simple JSON data using Apache Beam programmatically (Java):
{"UserID":"1","Address":"XXX","ClassNo":"989","UserName":"Stella","ClassType":"YYY"}
Can someone please guide me or direct me to an example?
Recommended answer
There are several aspects to this:
- First, you need to establish where the data is coming from:
- you need to use some kind of IO in a Beam pipeline, see here;
- there are a bunch of built-in IOs, see the list here;
- by using an IO from the above link you will likely get a stream of strings containing those JSON objects;
- some IOs can natively parse Avro and other formats (e.g. PubsubIO); this depends on the specific IO implementation;
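For example, assuming the records arrive on a Kafka topic, a KafkaIO read could be wired up roughly like this (the topic name and bootstrap servers are placeholders, and this is a sketch under those assumptions rather than a tested pipeline):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadJsonFromKafka {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Each Kafka record value is one JSON string like the sample above.
        PCollection<String> jsonStrings = p
            .apply(KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092")   // placeholder
                .withTopic("user-events")                 // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                       // drop Kafka metadata -> KV<String, String>
            .apply(Values.create());                      // keep just the JSON payloads

        // Downstream: parse each string into a Row or a custom Java type.
        p.run().waitUntilFinish();
    }
}
```

Running this requires the `beam-sdks-java-io-kafka` module, a runner, and a reachable Kafka broker, so it is shown here only to illustrate the wiring.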
Then you may need to transform the data:
- you will need to create your own PTransform which handles the conversion from a JSON string to your Java class:
- see the section about PTransforms here;
- Beam's JsonToRow PTransform accepts a string containing a JSON object and converts it to a Beam Row using Jackson's ObjectMapper;
- you can either try using the Row object yourself, or you can implement a similar transform to convert JSON strings to your custom Java type instead of Row;
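In a real pipeline this conversion would normally live inside a DoFn that calls Jackson's ObjectMapper. As a self-contained illustration of just the conversion step, here is a minimal hand-rolled parse of the flat sample record above using only the JDK (the class and field names are assumptions, and the string-splitting is safe only because this record has no nesting or escaped quotes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical POJO for the sample record; field names are assumptions.
class UserRecord {
    public String userId, address, classNo, userName, classType;

    // Minimal parser for the flat, unnested JSON shape shown above.
    // A production pipeline would use Jackson's ObjectMapper inside a DoFn instead.
    public static UserRecord parse(String json) {
        Map<String, String> fields = new HashMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1);  // strip the surrounding { }
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2);
            fields.put(kv[0].replace("\"", "").trim(),
                       kv[1].replace("\"", "").trim());
        }
        UserRecord r = new UserRecord();
        r.userId = fields.get("UserID");
        r.address = fields.get("Address");
        r.classNo = fields.get("ClassNo");
        r.userName = fields.get("UserName");
        r.classType = fields.get("ClassType");
        return r;
    }
}
```

Inside a Beam pipeline, the same logic would sit in the `@ProcessElement` method of a DoFn applied via ParDo, emitting `UserRecord` elements downstream.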
You may also take a look at the examples folder in the Beam source.