How do I transform events in Flume and send them to another channel?

Question

Flume ships with some ready-made components that transform events before pushing them further, such as RegexHbaseEventSerializer, which you can plug into an HBaseSink. It is also easy to provide a custom serializer.

I want to process events and send them on to the next channel. The closest thing to what I want is the Regex Extractor Interceptor, which accepts a custom serializer for regexp matches. However, it does not replace the event body; it only appends new headers containing the results, which makes the output flow heavier. I'd like to accept large events, such as zipped HTML > 5KB, parse them, and put many slim messages, such as the URLs found in the pages, onto another channel.

                  channel1                channel2
HtmlPagesSource -----------> PageParser -----------> WhateverSinkGoesNext
                    html                    urls
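
To make the desired transformation concrete, here is a minimal sketch of the parsing step alone, independent of Flume: one large compressed body goes in, many small messages come out. It assumes that "zipped" means gzip and that only absolute http/https links in href attributes are of interest; the class name PageParserSketch and both method names are hypothetical, not part of Flume.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: turns one large event body into many slim messages.
class PageParserSketch {
    // Assumption: only absolute http/https links in href attributes matter.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Decompresses a gzip-compressed HTML body and returns the URLs found in it.
    static List<String> extractUrls(byte[] gzippedHtml) throws IOException {
        String html;
        try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(gzippedHtml))) {
            html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // each URL would become the body of one slim event
        }
        return urls;
    }

    // Demo-only helper that builds a compressed body to feed into extractUrls.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("<a href=\"http://a.example/x\">x</a>"
                + "<a href='https://b.example/y'>y</a>");
        for (String url : extractUrls(body)) {
            System.out.println(url);
        }
    }
}
```

In a real agent, the PageParser component would call something like extractUrls on each incoming event body and emit one outgoing event per returned URL.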

Do I have to write a custom sink for that, or is there some type of component that accepts custom serializers, like HBaseSink?

If I write a sink, do I just use the Flume client SDK and call append(Event) or appendBatch(List) when processing incoming events?

Recommended answer

It seems you need to run two Flume agents:

Agent1: HtmlPagesSource -> channel -> PageParser (extends AvroSink and overrides the process method so that it parses each incoming event and writes out many slim messages)

Agent2: AvroSource -> channel -> WhateverSinkGoesNext
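
The two-agent layout above could be wired together with a configuration along these lines. This is only a sketch under several assumptions: the class names com.example.HtmlPagesSource and com.example.PageParserSink are hypothetical placeholders for the custom components, the host and port are arbitrary, memory channels are chosen for brevity, and a logger sink stands in for WhateverSinkGoesNext. Because PageParser extends AvroSink, it is assumed to take the same hostname and port settings as the stock Avro sink.

```properties
# Agent1: HtmlPagesSource -> channel -> PageParser
agent1.sources = htmlSrc
agent1.channels = ch1
agent1.sinks = parser

# Hypothetical custom source class
agent1.sources.htmlSrc.type = com.example.HtmlPagesSource
agent1.sources.htmlSrc.channels = ch1

agent1.channels.ch1.type = memory

# Hypothetical custom sink class that extends AvroSink
agent1.sinks.parser.type = com.example.PageParserSink
agent1.sinks.parser.channel = ch1
agent1.sinks.parser.hostname = localhost
agent1.sinks.parser.port = 4545

# Agent2: AvroSource -> channel -> WhateverSinkGoesNext
agent2.sources = avroSrc
agent2.channels = ch2
agent2.sinks = next

agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4545
agent2.sources.avroSrc.channels = ch2

agent2.channels.ch2.type = memory

# Placeholder for whatever sink goes next
agent2.sinks.next.type = logger
agent2.sinks.next.channel = ch2
```

The port on Agent1's sink must match the port that Agent2's Avro source binds to; that pairing is what chains the two flows.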

For some examples of chaining Flume data flows, see: http://www.ibm.com/developerworks/library/bd-flumews/#N10081
