Why does a Flume source need to recognize the format of the message?

Question

According to the Flume documentation:

A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow that send events from an Avro sink.

Why does a Flume source need to recognize or understand the format of the message, when all it does is forward the message to one of the channels?

Recommended answer

As far as I understand, Flume encapsulates the data being transferred in an event packet made up of a header and a payload (the data itself). From the documentation:

A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes.

This appears in the documentation just before the passage you quoted.

The format you specify is the format of the event packet, not the format of your data.
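The distinction can be sketched in a few lines of Python. This is only a toy model, not Flume's API: the `Event` class and the `wrap` helper are hypothetical names invented for illustration. It shows an event as optional string headers plus an opaque byte body, and that nothing on the wrapping side ever parses that body.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a Flume event (hypothetical, not the real Flume class)."""
    body: bytes                                   # opaque byte payload
    headers: dict[str, str] = field(default_factory=dict)  # optional string attributes


def wrap(raw: bytes, **headers: str) -> Event:
    # The payload is stored untouched; nothing here parses or validates it.
    return Event(body=raw, headers=dict(headers))


# Three payloads in unrelated formats, all carried the same way.
payloads = [
    b"127.0.0.1 - - GET /index.html 200",      # plain log line
    b'{"user": "alice", "action": "login"}',   # JSON text
    bytes([0x00, 0xFF, 0x10, 0x80]),           # arbitrary binary
]
events = [wrap(p, origin="data.log") for p in payloads]

for event in events:
    print(event.headers, len(event.body))
```

Whatever the payload looks like, the envelope has the same shape, which is why the format of your data does not matter to the source.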

Let's suppose you have this agent:

plain_to_avro_translator.sources = plain-source avro-source
plain_to_avro_translator.sinks = avro-sink local-file-sink
plain_to_avro_translator.channels = mem-channel1 mem-channel2

plain_to_avro_translator.sources.plain-source.channels = mem-channel1
plain_to_avro_translator.sources.plain-source.type = exec
plain_to_avro_translator.sources.plain-source.restart = true
plain_to_avro_translator.sources.plain-source.restartThrottle = 40000
plain_to_avro_translator.sources.plain-source.command = cat /home/user/data.log

plain_to_avro_translator.sinks.avro-sink.channel = mem-channel1
plain_to_avro_translator.sinks.avro-sink.type = thrift
plain_to_avro_translator.sinks.avro-sink.hostname = 192.168.200.43
plain_to_avro_translator.sinks.avro-sink.port = 6000

plain_to_avro_translator.channels.mem-channel1.type = memory
plain_to_avro_translator.channels.mem-channel1.capacity = 100
plain_to_avro_translator.channels.mem-channel1.transactionCapacity = 100

plain_to_avro_translator.sources.avro-source.channels = mem-channel2
plain_to_avro_translator.sources.avro-source.type = thrift
plain_to_avro_translator.sources.avro-source.bind = 0.0.0.0
plain_to_avro_translator.sources.avro-source.port = 6000

plain_to_avro_translator.channels.mem-channel2.type = memory
plain_to_avro_translator.channels.mem-channel2.capacity = 100
plain_to_avro_translator.channels.mem-channel2.transactionCapacity = 100

plain_to_avro_translator.sinks.local-file-sink.channel = mem-channel2
plain_to_avro_translator.sinks.local-file-sink.type = file_roll
plain_to_avro_translator.sinks.local-file-sink.sink.directory = /home/user/flume_output

This works without problems and does not depend on the format of data.log (you can write whatever you need, in whatever format). If you set the avro-sink type to avro instead of thrift, you will get errors from avro-source, because it expects events in Thrift format.

Sinks and sources need to know how to parse the event packet.
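To make the mismatch concrete, here is a minimal Python sketch with two made-up envelope encodings. They are assumptions for illustration only and are not Avro or Thrift: format A is a JSON object with a base64 body, format B is a length-prefixed binary frame. All function names are hypothetical. The same payload travels fine when both ends agree on the envelope and fails to decode when they do not.

```python
import base64
import json
import struct


def encode_json_envelope(headers: dict, body: bytes) -> bytes:
    # Hypothetical format A: a JSON object with a base64-encoded body.
    doc = {"headers": headers, "body": base64.b64encode(body).decode("ascii")}
    return json.dumps(doc).encode("utf-8")


def decode_json_envelope(packet: bytes) -> tuple:
    doc = json.loads(packet.decode("utf-8"))
    return doc["headers"], base64.b64decode(doc["body"])


def encode_binary_envelope(headers: dict, body: bytes) -> bytes:
    # Hypothetical format B: 4-byte big-endian header length, JSON headers, raw body.
    head = json.dumps(headers).encode("utf-8")
    return struct.pack(">I", len(head)) + head + body


def decode_binary_envelope(packet: bytes) -> tuple:
    (head_len,) = struct.unpack(">I", packet[:4])
    head = packet[4:4 + head_len]
    return json.loads(head.decode("utf-8")), packet[4 + head_len:]


def try_decode(decoder, packet: bytes):
    # Return the decoded (headers, body) pair, or a short failure message.
    try:
        return decoder(packet)
    except (ValueError, KeyError, struct.error) as exc:
        return f"decode failed: {type(exc).__name__}"


headers = {"origin": "data.log"}
body = b"any payload at all"          # the data format is irrelevant here
packet = encode_binary_envelope(headers, body)

print(try_decode(decode_binary_envelope, packet))  # sender and receiver agree
print(try_decode(decode_json_envelope, packet))    # receiver expects the other envelope
```

The payload bytes are identical in both calls; only the envelope the receiver expects differs, which mirrors pairing an avro sink with a thrift source.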

I hope I got this right. Please correct me if I am wrong.
