Flume - Is there a way to store avro event (header & body) into hdfs?


Problem Description



New to flume...

I'm receiving avro events and storing them into HDFS.

I understand that by default only the body of the event is stored in HDFS. I also know there is an avro_event serializer. But I do not know what this serializer is actually doing. How does it affect the final output of the sink?

Also, I can't figure out how to just dump the event into HDFS preserving its header information. Do I need to write my own serializer?

Solution

As it turns out, the avro_event serializer does store both the header & body in the file.

Here is how I set up my sink:

a1.sinks.i1.type=hdfs
a1.sinks.i1.hdfs.path=hdfs://localhost:8020/user/my-name
a1.sinks.i1.hdfs.rollInterval=0
a1.sinks.i1.hdfs.rollSize=1024
a1.sinks.i1.hdfs.rollCount=0
a1.sinks.i1.serializer=avro_event
a1.sinks.i1.hdfs.fileType=DataStream
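
For reference, the sink needs a source and a channel to go with it; a minimal agent skeleton around it might look like this (the avro source, memory channel, bind address, and port below are assumptions for illustration, not taken from the setup above):

# assumed avro source + memory channel feeding the hdfs sink i1
a1.sources = r1
a1.channels = c1
a1.sinks = i1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.i1.channel = c1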

I sent the events using the packaged avro-client agent and injected the headers by using the -R headerFile option.

Content of headerFile:

machine=localhost
user=myName
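
The invocation looks roughly like this (the host, port, and input file name here are assumptions for illustration; -R/--headerFile is the option that attaches the headers to each event):

flume-ng avro-client -H localhost -p 41414 -F events.txt -R headerFile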

Finally, I tested the results using a simple Java app I stole from this posting:

// getConf() and printWriter come from the surrounding app (not shown here).
final FileSystem fs = FileSystem.get(getConf());
final Path path = new Path(fs.getHomeDirectory(), "FlumeData.1446072877536");

printWriter.write(path + "-exists: " + fs.exists(path));

// Open the Avro container file written by the HDFS sink and read it back
// with a generic datum reader (no compiled schema needed).
final SeekableInput input = new FsInput(path, getConf());
final DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
final FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);

// Each record is one Flume event: a map of headers plus the body bytes.
for (final GenericRecord datum : fileReader) {
    printWriter.write("value = " + datum);
}

fileReader.close();

And sure enough, I see my headers for each record; here is one line:

value = {"headers": {"machine": "localhost", "user": "myName"}, "body": {"bytes": "set -x"}}


There is one other serializer that also emits the headers, and that is the header_and_text serializer. The resulting file is a human-readable text file. Here is a sample line:

{machine=localhost, user=userName} set -x
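
To try it, the only change to the sink configuration above should be the serializer line (this is an assumption based on the serializer's name; everything else stays the same):

a1.sinks.i1.serializer = header_and_text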

Finally, in the Apache Flume - Distributed Log Collection for Hadoop book, there is a mention of the header_and_text serializer, but I couldn't get that to work.
