Use elephant-bird with Hive to read protobuf data

Question

I have a problem similar to this one.

Here is what I am using:

  1. CDH4.4 (Hive 0.10)
  2. protobuf-java-2.4.1.jar
  3. elephant-bird-hive-4.6-SNAPSHOT.jar
  4. elephant-bird-core-4.6-SNAPSHOT.jar
  5. elephant-bird-hadoop-compat-4.6-SNAPSHOT.jar
  6. A jar file containing the .class files compiled from the protoc-generated code.

I followed the Protocol Buffers Java tutorial to create my data file, "testbook".

Then I did the following:

Used hdfs dfs -mkdir /protobuf_data to create an HDFS folder.

Used hdfs dfs -put testbook /protobuf_data to put "testbook" into HDFS.

Then I followed the elephant-bird web page to create the table, using this syntax:

create table addressbook
  row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
  with serdeproperties (
    "serialization.class"="com.example.tutorial.AddressBookProtos$AddressBook")
  stored as
    inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
    OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  LOCATION '/protobuf_data/';

Everything worked fine.

But when I submit the query select * from addressbook; no results come back.

And I couldn't find any error logs to help me debug.

Can anyone help me?

Thanks a lot.

Recommended answer

The problem has been solved.

At first I put the protobuf binary data directly into HDFS, and no results showed up.

That is because it doesn't work that way.

I asked some senior colleagues, and they said protobuf binary data should be written into some kind of container file format, such as a Hadoop SequenceFile.

The elephant-bird page mentions this too, but at first I didn't fully understand it.

After writing the protobuf binary data into a SequenceFile, I can read the protobuf data with Hive.

Because I use the SequenceFile format, the create table statement uses these input and output formats:

inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
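
Putting the pieces together, the full statement might look like the sketch below. It reuses the serde and serialization.class from the question and swaps in the two format lines above. The table name addressbook_seq and the location /protobuf_seq/ are hypothetical, and the sketch assumes each SequenceFile record's value holds one serialized AddressBook message. Treat it as a starting point, not a verified statement.

```sql
-- Sketch only: addressbook_seq and /protobuf_seq/ are made-up names.
-- Assumes the SequenceFile values are serialized AddressBook messages.
create external table addressbook_seq
  row format serde 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
  with serdeproperties (
    'serialization.class'='com.example.tutorial.AddressBookProtos$AddressBook')
  stored as
    inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
    outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
  location '/protobuf_seq/';

-- Same query as in the question, now against the SequenceFile-backed table.
select * from addressbook_seq;
```

The elephant-bird and protobuf jars from the list in the question still need to be on Hive's classpath (for example via add jar) before the statement runs.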

I hope this helps others who are new to Hadoop, Hive, and elephant-bird too.
