Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union in Hive
Problem description
Need help! I am streaming Twitter feeds into HDFS using Flume and loading them into Hive for analysis.
The steps are as follows:

Data in HDFS:

I have described the Avro schema in an .avsc file and put it in Hadoop:
{"type":"record",
"name":"Doc",
"doc":"adoc",
"fields":[{"name":"id","type":"string"},
{"name":"user_friends_count","type":["int","null"]},
{"name":"user_location","type":["string","null"]},
{"name":"user_description","type":["string","null"]},
{"name":"user_statuses_count","type":["int","null"]},
{"name":"user_followers_count","type":["int","null"]},
{"name":"user_name","type":["string","null"]},
{"name":"user_screen_name","type":["string","null"]},
{"name":"created_at","type":["string","null"]},
{"name":"text","type":["string","null"]},
{"name":"retweet_count","type":["boolean","null"]},
{"name":"retweeted","type":["boolean","null"]},
{"name":"in_reply_to_user_id","type":["long","null"]},
{"name":"source","type":["string","null"]},
{"name":"in_reply_to_status_id","type":["long","null"]},
{"name":"media_url_https","type":["string","null"]},
{"name":"expanded_url","type":["string","null"]}]}
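The "Found long, expecting union" error means a value in the data files does not match the union declared for that field in the schema above. A minimal illustration of the check an Avro decoder performs, in plain Python with hypothetical values (not the real Avro library):

```python
import json

# Subset of the .avsc above (same field declarations, trimmed for brevity).
schema = json.loads("""
{"type": "record", "name": "Doc", "fields": [
  {"name": "id", "type": "string"},
  {"name": "created_at", "type": ["string", "null"]},
  {"name": "in_reply_to_user_id", "type": ["long", "null"]}
]}
""")

# Map Avro primitive names to the Python types a decoder would accept.
AVRO_TO_PY = {"string": str, "long": int, "int": int,
              "boolean": bool, "null": type(None)}

def matches_union(value, avro_type):
    """True if value fits the declared Avro type (union or primitive)."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    return any(isinstance(value, AVRO_TO_PY[b]) for b in branches)

# Hypothetical record where created_at arrived as an epoch-millis long
# instead of the string the schema promises -- the shape of the error above.
record = {"id": "1", "created_at": 1457094000000, "in_reply_to_user_id": None}

for field in schema["fields"]:
    ok = matches_union(record[field["name"]], field["type"])
    print(field["name"], "OK" if ok else "MISMATCH: found long, expecting union")
```

A decoder hits exactly this mismatch when the schema used to read the files disagrees with the schema they were written with.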
I have written an .hql file to create a table and load data into it:
create table tweetsavro
row format serde
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as inputformat
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');
load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;
I have successfully run the .hql file, but when I run the select * from <tablename>
command in Hive it shows the "Found long, expecting union" error from the title.
The output of desc tweetsavro is:
hive> desc tweetsavro;
OK
id string
user_friends_count int
user_location string
user_description string
user_statuses_count int
user_followers_count int
user_name string
user_screen_name string
created_at string
text string
retweet_count boolean
retweeted boolean
in_reply_to_user_id bigint
source string
in_reply_to_status_id bigint
media_url_https string
expanded_url string
Time taken: 0.697 seconds, Fetched: 17 row(s)
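The desc output reflects the AvroSerDe's type mapping: the non-null branch of each union becomes the Hive column type, with Avro long surfacing as Hive bigint. A rough sketch of that mapping, covering only the primitives used in this schema:

```python
# Avro primitive -> Hive type, per the AvroSerDe's standard mapping
# (only the primitives appearing in the schema above).
AVRO_TO_HIVE = {"string": "string", "int": "int",
                "long": "bigint", "boolean": "boolean"}

def hive_type(avro_type):
    """Resolve an Avro type (possibly a [type, "null"] union) to a Hive type."""
    if isinstance(avro_type, list):                    # union: take the non-null branch
        avro_type = next(t for t in avro_type if t != "null")
    return AVRO_TO_HIVE[avro_type]

print(hive_type(["long", "null"]))    # in_reply_to_user_id -> bigint
print(hive_type(["string", "null"]))  # created_at -> string
print(hive_type("string"))            # id -> string
```

This is why the table definition looks correct in desc even when the underlying data files disagree with the schema: the mismatch only surfaces at read time.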
Answer
I was facing the exact same issue. The problem was in the timestamp field (the created_at column in your case), which I was trying to insert into my new table as a string. My assumption was that this data would be in ["null","string"]
format in my source. I analyzed the source Avro schema that was generated by the sqoop import --as-avrodatafile process. The generated schema had the following signature for the timestamp column:
{
"name" : "order_date",
"type" : [ "null", "long" ],
"default" : null,
"columnName" : "order_date",
"sqlType" : "93"
},
SqlType 93 stands for the Timestamp data type, so in my target table's Avro schema file I changed the data type to "long", and this solved the issue. My guess is that there is a similar datatype mismatch in one of your columns.
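If created_at in the data files really holds an epoch value, the same fix applies: declare the field as ["null","long"] in the target .avsc and convert at query time (e.g. with Hive's from_unixtime). A sketch of what such a long holds, assuming epoch milliseconds and a hypothetical value not taken from the question's data:

```python
from datetime import datetime, timezone

# Hypothetical value from a data file: JDBC sqlType 93 (TIMESTAMP)
# exported by sqoop --as-avrodatafile as epoch milliseconds in an Avro long.
order_date_millis = 1457094000000

# Roughly what Hive's from_unixtime(order_date div 1000) would render:
dt = datetime.fromtimestamp(order_date_millis / 1000, tz=timezone.utc)
print(dt.isoformat())  # → 2016-03-04T12:20:00+00:00
```

Declaring the column as long keeps the decoder happy; rendering it as a date becomes a query-time concern rather than a schema one.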