Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union in Hive


Problem Description

Need help!!!

I am streaming Twitter feeds into HDFS using Flume and loading them into Hive for analysis.

The steps are as follows:

Data in HDFS:

I have described the avro schema in an avsc file and put it in hadoop:

 {"type":"record",
 "name":"Doc",
 "doc":"adoc",
 "fields":[{"name":"id","type":"string"},
       {"name":"user_friends_count","type":["int","null"]},
       {"name":"user_location","type":["string","null"]},
       {"name":"user_description","type":["string","null"]},
       {"name":"user_statuses_count","type":["int","null"]},
       {"name":"user_followers_count","type":["int","null"]},
       {"name":"user_name","type":["string","null"]},
       {"name":"user_screen_name","type":["string","null"]},
       {"name":"created_at","type":["string","null"]},
       {"name":"text","type":["string","null"]},
       {"name":"retweet_count","type":["boolean","null"]},
       {"name":"retweeted","type":["boolean","null"]},
       {"name":"in_reply_to_user_id","type":["long","null"]},
       {"name":"source","type":["string","null"]},
       {"name":"in_reply_to_status_id","type":["long","null"]},
       {"name":"media_url_https","type":["string","null"]},
       {"name":"expanded_url","type":["string","null"]}]}

I have written an .hql file to create a table and loaded data in it:

 create table tweetsavro
    row format serde
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    stored as inputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');

    load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;

I have successfully run the .hql file, but when I run the select * from <tablename> command in Hive it shows the following error:

[error screenshot]

The output of tweetsavro is:

hive> desc tweetsavro;
OK
id                      string                                      
user_friends_count      int                                         
user_location           string                                      
user_description        string                                      
user_statuses_count     int                                         
user_followers_count    int                                         
user_name               string                                      
user_screen_name        string                                      
created_at              string                                      
text                    string                                      
retweet_count           boolean                                     
retweeted               boolean                                     
in_reply_to_user_id     bigint                                      
source                  string                                      
in_reply_to_status_id   bigint                                      
media_url_https         string                                      
expanded_url            string                                      
Time taken: 0.697 seconds, Fetched: 17 row(s)

Solution

I was facing the exact same issue. The issue existed in the timestamp field (the "created_at" column in your case), which I was trying to insert as a string into my new table. My assumption was that this data would be in ["null","string"] format in my source. I analyzed the source Avro schema that was generated by the sqoop import --as-avrodatafile process. The schema generated by the import had the following signature for the timestamp column:
{ "name" : "order_date", "type" : [ "null", "long" ], "default" : null, "columnName" : "order_date", "sqlType" : "93" },

SqlType 93 stands for the Timestamp datatype. So in my target table's Avro schema file I changed the data type to 'long', and this solved the issue. My guess is that there is a similar datatype mismatch in one of your columns.
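
For illustration, here is a minimal sketch of what that kind of edit could look like in the asker's target schema file (AvroSchemaFile.avsc). Picking created_at as the offending column is only an assumption for the example; the same one-line change applies to whichever field the writer schema embedded in the Flume-generated files actually records as long.

 Before (the table expects a string union for this field):
     {"name":"created_at","type":["string","null"]},

 After (the union now matches the long value that was actually written):
     {"name":"created_at","type":["null","long"],"default":null},

After the table is re-created against the edited schema, that column surfaces in Hive as bigint rather than string, the same mapping already visible for in_reply_to_user_id in the desc tweetsavro output above.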

