Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting union in hive

Views: 2025    Published: 2018/6/12 14:09:36    Tags: java hadoop hive


Problem description

Need help!!!

I am streaming Twitter feeds into HDFS using Flume and loading them into Hive for analysis.

The steps are as follows:

Data in HDFS: [screenshot]

I have described the Avro schema in an .avsc file and put it in Hadoop:
{"type": "record", "name": "Doc", "doc": "adoc",
 "fields": [
   {"name": "id", "type": "string"},
   {"name": "user_friends_count", "type": ["int", "null"]},
   {"name": "user_location", "type": ["string", "null"]},
   {"name": "user_description", "type": ["string", "null"]},
   {"name": "user_statuses_count", "type": ["int", "null"]},
   {"name": "user_followers_count", "type": ["int", "null"]},
   {"name": "user_name", "type": ["string", "null"]},
   {"name": "user_screen_name", "type": ["string", "null"]},
   {"name": "created_at", "type": ["string", "null"]},
   {"name": "text", "type": ["string", "null"]},
   {"name": "retweet_count", "type": ["boolean", "null"]},
   {"name": "retweeted", "type": ["boolean", "null"]},
   {"name": "in_reply_to_user_id", "type": ["long", "null"]},
   {"name": "source", "type": ["string", "null"]},
   {"name": "in_reply_to_status_id", "type": ["long", "null"]},
   {"name": "media_url_https", "type": ["string", "null"]},
   {"name": "expanded_url", "type": ["string", "null"]}
 ]}
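Each field whose type is a JSON array, such as ["string", "null"], is an Avro union; that is the "union" named in the error message. As a quick check that the schema file parses and to see which fields are unions, here is a minimal sketch using only Python's json module. It assumes the schema text is pasted in as a string and keeps only four of the fields for brevity; field_types is a hypothetical helper name, not part of any Avro or Hive API.

```python
import json

# Shortened copy of AvroSchemaFile.avsc (assumption: only four fields kept for brevity).
SCHEMA_TEXT = """
{"type": "record", "name": "Doc", "doc": "adoc",
 "fields": [
   {"name": "id", "type": "string"},
   {"name": "user_friends_count", "type": ["int", "null"]},
   {"name": "created_at", "type": ["string", "null"]},
   {"name": "in_reply_to_user_id", "type": ["long", "null"]}
 ]}
"""


def field_types(schema_text):
    """Hypothetical helper: return {field name: list of allowed Avro types}."""
    schema = json.loads(schema_text)  # raises ValueError if the JSON is malformed
    result = {}
    for field in schema["fields"]:
        t = field["type"]
        # A JSON array in "type" is an Avro union; a plain string is a single type.
        result[field["name"]] = t if isinstance(t, list) else [t]
    return result


if __name__ == "__main__":
    for name, types in field_types(SCHEMA_TEXT).items():
        kind = "union" if len(types) > 1 else "single type"
        print(f"{name}: {types} ({kind})")
```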
I have written an .hql file to create a table and load data into it:
create table tweetsavro
  row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');

load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;
The .hql file runs successfully, but when I run select * from <tablename> in Hive, it shows the following error:

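Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting union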

The output of desc tweetsavro is:

hive> desc tweetsavro;
OK
id                      string
user_friends_count      int
user_location           string
user_description        string
user_statuses_count     int
user_followers_count    int
user_name               string
user_screen_name        string
created_at              string
text                    string
retweet_count           boolean
retweeted               boolean
in_reply_to_user_id     bigint
source                  string
in_reply_to_status_id   bigint
media_url_https         string
expanded_url            string
Time taken: 0.697 seconds, Fetched: 17 row(s)
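Note that desc only reflects the schema named by avro.schema.url: the AvroSerDe derives the Hive columns from that file (long becomes bigint, and a union of null with one other type becomes a nullable column of that type). load data inpath only moves the files into the table's directory, so a clean desc says nothing about the types the Flume data files were actually written with; the mismatch surfaces only when select reads them. Below is a minimal sketch of that mapping for primitive types and unions of primitives, following the type table in the Hive AvroSerDe documentation; complex types (record, array, map, enum, fixed) are deliberately not handled, and the function names are hypothetical.

```python
# Hive column types the AvroSerDe uses for Avro primitive types
# (per the Hive AvroSerDe type table). Complex types are not covered here.
AVRO_TO_HIVE = {
    "null": "void",
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def hive_type(avro_type):
    """Map an Avro primitive type, or a union of primitives, to a Hive type name."""
    if isinstance(avro_type, list):
        branches = [t for t in avro_type if t != "null"]
        if len(branches) == 1:
            # A union of null and one other type becomes a nullable column of that type.
            return hive_type(branches[0])
        return "uniontype<" + ",".join(hive_type(t) for t in branches) + ">"
    return AVRO_TO_HIVE[avro_type]


def describe(fields):
    """Return (column, Hive type) pairs, roughly what desc prints for an Avro table."""
    return [(f["name"], hive_type(f["type"])) for f in fields]


if __name__ == "__main__":
    sample = [
        {"name": "id", "type": "string"},
        {"name": "user_friends_count", "type": ["int", "null"]},
        {"name": "in_reply_to_user_id", "type": ["long", "null"]},
    ]
    for name, htype in describe(sample):
        print(f"{name:<22}{htype}")
```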

Solution
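I faced exactly the same issue. It was in the timestamp field (the "created_at" column in your case), which I was trying to insert as a string into my new table. I had assumed this data would be in ["null","string"] format in my source. I then analyzed the source Avro schema generated by the sqoop import --as-avrodatafile process. That generated schema had the following signature for the timestamp column: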

{ "name" : "order_date", "type" : [ "null", "long" ], "default" : null, "columnName" : "order_date", "sqlType" : "93" }
SqlType 93 stands for the Timestamp data type (java.sql.Types.TIMESTAMP). So in my target table's Avro schema file I changed the data type to 'long', and that solved the issue. This fits the error message: the data files contain a long where the table schema expects a union, such as ["string","null"], that has no long branch. My guess is that one of your columns has a similar data type mismatch.
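A rough way to find such a column is to compare the writer schema (the one embedded in the data files; for Sqoop output, the generated schema above) with the table's reader schema, field by field. The sketch below is a simplified, stdlib-only approximation of the resolution rules in the Avro specification for primitive types and unions, including the allowed promotions (int to long/float/double, long to float/double, float to double, string and bytes to each other). It is not the Avro library's own compatibility checker, it is conservative for writer unions (every branch must resolve, even if the data never uses some of them), and the field lists in the demo are hypothetical examples modeled on the order_date column.

```python
# Writer type -> reader types it may be promoted to, per the Avro specification.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer, reader):
    """True if a single writer primitive can be read as a single reader primitive."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def resolves(writer, reader):
    """Approximate check that data written with `writer` can be read with `reader`."""
    if isinstance(writer, list):
        # Writer union: conservatively require every branch to resolve.
        return all(resolves(w, reader) for w in writer)
    if isinstance(reader, list):
        # Reader union, writer non-union: some reader branch must match the writer.
        # (Avro forbids unions directly inside unions, so branches are primitives here.)
        return any(primitive_matches(writer, r) for r in reader)
    return primitive_matches(writer, reader)


def mismatched_fields(writer_fields, reader_fields):
    """Names of reader fields whose type the writer's data cannot resolve to."""
    writer_types = {f["name"]: f["type"] for f in writer_fields}
    return [
        f["name"]
        for f in reader_fields
        if f["name"] in writer_types and not resolves(writer_types[f["name"]], f["type"])
    ]


if __name__ == "__main__":
    source = [{"name": "order_date", "type": ["null", "long"]}]  # Sqoop-generated
    target = [{"name": "order_date", "type": ["string", "null"]}]  # table schema
    print(mismatched_fields(source, target))
```

With the target type changed to ["null", "long"], the same comparison reports no mismatched fields.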

