Issue with Hive AvroSerDe tblProperties max length
I am trying to create a table with AvroSerDe. I have already tried the following command to create the table:
CREATE EXTERNAL TABLE gaSession
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs://<<url>>:<<port>>/<<path>>/<<file>>.avsc');
The creation seems to work, but the following table is generated:
hive> show create table gaSession;
OK
CREATE EXTERNAL TABLE `gaSession`(
`error_error_error_error_error_error_error` string COMMENT 'from deserializer',
`cannot_determine_schema` string COMMENT 'from deserializer',
`check` string COMMENT 'from deserializer',
`schema` string COMMENT 'from deserializer',
`url` string COMMENT 'from deserializer',
`and` string COMMENT 'from deserializer',
`literal` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
...
After that, I copied the schema definition and replaced 'avro.schema.url' with 'avro.schema.literal', but the table still doesn't work.
However, when I delete some (random) fields, it works (e.g. with the following definition).
CREATE TABLE gaSession
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{"type": "record",
"name": "root",
"fields": [
{
"name": "visitorId",
"type": [
"long",
"null"
]
},
{
"name": "visitNumber",
"type": [
"long",
"null"
]
},
{
"name": "visitId",
"type": [
"long",
"null"
]
},
{
"name": "visitStartTime",
"type": [
"long",
"null"
]
},
{
"name": "date",
"type": [
"string",
"null"
]
},
{
"name": "totals",
"type": [
{
"type": "record",
"name": "totals",
"fields": [
{
"name": "visits",
"type": [
"long",
"null"
]
},
{
"name": "hits",
"type": [
"long",
"null"
]
},
{
"name": "pageviews",
"type": [
"long",
"null"
]
},
{
"name": "timeOnSite",
"type": [
"long",
"null"
]
},
{
"name": "bounces",
"type": [
"long",
"null"
]
},
{
"name": "transactions",
"type": [
"long",
"null"
]
},
{
"name": "transactionRevenue",
"type": [
"long",
"null"
]
},
{
"name": "newVisits",
"type": [
"long",
"null"
]
},
{
"name": "screenviews",
"type": [
"long",
"null"
]
},
{
"name": "uniqueScreenviews",
"type": [
"long",
"null"
]
},
{
"name": "timeOnScreen",
"type": [
"long",
"null"
]
},
{
"name": "totalTransactionRevenue",
"type": [
"long",
"null"
]
}
]
},
"null"
]
}
]
}');
Does TBLPROPERTIES/avro.schema.literal have a maximum length or other limitations?
Hive-Version: 0.14.0
The Hortonworks support team confirmed that there is a 4000-character limit for tblproperties. So, by removing whitespace, you are able to define a larger table. Otherwise, you have to work with 'avro.schema.url'.
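As a rough illustration of the whitespace-removal workaround, the sketch below re-serializes an Avro schema without insignificant whitespace and checks the result against the 4000-character figure quoted above. The helper names (`minify_schema`, `fits_in_tblproperties`) and the constant are hypothetical, and the limit is taken from the answer rather than verified against any particular Hive or metastore version.

```python
import json

# Limit quoted in the answer above; an assumption, not a verified constant.
TBLPROPERTIES_MAX_CHARS = 4000


def minify_schema(schema_text: str) -> str:
    """Re-serialize an Avro schema (JSON) without insignificant whitespace."""
    return json.dumps(json.loads(schema_text), separators=(",", ":"))


def fits_in_tblproperties(schema_text: str, limit: int = TBLPROPERTIES_MAX_CHARS) -> bool:
    """Return True if the minified schema is no longer than the assumed limit."""
    return len(minify_schema(schema_text)) <= limit


if __name__ == "__main__":
    # Small excerpt of the schema from the question, pretty-printed.
    pretty = """
    {"type": "record",
     "name": "root",
     "fields": [
       {"name": "visitorId", "type": ["long", "null"]},
       {"name": "date", "type": ["string", "null"]}
     ]}
    """
    compact = minify_schema(pretty)
    print(len(pretty), "->", len(compact))
    print(compact)
    print("fits:", fits_in_tblproperties(pretty))
```

If the minified schema still exceeds the limit, this approach does not help and 'avro.schema.url' remains the option named in the answer. Note also that any single quote inside the schema would need escaping before being pasted into the TBLPROPERTIES string.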