Issue: Hive AvroSerDe tblProperties max length

Problem description

I am trying to create a table with AvroSerDe. I have already tried the following command to create the table:

CREATE EXTERNAL TABLE gaSession
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS
 INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES ('avro.schema.url'='hdfs://<<url>>:<<port>>/<<path>>/<<file>>.avsc');

The creation seems to work, but the following table is generated:

hive> show create table gaSession;
OK
CREATE EXTERNAL TABLE `gaSession`(
  `error_error_error_error_error_error_error` string COMMENT 'from deserializer',
  `cannot_determine_schema` string COMMENT 'from deserializer',
  `check` string COMMENT 'from deserializer',
  `schema` string COMMENT 'from deserializer',
  `url` string COMMENT 'from deserializer',
  `and` string COMMENT 'from deserializer',
  `literal` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
...

After that, I copied the definition and replaced 'avro.schema.url' with 'avro.schema.literal', but the table still doesn't work.

But when I delete some (random) fields, it works (e.g. with the following definition).

CREATE TABLE gaSession
     ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
     STORED AS
     INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
     TBLPROPERTIES ('avro.schema.literal'='{"type": "record",
"name": "root",
"fields": [
    {
        "name": "visitorId",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "visitNumber",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "visitId",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "visitStartTime",
        "type": [
            "long",
            "null"
        ]
    },
    {
        "name": "date",
        "type": [
            "string",
            "null"
        ]
    },
    {
        "name": "totals",
        "type": [
            {
                "type": "record",
                "name": "totals",
                "fields": [
                    {
                        "name": "visits",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "hits",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "pageviews",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "timeOnSite",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "bounces",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "transactions",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "transactionRevenue",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "newVisits",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "screenviews",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "uniqueScreenviews",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "timeOnScreen",
                        "type": [
                            "long",
                            "null"
                        ]
                    },
                    {
                        "name": "totalTransactionRevenue",
                        "type": [
                            "long",
                            "null"
                        ]
                    }
                ]
            },
            "null"
        ]
    }
]
 }');

Does TBLPROPERTIES/avro.schema.literal have a maximum length or other limitations?

Hive-Version: 0.14.0

Solution

The Hortonworks support team confirmed that there is a 4000-character limit for tblproperties. So, by removing whitespace from the schema, you can define a larger table. Otherwise, you have to work with 'avro.schema.url'.
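
To make the whitespace suggestion concrete, here is a minimal Python sketch that re-serializes an Avro schema without insignificant whitespace and checks the result against the 4000-character figure quoted above. The helper names (minify_schema, fits_in_literal) are hypothetical, and the sketch assumes the limit applies to the character length of the single 'avro.schema.literal' value, which the answer does not spell out.

```python
import json

# Limit quoted in the answer. Assumption: it applies to the character
# length of one property value such as 'avro.schema.literal'.
TBLPROPERTIES_LIMIT = 4000


def minify_schema(schema_text: str) -> str:
    """Re-serialize an Avro schema (JSON) without insignificant whitespace."""
    return json.dumps(json.loads(schema_text), separators=(",", ":"))


def fits_in_literal(schema_text: str, limit: int = TBLPROPERTIES_LIMIT) -> bool:
    """Return True if the minified schema is at most `limit` characters long."""
    return len(minify_schema(schema_text)) <= limit


# A shortened, pretty-printed excerpt of the schema from the question.
pretty = """{
    "type": "record",
    "name": "root",
    "fields": [
        {
            "name": "visitorId",
            "type": [
                "long",
                "null"
            ]
        }
    ]
}"""

compact = minify_schema(pretty)
print(len(pretty), len(compact), fits_in_literal(pretty))
```

If the minified schema still exceeds the limit, the answer's other option applies: keep the schema in a file and point 'avro.schema.url' at it.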
