hive setting hive.optimize.sort.dynamic.partition
I am trying to insert into a Hive table with dynamic partitions. The same query has been running fine for the last few days, but is now giving the error below.
Diagnostic Messages for this Task: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error:
Unable to deserialize reduce input key from
x1x128x0x0x46x234x240x192x148x1x68x69x86x50x0x1x128x0x104x118x1x128x0x0x46x234x240x192x148x1x128x0x0x25x1x128x0x0x46x1x128x0x0x72x1x127x255x255x255x0x0x0x0x1x71x66x80x0x255
with properties
{columns=reducesinkkey0,reducesinkkey1,reducesinkkey2,reducesinkkey3,reducesinkkey4,reducesinkkey5,reducesinkkey6,reducesinkkey7,reducesinkkey8,reducesinkkey9,reducesinkkey10,reducesinkkey11,reducesinkkey12,
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
serialization.sort.order=+++++++++++++,
columns.types=bigint,string,int,bigint,int,int,int,string,int,string,string,string,string}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.33 sec HDFS
Read: 889 HDFS Write: 314 SUCCESS Stage-Stage-2: Map: 1 Reduce: 1
Cumulative CPU: 1.42 sec HDFS Read: 675 HDFS Write: 0 FAIL
When I use the setting below, the query runs fine:
set hive.optimize.sort.dynamic.partition=false
When I set this value back to true, it gives the same error.
The source table is stored in SequenceFile format and the destination table in RCFile format. Can anyone explain what difference this setting makes internally?
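For context, a minimal sketch of the kind of statement that triggers this error; the table and column names here are hypothetical, not from the original question:

```sql
-- Hypothetical tables: source_seq (SequenceFile) and target_rc (RCFile).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- In a dynamic partition insert, the partition column (dt) must come
-- last in the SELECT list; Hive derives the partition value from it.
INSERT OVERWRITE TABLE target_rc PARTITION (dt)
SELECT id, name, dt
FROM source_seq;
```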
Sometimes when we try to do an insert into a table with dynamic partitions enabled, we get these errors. This happens because, when hive.optimize.sort.dynamic.partition
is enabled, Hive passes some internal columns to help the reducer phase which are not part of the data. This setting is not a stable one.
That is why it is disabled by default in Hive 0.14.0 and later versions, but enabled by default in Hive 0.13.0. Hope this helps.
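Following the answer above, the session-level workaround looks like the sketch below (table and column names are hypothetical):

```sql
-- Disable the unstable optimization for this session only,
-- then rerun the same dynamic-partition insert.
SET hive.optimize.sort.dynamic.partition = false;

INSERT OVERWRITE TABLE target_rc PARTITION (dt)
SELECT id, name, dt
FROM source_seq;
```

Setting it per session avoids changing hive-site.xml cluster-wide, so other queries that benefit from the optimization are unaffected.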