Exception while using lateral view in Hive
Problem description
I am using the code below to parse XML data in Hive. In my XML data a few tags repeat, so I am using the brickhouse jar and a lateral view to parse the tags and load them into Hive tables. But when I execute my code I get an error. Please help, as I am not able to understand what I am doing wrong.
Code:
add jar /home/cloudera/brickhouse-0.5.5.jar;
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
add jar /home/cloudera/hivexmlserde-1.0.5.3.jar;
set hive.exec.mode.local.auto=false;
DROP TABLE IF EXISTS medinfo2;
create table medinfo2 as
select array_index(statusCode,n) AS statusCode,
array_index(startTime,n) AS startTime,
array_index(endTime,n) AS endTime,
array_index(strengthValue,n) AS strengthValue,
array_index(strengthUnits,n) AS strengthUnits
from medications_info7 lateral view numeric_range(size(statusCode)) n1 as n;
Error:
Caused by: java.lang.IndexOutOfBoundsException: Index: 7, Size: 7
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at com.ibm.spss.hive.serde2.xml.objectinspector.XmlListObjectInspector.getListElement(XmlListObjectInspector.java:79)
    at brickhouse.udf.collect.ArrayIndexUDF.evaluate(ArrayIndexUDF.java:59)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorHead._evaluate(ExprNodeEvaluatorHead.java:44)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
    ... 25 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  HDFS Read: 0  HDFS Write: 0  FAIL
Total MapReduce CPU Time Spent: 0 msec
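A plausible reading of that trace: ArrayIndexUDF asked the XML list inspector for element 7 of a 7-element list. That happens when the index range is driven by size(statusCode) but some other column's array is shorter (e.g. a record missing one of the sub-tags). A hypothetical Python analogue of the mismatch (the column sizes here are invented for illustration):

```python
# Hypothetical illustration of the "Index: 7, Size: 7" failure: the lateral
# view generates indices 0..size(statusCode)-1, but array_index applies the
# same index to every column, so any column with fewer elements overflows.
status_code = ["completed"] * 8   # 8 elements -> indices 0..7 are generated
end_time    = ["20110822"] * 7    # one record is missing its endTime

rows, error_at = [], None
for n in range(len(status_code)):               # numeric_range(size(statusCode))
    try:
        rows.append((status_code[n], end_time[n]))  # array_index(col, n)
    except IndexError:                          # java.lang.IndexOutOfBoundsException
        error_at = n
        break

print(len(rows), error_at)   # 7 rows succeed, index 7 fails
```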
Sample:
<document>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</document>
My actual sample is huge and contains many of these repeated tags.
I don't know what your data looks like in Hive because you didn't provide that information, so here is how I loaded your XML into Hive.
Loader:
ADD JAR /path/to/jar/hivexmlserde-1.0.5.3.jar;
DROP TABLE IF EXISTS db.tbl;
CREATE TABLE IF NOT EXISTS db.tbl (
code STRING,
entryInfo ARRAY<MAP<STRING,STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerde'
WITH SERDEPROPERTIES (
"column.xpath.code"="/document/code/text()",
"column.xpath.entryInfo"="/document/entryInfo/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<document>",
"xmlinput.end"="</document>"
);
LOAD DATA LOCAL INPATH 'someFile.xml' INTO TABLE db.tbl;
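As a sanity check on what the SerDe's XPath settings produce, here is a hypothetical Python sketch of the `/document/entryInfo/*` flattening. `xml.etree` merely stands in for the SerDe; the SerDe itself is not involved, and the two-entry XML string is a trimmed copy of the sample above:

```python
import xml.etree.ElementTree as ET

# Hypothetical illustration of "column.xpath.entryInfo"="/document/entryInfo/*":
# every child of every <entryInfo>, flattened into one list of one-entry maps.
xml = """<document>
  <code>10160-0</code>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20110729</startTime>
    <endTime>20110822</endTime>
    <strengthValue>24</strengthValue>
    <strengthUnits>h</strengthUnits>
  </entryInfo>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20120130</startTime>
    <endTime>20120326</endTime>
    <strengthValue>12</strengthValue>
    <strengthUnits>h</strengthUnits>
  </entryInfo>
</document>"""

doc = ET.fromstring(xml)
code = doc.findtext("code")
entry_info = [{child.tag: child.text}
              for ei in doc.findall("entryInfo")
              for child in ei]

print(code)              # 10160-0
print(len(entry_info))   # 10 -> two entries x five child tags
print(entry_info[0])     # {'statusCode': 'completed'}
```

The key point is that the entry boundaries are lost: ten single-entry maps arrive in one flat array, which is why the query below has to rebuild the grouping by position.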
In the Hive-XML-SerDe documentation, under section 3 - Arrays, you can see that they use an array structure to handle repeated tags, and in 4 - Maps, you can see that they use maps to handle entries under a sub-tag. So entryInfo will be of type ARRAY<MAP<STRING,STRING>>.
You can then explode this array, collect like key/vals, and re-combine.
Query:
ADD JAR /path/to/jar/hivexmlserde-1.0.5.3.jar;
ADD JAR /path/to/jars/brickhouse-0.7.1.jar;
CREATE TEMPORARY FUNCTION COLLECT AS 'brickhouse.udf.collect.CollectUDAF';
SELECT code
, m_map['statusCode'] AS status_code
, m_map['startTime'] AS start_time
, m_map['endTime'] AS end_time
, m_map['strengthValue'] AS strength_value
, m_map['strengthUnits'] AS strength_units
FROM (
SELECT code
, COLLECT(m_keys, m_vals) AS m_map
FROM (
SELECT code
, idx
, MAP_KEYS(entry_info_map)[0] AS m_keys
, MAP_VALUES(entry_info_map)[0] AS m_vals
FROM (
SELECT code
, entry_info_map
, CASE
WHEN FLOOR(tmp / 5) = 0 THEN 0
WHEN FLOOR(tmp / 5) = 1 THEN 1
WHEN FLOOR(tmp / 5) = 2 THEN 2
ELSE -1
END AS idx
FROM db.tbl
LATERAL VIEW POSEXPLODE(entryInfo) exptbl AS tmp, entry_info_map ) x ) y
GROUP BY code, idx ) z
Output:
code status_code start_time end_time strength_value strength_units
10160-0 completed 20110729 20110822 24 h
10160-0 completed 20120130 20120326 12 h
10160-0 completed 20100412 20110822 8 d
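The bucketing-and-collect steps above can be sketched in Python. These are hypothetical stand-ins for POSEXPLODE, FLOOR(pos/5), and Brickhouse's COLLECT; the five-maps-per-entry assumption comes from the sample XML having exactly five child tags per entryInfo:

```python
# entryInfo as the SerDe delivers it: a flat list of one-entry maps,
# five per original <entryInfo> block (see the sample XML above).
entry_info = [
    {"statusCode": "completed"}, {"startTime": "20110729"},
    {"endTime": "20110822"}, {"strengthValue": "24"}, {"strengthUnits": "h"},
    {"statusCode": "completed"}, {"startTime": "20120130"},
    {"endTime": "20120326"}, {"strengthValue": "12"}, {"strengthUnits": "h"},
    {"statusCode": "completed"}, {"startTime": "20100412"},
    {"endTime": "20110822"}, {"strengthValue": "8"}, {"strengthUnits": "d"},
]

# POSEXPLODE pairs each map with its position; FLOOR(pos / 5) buckets the
# five consecutive maps of one entry together; COLLECT merges each bucket
# back into a single map per entry.
buckets = {}
for pos, m in enumerate(entry_info):
    idx = pos // 5
    buckets.setdefault(idx, {}).update(m)

for idx in sorted(buckets):
    r = buckets[idx]
    print("10160-0", r["statusCode"], r["startTime"], r["endTime"],
          r["strengthValue"], r["strengthUnits"])
```

Note the sketch hard-codes the stride of 5, just as the CASE expression in the query does; if an entryInfo block ever had a missing or extra child tag, the positional bucketing would drift.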
Also, you've basically asked this question 4 times (one, two, three, four). This is not a good idea. Just ask once, edit to add more information, and be patient.