XML to Hive table: array of maps

Question

My XML looks like this:

<TAG>
  <REQUEST_ID>1</REQUEST_ID>
  <APPLICATION_ID>2</APPLICATION_ID>
  <EXTERNAL_SYSTEM_CODE>RB</EXTERNAL_SYSTEM_CODE>
  <CCM_CHECK>
    <CCM_CHECK_ID>101</CCM_CHECK_ID>
    <CCM_CHECK_RESULT>10</CCM_CHECK_RESULT>
  </CCM_CHECK>
  <VERIF_ANSWERS>
    <CHECK_CODE>101</CHECK_CODE>
    <QUESTION_CODE>1</QUESTION_CODE>
    <BOOKMARK_NUMBER>1</BOOKMARK_NUMBER>
    <ANSWER_VALUE>NN</ANSWER_VALUE>
  </VERIF_ANSWERS>
  <VERIF_ANSWERS>
    <CHECK_CODE>101</CHECK_CODE>
    <QUESTION_CODE>2</QUESTION_CODE>
    <BOOKMARK_NUMBER>1</BOOKMARK_NUMBER>
    <ANSWER_VALUE>NN</ANSWER_VALUE>
  </VERIF_ANSWERS>
</TAG>

This is how I create a table from it:

CREATE EXTERNAL TABLE s_sourcedata.evkuzmin_test_xml(
  request_id string
  , application_id string
  , external_system_code string
  , ccm_check map<string, string>
  , verif_answers array<struct<verif_answer:array<map<string, string>>>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.request_id"="/TAG/REQUEST_ID/text()",
  "column.xpath.application_id"="/TAG/APPLICATION_ID/text()",
  "column.xpath.external_system_code"="/TAG/EXTERNAL_SYSTEM_CODE/text()",
  "column.xpath.ccm_check"="/TAG/CCM_CHECK/*",
  "column.xpath.verif_answers"="/TAG")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/storage/s_sourcedata/db/evkuzmin_test_xml'
TBLPROPERTIES (
  "xmlinput.start"="<TAG",
  "xmlinput.end"="</TAG>"
);

The result is:

1,2,RB,"{""CCM_CHECK_ID"":""101"",""CCM_CHECK_RESULT"":""10""}","[{""verif_answer"":null}]"

How can I turn verif_answers into an array of key-value pairs, like I did for ccm_check?

I tried doing it the same way I did for ccm_check, but got only the first VERIF_ANSWERS.

The number of VERIF_ANSWERS elements can vary. In this case there are 2, but there can be 0 or 10.

Recommended answer

CREATE EXTERNAL TABLE myxml(
  request_id string
  , application_id string
  , external_system_code string
  , ccm_check map<string, string>
  , verif_answers array<map<string, string>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.request_id"="/TAG/REQUEST_ID/text()",
  "column.xpath.application_id"="/TAG/APPLICATION_ID/text()",
  "column.xpath.external_system_code"="/TAG/EXTERNAL_SYSTEM_CODE/text()",
  "column.xpath.ccm_check"="/TAG/CCM_CHECK/*",
  "column.xpath.verif_answers"="/TAG/VERIF_ANSWERS/*")
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'file:///home/cloudera/xmlfiles'
TBLPROPERTIES (
  "xmlinput.start"="<TAG",
  "xmlinput.end"="</TAG>"
);

select myxml.verif_answers from myxml;

INFO  : OK
+----------------------------------------------------+--+
|                myxml.verif_answers                 |
+----------------------------------------------------+--+
| [{"CHECK_CODE":"101"},{"QUESTION_CODE":"1"},{"BOOKMARK_NUMBER":"1"},{"ANSWER_VALUE":"NN"},{"CHECK_CODE":"101"},{"QUESTION_CODE":"2"},{"BOOKMARK_NUMBER":"1"},{"ANSWER_VALUE":"NN"}] |
+----------------------------------------------------+--+
1 row selected (0.306 seconds)
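
Note that the output above is a flat array of single-entry maps, so the boundary between the two VERIF_ANSWERS blocks is not preserved. As a rough sketch only, the query below regroups the entries by position. It assumes every VERIF_ANSWERS block contains exactly four child elements in the same order (true for the sample XML, not guaranteed in general). The aliases answer_idx, field_name and field_value are hypothetical names introduced for illustration.

```sql
-- Sketch: unpack the flat array and label each entry with the index of the
-- VERIF_ANSWERS block it came from.
-- ASSUMPTION: each VERIF_ANSWERS block has exactly 4 child elements.
SELECT
  t.request_id,
  a.pos DIV 4           AS answer_idx,   -- 0 for the first block, 1 for the second, ...
  map_keys(a.kv)[0]     AS field_name,   -- each map holds a single key
  map_values(a.kv)[0]   AS field_value
FROM myxml t
LATERAL VIEW posexplode(t.verif_answers) a AS pos, kv;
```

Rows that share the same request_id and answer_idx belong to the same VERIF_ANSWERS block. If the number of child elements can differ between blocks, the fixed divisor no longer works and a different grouping rule would be needed. Also note that a plain LATERAL VIEW drops rows whose array is empty; LATERAL VIEW OUTER keeps them.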
