How to load an XML file into Hive
I'm working with Hive tables and have the following problem. I have more than 1 billion XML files in HDFS. Each XML file has four different sections. For every XML file, I want to split out the sections and load each one into its own table.
Example:
<?xml version='1.0' encoding='iso-8859-1'?>
<section1>
<id> 1233222 </id>
// many more XML tags
</section1>
<section2>
// many more XML tags
</section2>
<section3>
// many more XML tags
</section3>
<section4>
// many more XML tags
</section4>
</xml>
I have four tables, each with two fields:
section1Table (id, section1)
section2Table (id, section2)
section3Table (id, section3)
section4Table (id, section4)
Now I want to split each file and load the data into the corresponding table.
How can I achieve this? Can anyone help me?
Thanks
UPDATE
I have tried the following:
CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';
SELECT xpath(name, '//section1') FROM test LIMIT 1;
but I got the following error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}
You have several options:
- Load the XML into a Hive table with a string column, one XML document per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)). Then use an XPath UDF to do work on the XML.
- Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
- Map your XML to Avro as described here, because a SerDe exists for seamless Avro-to-Hive mapping.
- Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.
It depends on your level of experience and comfort with these approaches.
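As a rough illustration of the last option (extract with XPath first, then ingest plain text), here is a minimal Python sketch of the per-document logic. It rests on assumptions that are not in the question: each file is well-formed XML with a single root element wrapping the four sections (the sample above would need such a root), the id lives at section1/id, and the names split_sections and SECTIONS are hypothetical. It does not address how to run this over a billion files.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of the section elements to extract (names from the question).
SECTIONS = ("section1", "section2", "section3", "section4")


def split_sections(xml_bytes):
    """Return {section_name: "id<TAB>section_xml"} for one XML document.

    Assumes a single root element that directly contains the sections,
    and that the document id is stored at section1/id.
    """
    root = ET.fromstring(xml_bytes)
    doc_id = root.findtext("./section1/id", default="").strip()

    rows = {}
    for name in SECTIONS:
        node = root.find("./" + name)
        if node is None:
            continue  # this document has no such section
        # Serialize the section and collapse all whitespace runs so the
        # whole section fits on one line of a tab-delimited text file.
        section_xml = " ".join(ET.tostring(node, encoding="unicode").split())
        rows[name] = doc_id + "\t" + section_xml
    return rows


if __name__ == "__main__":
    sample = (
        b"<?xml version='1.0' encoding='iso-8859-1'?>"
        b"<doc><section1><id> 1233222 </id><a>x</a></section1>"
        b"<section2><b>y</b></section2></doc>"
    )
    for name, row in split_sections(sample).items():
        print(name, row, sep=": ")
```

Each row could then be appended to a per-section text file in HDFS and read by the matching table, provided that table is declared as tab-delimited text (for example with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'). Keeping every section on a single line matters because a plain-text Hive table treats each line as one row, which is consistent with the error in the question showing only the XML declaration in the failing row.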