How to load XML files into Hive
Problem description
I'm working with Hive tables and have the following problem. I have more than 1 billion XML files in HDFS. Each XML file has 4 different sections. For every XML file, I want to split out each section and load it into its own table.
Example:
<?xml version='1.0' encoding='iso-8859-1'?>
<section1>
<id> 1233222 </id>
// many more XML tags
</section1>
<section2>
// many more XML tags
</section2>
<section3>
// many more XML tags
</section3>
<section4>
// many more XML tags
</section4>
</xml>
I have four tables:
section1Table
id section1 // fields
section2Table
id section2
section3Table
id section3
section4Table
id section4
Now I want to split and load the data into each table.
How can I achieve this? Can anyone help me?
Thanks
Update
I have tried the following:
CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';
SELECT xpath (name, '//section1') FROM test LIMIT 1 ;
But I get the following error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}
Recommended answer
You have several options:
- Load the XML into a Hive table with a string column, one document per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)), then use an XPath UDF to work on the XML.
- Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
- Map your XML to Avro as described here, because a SerDe exists for seamless Avro-to-Hive mapping.
- Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.
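As a rough illustration of the last option, here is a minimal Python sketch that pulls each section out of one document and turns it into a single tab-separated line of the form id, section XML, matching the (id, sectionN) tables in the question. It assumes each file is well-formed XML with one root element wrapping the four sections and that the id lives in section1; the names split_sections and SECTIONS are hypothetical. Keeping every record on one line matters because a default Hive text table treats each line as a row, which is consistent with the error in the question, where the failing row contains only the XML declaration.

```python
import xml.etree.ElementTree as ET

# Section names taken from the question; adjust to the real element names.
SECTIONS = ("section1", "section2", "section3", "section4")


def split_sections(xml_bytes):
    """Return {section_name: "id<TAB>section XML on one line"} for one document.

    Assumption: the document has a single root element that wraps the
    sections, and the id is stored in <section1><id>...</id></section1>.
    """
    root = ET.fromstring(xml_bytes)
    doc_id = root.findtext(".//section1/id", default="").strip()

    rows = {}
    for name in SECTIONS:
        node = root.find(f".//{name}")
        if node is None:
            continue  # this document has no such section
        section_xml = ET.tostring(node, encoding="unicode")
        # Collapse all whitespace runs (newlines and tabs included) to single
        # spaces so the record stays on one line. Note that this also
        # normalizes whitespace inside text nodes.
        flat = " ".join(section_xml.split())
        rows[name] = f"{doc_id}\t{flat}"
    return rows
```

Each value in the returned dictionary could be appended to a per-section text file in HDFS, and each of the four tables could then be defined over its file with tab as the field delimiter. For a billion files this function would need to run inside a distributed job rather than in a single process.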
Which one to choose depends on your level of experience and comfort with these approaches.