How to load XML files into Hive


Question


I'm working with Hive tables and have the following problem. I have more than 1 billion XML files in HDFS. Each XML file has 4 different sections, and for every file I want to split out each section and load it into its own table.

Example:

            <?xml version='1.0' encoding='iso-8859-1'?>

            <section1>
                <id> 1233222 </id>
               // many more XML tags
            </section1>

            <section2>
               // many more XML tags
            </section2>

            <section3>
               // many more XML tags
            </section3>

            <section4>
               // many more XML tags
            </section4>

            </xml>

I have four tables:

        section1Table

        id       section1    // fields 

        section2Table

        id       section2

        section3Table 

        id       section3

        section4Table

        id       section4

I want to split each file and load the data into the matching table.

How can I achieve this? Can anyone help me?

Thanks

UPDATE

I have tried the following:

CREATE EXTERNAL TABLE test (name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';

SELECT xpath(name, '//section1') FROM test LIMIT 1;

but I got the following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}

Answer

You have several options:

  • Load the XML into a Hive table with a string column, one document per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)). Then use an XPath UDF to work on the XML.
  • Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
  • Map your XML to Avro as described here, because a SerDe exists for seamless Avro-to-Hive mapping.
  • Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.

It depends on your level of experience and comfort with these approaches.
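
As a rough illustration of the last option, here is a minimal Python sketch that splits one XML document into one tab-separated row per section. It rests on several assumptions that are not in the question: the four sections are wrapped in a single root element (called `doc` here, since the sample as posted has no opening root tag), the id sits at `section1/id`, and the function name `split_sections` is purely hypothetical. Each section is collapsed onto a single line. That matters because the row shown in the error above contains only the XML declaration, which suggests the text-format table was handing one line of the file at a time to `xpath`.

```python
import xml.etree.ElementTree as ET

# Section element names taken from the question's sample.
SECTIONS = ("section1", "section2", "section3", "section4")


def split_sections(xml_bytes):
    """Return {section_name: "id<TAB>section_xml"} for one XML document.

    Assumes a well-formed document whose root element directly contains
    the section elements (an assumption; the posted sample has no root).
    """
    root = ET.fromstring(xml_bytes)
    # The id lives inside <section1> in the question's sample.
    doc_id = root.findtext("./section1/id", default="").strip()
    rows = {}
    for name in SECTIONS:
        node = root.find("./" + name)
        if node is None:
            continue  # skip sections missing from this document
        section_xml = ET.tostring(node, encoding="unicode")
        # Collapse all whitespace runs (newlines, tabs) to single spaces so
        # the row stays on one line and tab remains a safe field delimiter.
        # Note: this also normalizes whitespace inside text content.
        section_xml = " ".join(section_xml.split())
        rows[name] = doc_id + "\t" + section_xml
    return rows


if __name__ == "__main__":
    sample = (
        b"<?xml version='1.0' encoding='iso-8859-1'?>\n"
        b"<doc>\n"
        b"  <section1>\n    <id> 1233222 </id>\n  </section1>\n"
        b"  <section2><b>y</b></section2>\n"
        b"  <section3><c>z</c></section3>\n"
        b"  <section4><d>w</d></section4>\n"
        b"</doc>"
    )
    for name, row in split_sections(sample).items():
        print(name, row, sep=": ")
```

Rows for each section could then be appended to a separate text file in HDFS and exposed through a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'. With a billion files this logic would have to run inside a distributed job rather than a single script, so treat this only as a sketch of the per-document step.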
