在Spark中阅读XML [英] Read XML in spark

查看:204
本文介绍了在Spark中阅读XML的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用spark-xml jar阅读pysaprk中的xml/nested xml.

i am trying to read xml/nested xml in pysaprk uing spark-xml jar.

df = sqlContext.read \
  .format("com.databricks.spark.xml")\
   .option("rowTag", "hierachy")\
   .load("test.xml"

执行时,数据框未正确创建.

when i execute, dataframe is not creating properly.

    +--------------------+
    |                 att|
    +--------------------+
    |[[1,Data,[Wrapped...|
    +--------------------+

我提到的

xml格式如下:

xml format i have is mentioned below :

推荐答案

heirarchy应该是 rootTag ,而att应该是 rowTag as

heirarchy should be rootTag and att should be rowTag as

df = spark.read \
    .format("com.databricks.spark.xml") \
    .option("rootTag", "hierarchy") \
    .option("rowTag", "att") \
    .load("test.xml")

您应该获得

+-----+------+----------------------------+
|Order|attval|children                    |
+-----+------+----------------------------+
|1    |Data  |[[[1, Studyval], [2, Site]]]|
|2    |Info  |[[[1, age], [2, gender]]]   |
+-----+------+----------------------------+

schema

root
 |-- Order: long (nullable = true)
 |-- attval: string (nullable = true)
 |-- children: struct (nullable = true)
 |    |-- att: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Order: long (nullable = true)
 |    |    |    |-- attval: string (nullable = true)

找到有关 databricks xml

这篇关于在Spark中阅读XML的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆