Read XML in Spark
Problem description
I am trying to read XML/nested XML in PySpark using the spark-xml jar.
df = sqlContext.read \
.format("com.databricks.spark.xml") \
.option("rowTag", "hierachy") \
.load("test.xml")
When I execute this, the DataFrame is not created properly.
+--------------------+
| att|
+--------------------+
|[[1,Data,[Wrapped...|
+--------------------+
The XML format I have is shown below:
Recommended answer
hierarchy should be the rootTag and att should be the rowTag, as in:
df = spark.read \
.format("com.databricks.spark.xml") \
.option("rootTag", "hierarchy") \
.option("rowTag", "att") \
.load("test.xml")
You should get
+-----+------+----------------------------+
|Order|attval|children |
+-----+------+----------------------------+
|1 |Data |[[[1, Studyval], [2, Site]]]|
|2 |Info |[[[1, age], [2, gender]]] |
+-----+------+----------------------------+
and the schema
root
|-- Order: long (nullable = true)
|-- attval: string (nullable = true)
|-- children: struct (nullable = true)
| |-- att: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- Order: long (nullable = true)
| | | |-- attval: string (nullable = true)
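To make the role of rowTag more concrete, here is a minimal sketch that mimics the row selection using only Python's standard library (xml.etree.ElementTree), not spark-xml itself. The sample XML is hypothetical: it is reconstructed from the schema above, since the asker's actual file is not shown, and the helper name rows_for_tag is made up for illustration. Each att element directly under the root becomes one row, and the att elements nested under children stay inside that row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, reconstructed from the schema above.
# It is NOT the asker's actual file, which is not shown.
SAMPLE_XML = """
<hierarchy>
  <att>
    <Order>1</Order>
    <attval>Data</attval>
    <children>
      <att><Order>1</Order><attval>Studyval</attval></att>
      <att><Order>2</Order><attval>Site</attval></att>
    </children>
  </att>
  <att>
    <Order>2</Order>
    <attval>Info</attval>
    <children>
      <att><Order>1</Order><attval>age</attval></att>
      <att><Order>2</Order><attval>gender</attval></att>
    </children>
  </att>
</hierarchy>
"""


def rows_for_tag(xml_text, row_tag):
    """Return one dict per <row_tag> element directly under the root.

    Illustrative helper only; it assumes every row has Order and attval
    child elements, as in the sample above.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for elem in root.findall(row_tag):  # direct children of the root only
        children = elem.find("children")
        nested = []
        if children is not None:
            # Nested elements with the same tag stay inside the parent row.
            nested = [
                (int(child.findtext("Order")), child.findtext("attval"))
                for child in children.findall(row_tag)
            ]
        rows.append({
            "Order": int(elem.findtext("Order")),
            "attval": elem.findtext("attval"),
            "children": nested,
        })
    return rows


rows = rows_for_tag(SAMPLE_XML, "att")
for row in rows:
    print(row)
```

With this sample, the sketch yields two rows (Data and Info), each carrying its nested entries, which matches the shape of the table shown above. If the root tag were used as the row tag instead, the whole document would be a single row with one att column, which is what the question's output looks like.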
Find more information about databricks xml