Extracting tag attributes from XML using spark-xml
Question
I am loading an XML file using com.databricks.spark.xml, and I want to read a tag attribute using the SQL context.
XML:
<Receipt>
<Sale>
<DepartmentID>PR</DepartmentID>
<Tax TaxExempt="false" TaxRate="10.25"/>
</Sale>
</Receipt>
The file is loaded with:
val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag","Receipt").load("/home/user/sale.xml");
df.registerTempTable("SPtable");
Printed schema:
root
|-- Sale: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- DepartmentID: long (nullable = true)
| | |-- Tax: string (nullable = true)
Now I want to extract the tag attribute TaxExempt from Tax. I tried the following code, and it gives me an error.
val tax =sqlContext.sql("select Sale.Tax.TaxExempt from SPtable");
Error:
org.apache.spark.sql.AnalysisException: cannot resolve 'Sale.Tax[TaxExempt]' due to data type mismatch: argument 2 requires integral type, however, 'TaxExempt' is of string type.; line 1 pos 7
Any help is highly appreciated.
Recommended answer
First, print the schema of the DataFrame. In my case, with spark-xml version 0.3.3, it is printed as follows:
|-- Sale: struct (nullable = true)
| |-- DepartmentID: string (nullable = true)
| |-- Tax: struct (nullable = true)
| | |-- #VALUE: string (nullable = true)
| | |-- @TaxExempt: boolean (nullable = true)
| | |-- @TaxRate: double (nullable = true)
Then, after registering the temp table, use the query below to select XML attributes:
sqlContext.sql("select Sale.Tax['@TaxRate'] as TaxRate from temptable").show();
The result is below:
+-------+
|TaxRate|
+-------+
|  10.25|
+-------+
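The same attributes can probably be selected through the DataFrame API instead of a SQL string. The following is only a sketch: it assumes Sale was inferred as a struct (as in the schema printed above, not as an array) and that attributes carry the @ prefix. The object name TaxColumns and the helper taxAttributes are hypothetical names introduced here for illustration.

```scala
import org.apache.spark.sql.DataFrame

object TaxColumns {
  // Hypothetical helper: selects both Tax attributes from a DataFrame
  // loaded with spark-xml and rowTag "Receipt".
  // Assumption: Sale is a struct and attribute fields are named
  // "@TaxExempt" and "@TaxRate".
  def taxAttributes(df: DataFrame): DataFrame =
    df.select(
      // getField walks into the nested structs one level at a time,
      // which avoids quoting the "@" inside a SQL expression.
      df("Sale").getField("Tax").getField("@TaxExempt").as("TaxExempt"),
      df("Sale").getField("Tax").getField("@TaxRate").as("TaxRate")
    )
}
```

Calling TaxColumns.taxAttributes(df).show() on the DataFrame loaded in the question should then list both attributes, provided the inferred schema matches these assumptions.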
Starting from 0.4.1, I think attribute names are prefixed with an underscore (_) by default. In that case, just use _ instead of @ when querying attributes.
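To keep queries independent of whichever default prefix a given release uses, the prefix can be set explicitly through the attributePrefix read option of spark-xml. This is a minimal sketch, assuming a spark-xml version that supports this option is on the classpath and that the file path from the question exists. The object name TaxAttributes, the application name, and the local master setting are illustrative choices, not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TaxAttributes {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for trying the example on one machine.
    val conf = new SparkConf().setAppName("tax-attributes").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "Receipt")
      // Pin the attribute prefix so field names do not depend on the default.
      .option("attributePrefix", "@")
      .load("/home/user/sale.xml")

    // Check the inferred schema before writing queries against it.
    df.printSchema()

    df.registerTempTable("temptable")
    sqlContext
      .sql("select Sale.Tax['@TaxExempt'] as TaxExempt, Sale.Tax['@TaxRate'] as TaxRate from temptable")
      .show()

    sc.stop()
  }
}
```

If the option is left out and the underscore default applies, the same query would use '_TaxExempt' and '_TaxRate' as the field names.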