U-SQL: How to extract an attribute value from an XML file using the XML extractor
Problem description
How can I extract an attribute value from an XML file with a custom extractor in a U-SQL job? I am able to extract the sub-element values from the XML file.
Sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<Users>
    <User ID="001">
        <FirstName>david</FirstName>
        <LastName>bacham</LastName>
    </User>
    <User ID="002">
        <FirstName>xyz</FirstName>
        <LastName>abc</LastName>
    </User>
</Users>
I am able to extract FirstName and LastName using the code below. How can I get the ID value as part of the CSV file?
Sample U-SQL job:
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@input =
    EXTRACT
        FirstName string,
        LastName string
    FROM @"/USERS.xml"
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlExtractor("User",
        new SQL.MAP<string, string> {
            {"FirstName", "FirstName"},
            {"LastName", "LastName"}
        }
    );

@output =
    SELECT *
    FROM @input;

OUTPUT @output
TO "/USERS.csv"
USING Outputters.Csv();
Solution
You can do this easily in Databricks, e.g.
%sql
CREATE TABLE User
USING com.databricks.spark.xml
OPTIONS (path "/FileStore/tables/input42.xml", rowTag "User")
Then read the table:
%sql
SELECT *
FROM User;
If you must do it with U-SQL, then using the XmlDomExtractor from the Formats assembly worked for me:
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @inputFile string = "/input/input40.xml";

@input =
    EXTRACT
        id string,
        firstName string,
        lastName string
    FROM @inputFile
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath : "/Users/User",
        columnPaths : new SQL.MAP<string, string>{
            { "@ID", "id" },
            { "FirstName", "firstName" },
            { "LastName", "lastName" }
        }
    );

@output =
    SELECT *
    FROM @input;

OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv();
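If you want to check a path-to-column mapping like the one above before submitting a job, a rough local sketch in Python may help. This is not how XmlDomExtractor is implemented; it only uses the standard library's xml.etree.ElementTree as a stand-in to show the idea that a path starting with "@" refers to an attribute of the row element, while a plain name refers to a child element. The names SAMPLE_XML, COLUMN_PATHS and extract_rows are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample file from the question, inlined as bytes so the encoding
# declaration is handled by the parser.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Users>
    <User ID="001">
        <FirstName>david</FirstName>
        <LastName>bacham</LastName>
    </User>
    <User ID="002">
        <FirstName>xyz</FirstName>
        <LastName>abc</LastName>
    </User>
</Users>"""

# Same shape as the columnPaths map in the U-SQL script:
# path relative to the row element -> output column name.
COLUMN_PATHS = {
    "@ID": "id",
    "FirstName": "firstName",
    "LastName": "lastName",
}


def extract_rows(xml_bytes, row_path, column_paths):
    """Return one dict per row element, keyed by output column name."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for node in root.findall(row_path):
        row = {}
        for path, column in column_paths.items():
            if path.startswith("@"):
                # "@ID" means: the ID attribute of the row element itself.
                row[column] = node.get(path[1:])
            else:
                # A plain name means: the text of that child element.
                row[column] = node.findtext(path)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # ElementTree paths are relative to the root element (Users), so
    # "./User" here plays the role of rowPath "/Users/User".
    for row in extract_rows(SAMPLE_XML, "./User", COLUMN_PATHS):
        print(row)
```

The point of the sketch is only the mapping: the attribute is addressed relative to each User row, in the same way as the FirstName and LastName child elements.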