How can I read an XML file in Azure Databricks Spark?


Question


I was looking for some info on the MSDN forums but couldn't find a good forum. While reading the Spark site, I got the hint that I would have better chances here. Bottom line: I want to read from a Blob storage container that receives a continuous feed of XML files, all of them small, and finally store these files in Azure DW. Using Azure Databricks I can use Spark and Python, but I can't find a way to 'read' the XML type. Some sample scripts used the library xml.etree.ElementTree, but I can't get it imported. Any help pushing me in a good direction is appreciated.

Recommended answer


One way is to use the Databricks spark-xml library:

  1. Import the spark-xml library into your workspace: https://docs.databricks.com/user-guide/libraries.html#create-a-library (search for spark-xml in the Maven/Spark Packages section and import it)
  2. Attach the library to your cluster: https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster
  3. Use the following code in your notebook to read the XML file, where "note" is the root of my XML file.

# rowTag names the XML element that spark-xml maps to one DataFrame row
xmldata = spark.read.format('xml').option("rowTag","note").load('dbfs:/mnt/mydatafolder/xmls/note.xml')

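Since the question mentions xml.etree.ElementTree, here is a minimal sketch of that route for small files. It ships with the Python standard library, so no extra cluster library should be needed. The sample document, its child elements (to, from, heading, body) and the helper name note_to_record are assumptions for illustration; only the "note" root comes from the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names in the real feed may differ.
SAMPLE_NOTE = """<note>
  <to>Alice</to>
  <from>Bob</from>
  <heading>Reminder</heading>
  <body>Upload the files</body>
</note>"""


def note_to_record(xml_text):
    """Flatten one <note> document into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        raise ValueError(f"expected <note> root, got <{root.tag}>")
    # Each direct child becomes one field; missing text becomes an empty string.
    return {child.tag: (child.text or "").strip() for child in root}


record = note_to_record(SAMPLE_NOTE)
print(record)
```

A list of such dicts could then be passed to spark.createDataFrame to continue in Spark, although for a continuous feed of many files the spark-xml reader is probably the better fit.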

