Install com.databricks.spark.xml on EMR cluster


Question

Does anyone know how to install the com.databricks.spark.xml package on an EMR cluster?

I succeeded in connecting to the EMR master node, but I don't know how to install packages on the cluster.

Code

sc.install_pypi_package("com.databricks.spark.xml")  # fails: install_pypi_package installs Python packages from PyPI, but spark-xml is a JVM (Maven) package

Accepted answer

On the EMR master node:

cd /usr/lib/spark/jars
sudo wget https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.9.0/spark-xml_2.11-0.9.0.jar
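As an alternative to copying the jar into /usr/lib/spark/jars by hand (a sketch, not part of the original answer), Spark can pull the package from Maven Central at session start with the `--packages` flag. The Maven coordinates below assume a Spark build on Scala 2.11 and spark-xml 0.9.0, matching the jar above; adjust them to your Spark version:

```shell
# Resolve and download spark-xml from Maven Central when the session starts.
# Coordinates are groupId:artifactId:version — the _2.11 suffix must match
# the Scala version your Spark distribution was built with.
pyspark --packages com.databricks:spark-xml_2.11:0.9.0
```

The same flag works with spark-shell and spark-submit; the jar is cached locally and distributed to the executors automatically, so no manual copy is needed on the worker nodes.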

Make sure to select the correct jar according to your Spark version and the guidelines provided in https://github.com/databricks/spark-xml.

Then, launch your Jupyter notebook and you should be able to run the following:

df = spark.read.format('com.databricks.spark.xml').options(rootTag='objects').options(rowTag='object').load("s3://bucket-name/sample.xml")
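To clarify what the `rootTag` and `rowTag` options refer to, here is a minimal standard-library sketch (no Spark required, with a hypothetical inline document standing in for sample.xml): spark-xml treats each `<object>` element under the `<objects>` root as one row of the DataFrame.

```python
# Illustrates rootTag='objects' / rowTag='object' semantics using only stdlib.
import xml.etree.ElementTree as ET

sample = """<objects>
  <object><id>1</id><name>alpha</name></object>
  <object><id>2</id><name>beta</name></object>
</objects>"""

root = ET.fromstring(sample)               # the rootTag element: <objects>
rows = [
    {child.tag: child.text for child in obj}  # child elements become columns
    for obj in root.findall("object")         # each rowTag element is one row
]
print(rows)
# [{'id': '1', 'name': 'alpha'}, {'id': '2', 'name': 'beta'}]
```

spark-xml does the same mapping at scale: child elements of each row tag become columns, with types inferred unless a schema is supplied.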
