Google Spread Sheet Spark library

Question

I am using the https://github.com/potix2/spark-google-spreadsheets library to read a spreadsheet in Spark. It works perfectly in my local environment:

val df = sqlContext.read.
    format("com.github.potix2.spark.google.spreadsheets").
    option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
    option("credentialPath", "/path/to/credential.p12").
    load("<spreadsheetId>/worksheet1")

I then built a new assembly jar that bundles all the credentials and used that jar to read the spreadsheet. However, I am having trouble reading the credentialPath file from inside the jar. I tried using

getClass.getResourceAsStream("/resources/Aircraft/allAircraft.txt")

But the library only supports an absolute path. Please help me resolve this issue.

Recommended answer

You can use the --files argument of spark-submit or SparkContext.addFile() to distribute the credential file. To get the local path of the credential file on a worker node, call SparkFiles.get("credential filename").

import org.apache.spark.SparkFiles

// you can also use `spark-submit --files=credential.p12`
sqlContext.sparkContext.addFile("credential.p12")
val credentialPath = SparkFiles.get("credential.p12")

val df = sqlContext.read.
    format("com.github.potix2.spark.google.spreadsheets").
    option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
    option("credentialPath", credentialPath).
    load("<spreadsheetId>/worksheet1")
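
If the credential really has to stay bundled inside the assembly jar, a different technique from the one above may also work: copy the classpath resource to a temporary file and hand that file's absolute path to the library. The sketch below is only an illustration under assumptions. CredentialExtractor, toTempFile, and the resource name "/credential.p12" are hypothetical names, not part of spark-google-spreadsheets. The temporary file exists only on the machine where the copy runs, so whether this is enough depends on where the library opens the credential file, which is not verified here.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper, not part of spark-google-spreadsheets.
object CredentialExtractor {

  /** Copies the stream to a new temporary file and returns the file's absolute path. */
  def toTempFile(in: InputStream, suffix: String = ".p12"): String = {
    // getResourceAsStream returns null when the resource is missing from the classpath.
    require(in != null, "resource not found on the classpath")
    val tmp: Path = Files.createTempFile("credential-", suffix)
    // Ask the JVM to delete the copy on normal exit so the key does not linger on disk.
    tmp.toFile.deleteOnExit()
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toAbsolutePath.toString
  }
}

// Usage sketch (the resource name is an assumption about how the jar is laid out):
//   val in = getClass.getResourceAsStream("/credential.p12")
//   val credentialPath = CredentialExtractor.toTempFile(in)
//   ... option("credentialPath", credentialPath) ...
```

Compared with --files or addFile(), this keeps deployment to a single jar, but it writes the private key to the local temp directory, so the file permissions of that directory matter.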
