Spark Read JSON with Request Parameters


Question

I'm trying to read a JSON response from IBM Cloud's DB2 Warehouse documentation. This requires me to pass a request body wherein I have to supply userid and password as request parameters.

To read it using spark.read.json, I did not find any option for supplying request parameters. Is there any way to do that?

Usually I would read the JSON in plain Scala using the scalaj-http and play-json libraries, like:

import play.api.libs.json.Json
import scalaj.http.Http

// Build the JSON request body holding the credentials.
val body = Json.obj(Constants.KEY_USERID -> userid, Constants.KEY_PASSWORD -> password)

// POST the body to the auth endpoint and parse the JSON response.
val response = Json.parse(Http(url + Constants.KEY_ENDPOINT_AUTH_TOKENS)
  .header(Constants.KEY_CONTENT_TYPE, "application/json")
  .header(Constants.KEY_ACCEPT, "application/json")
  .postData(body.toString())
  .asString.body)

My requirement is that I cannot use these two libraries, and I have to do it in Scala with the Spark framework.

Answer

You cannot use spark.read.json directly for REST API data ingestion.

First, make your API call to get the response data, then convert it to a DataFrame with Spark. Note that if your API is paginated, you'll need to make multiple calls to fetch all the data.
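
The pagination loop can be sketched as follows. This is a hedged sketch, not the API's actual contract: the 1-based sequential page numbering and the "None past the last page" stop condition are assumptions, and the HTTP fetch is abstracted as a function so the loop itself stays testable:

```scala
// Sketch of collecting a paginated API: fetchPage returns Some(body) for an
// existing page and None once we run past the last page. The 1-based,
// sequential page numbering is an assumption -- adapt it to the real API.
def fetchAllPages(fetchPage: Int => Option[String]): Seq[String] =
  Iterator.from(1)
    .map(fetchPage)
    .takeWhile(_.isDefined)
    .map(_.get)
    .toSeq
```

With a real API, fetchPage would wrap the HTTP call; the collected page bodies can then be handed together to spark.read.json.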

For your example, you need to call the authentication endpoint to get a Bearer token, and then add it to the request header:

Authorization: Bearer <your_token>

All of this can be done in plain Scala (for example, using scala.io.Source.fromURL).
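
As a minimal sketch of that token call using only the JDK: scala.io.Source.fromURL alone cannot send a POST body, so this uses java.net.HttpURLConnection instead. The "/auth/tokens" path here is a placeholder, and the userid/password field names are taken from the question; check the API documentation for the real values:

```scala
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Build the JSON request body; the userid/password field names follow the question.
def authBody(userid: String, password: String): String =
  s"""{"userid":"$userid","password":"$password"}"""

// POST the credentials and return the raw JSON response, which would contain
// the Bearer token. The "/auth/tokens" path is a placeholder, not the real endpoint.
def requestToken(baseUrl: String, userid: String, password: String): String = {
  val conn = new URL(baseUrl + "/auth/tokens").openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setRequestProperty("Accept", "application/json")
  conn.setDoOutput(true)
  val out = new OutputStreamWriter(conn.getOutputStream)
  try out.write(authBody(userid, password)) finally out.close()
  try Source.fromInputStream(conn.getInputStream).mkString
  finally conn.disconnect()
}
```

Subsequent data requests would then set conn.setRequestProperty("Authorization", s"Bearer $token") on the connection before reading the response.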

Once you have the response_data, use Spark to convert it to a DataFrame:

import spark.implicits._

// Wrap the JSON string in a single-element Dataset and let Spark infer the schema.
val df = spark.read.json(Seq(response_data).toDS)
