Spark: Send DataFrame as body of HTTP Post request


Question

I have a data frame that I want to send as the body of an HTTP POST request. What is the best "Sparky" way to do it?
How can I control the number of HTTP requests? If the number of records grows, is there a way to split sending the data frame into multiple HTTP POST calls?

Let's say my data frame looks like this:

+--------------------------------------+------------+------------+------------------+
|               user_id                |    city    | user_name  |   facebook_id    |
+--------------------------------------+------------+------------+------------------+
| 55c3c59d-0163-46a2-b495-bc352a8de883 | Toronto    | username_x | 0123482174440907 |
| e2ddv22d-4132-c211-4425-9933aa8de454 | Washington | username_y | 0432982476780234 |
+--------------------------------------+------------+------------+------------------+

I want to send user_id and facebook_id in the body of an HTTP POST request to this endpoint: localhost:8080/api/spark

Recommended answer

You can achieve this with the foreachPartition method on a DataFrame. I am assuming here that you want to make an HTTP call for each row of the DataFrame in parallel. foreachPartition runs on each partition of the DataFrame in parallel. If you want to batch multiple rows into a single HTTP POST call, that is also possible: change the parameter of the makeHttpCall method from Row to Iterator[Row].

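  import org.apache.spark.sql.{DataFrame, Row}
  import play.api.libs.json.Json

  // Sends one HTTP POST per row; partitions are processed in parallel on the executors.
  def test(df: DataFrame): Unit = {
    // The explicit Iterator[Row] type avoids the foreachPartition overload ambiguity on Scala 2.12+.
    df.foreachPartition((rows: Iterator[Row]) => rows.foreach(makeHttpCall))
  }

  def makeHttpCall(row: Row): Unit = {
    // Look the columns up by name; the question asks for user_id and facebook_id.
    val json = Json.obj(
      "user_id"     -> row.getAs[String]("user_id"),
      "facebook_id" -> row.getAs[String]("facebook_id"))
    /**
      * code to make the HTTP call with `json` as the body
      */
  }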
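
The answer leaves the HTTP call itself as a placeholder. One way to fill it in, as a minimal sketch only, is the JDK's built-in HttpURLConnection. The object name HttpPost, the method postJson, the 5-second default timeout, and the http:// scheme in front of the endpoint are all assumptions made for illustration; none of them come from the original answer.

```scala
import java.net.{HttpURLConnection, URI}
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of the original answer.
object HttpPost {
  /** POSTs `body` as JSON to `endpoint` and returns the HTTP status code. */
  def postJson(endpoint: String, body: String, timeoutMs: Int = 5000): Int = {
    val conn = URI.create(endpoint).toURL.openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true) // we are going to write a request body
      conn.setConnectTimeout(timeoutMs)
      conn.setReadTimeout(timeoutMs)
      conn.setRequestProperty("Content-Type", "application/json; charset=utf-8")
      val out = conn.getOutputStream
      try out.write(body.getBytes(StandardCharsets.UTF_8))
      finally out.close()
      conn.getResponseCode // sends the request and waits for the response status
    } finally conn.disconnect()
  }
}
```

Inside makeHttpCall this could be called as HttpPost.postJson("http://localhost:8080/api/spark", json.toString). Because the call runs on the executors, the endpoint has to be reachable from every worker node, and it is worth deciding what should happen on a non-2xx status (retry, log, or fail the task).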

To make bulk HTTP requests, have makeHttpCall take an Iterator[Row] instead. Make sure the DataFrame has enough partitions so that each partition is small enough to send in a single HTTP POST request.

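import org.apache.spark.sql.{DataFrame, Row}
import play.api.libs.json.{JsArray, Json}

  // Sends one HTTP POST per partition, with all of the partition's rows in the body.
  def test(df: DataFrame): Unit = {
    df.foreachPartition((rows: Iterator[Row]) => makeHttpCall(rows))
  }

  def makeHttpCall(rows: Iterator[Row]): Unit = {
    // Build one flat JSON array; Json.arr(seq) would wrap the whole sequence as a single nested element.
    val json = JsArray(rows.map { x =>
      Json.obj(
        "user_id"     -> x.getAs[String]("user_id"),
        "facebook_id" -> x.getAs[String]("facebook_id"))
    }.toVector)
    /**
      * code to make the HTTP call with `json` as the body
      */
  }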
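
If a partition may still be too large for one request, the rows inside it can be cut into fixed-size chunks with Iterator.grouped, which caps the rows per POST regardless of partition size. The following is a sketch under assumptions: Batching, postInBatches, toBody, and send are hypothetical names, and the batch size and partition count in the commented usage are arbitrary.

```scala
// Hypothetical helper, not part of the original answer.
object Batching {
  /**
    * Splits one partition's rows into chunks of at most `batchSize`,
    * turns each chunk into a request body and hands it to `send`.
    * Returns the number of requests issued for this partition.
    */
  def postInBatches[A](rows: Iterator[A], batchSize: Int)
                      (toBody: Seq[A] => String)
                      (send: String => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var requests = 0
    // grouped is lazy, so only one chunk is materialised at a time
    rows.grouped(batchSize).foreach { chunk =>
      send(toBody(chunk))
      requests += 1
    }
    requests
  }
}

// Possible use from Spark (assumed values: 8 partitions, 500 rows per request):
//   df.select("user_id", "facebook_id")
//     .repartition(8)
//     .foreachPartition { (rows: Iterator[Row]) =>
//       Batching.postInBatches(rows, 500)(chunk => /* build JSON array */ ???)(body => /* HTTP POST */ ???)
//       ()
//     }
```

With this shape, the number of partitions bounds how many requests can be in flight at once (together with the available executor cores), while batchSize bounds the size of each request body. Each partition then issues roughly its row count divided by batchSize requests, rounded up.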
