How to extract values from an RDD based on the parameter passed


Question

I have created a key-value RDD, but I am not sure how to select the values from it.

val mapdf = merchantData_df.rdd.map(row => {
    val Merchant_Name = row.getString(0)
    val Display_Name = row.getString(1)
    val Store_ID_name = row.getString(2)
    val jsonString = s"{Display_Name: $Display_Name, Store_ID_name: $Store_ID_name}"
    (Merchant_Name, jsonString)
})

scala> mapdf.take(4).foreach(println)
(Amul,{Display_Name: Amul, Store_ID_name: null})
(Nestle,{Display_Name: Nestle, Store_ID_name: null})
(Ace,{Display_Name: Ace , Store_ID_name: null})
(Acme ,{Display_Name: Acme Fresh Market, Store_ID_name: Acme Markets})

Now suppose the input string to my function is Amul. My expected output for DisplayName is Amul, and another function for StoreID should return NULL.

How can I achieve this?

I don't want to use Spark SQL for this purpose.
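Staying with the RDD API only: the values in mapdf are plain strings (the keys are not quoted, so they are not valid JSON), which means returning a single field requires parsing the string. Below is a minimal plain-Scala sketch; MerchantFields, parse and field are hypothetical names, and it assumes no field value contains a comma or a colon.

```scala
object MerchantFields {
  // Parses a value such as "{Display_Name: Amul, Store_ID_name: null}" into a Map.
  // Assumption: field values never contain ',' or ':' characters.
  def parse(value: String): Map[String, String] =
    value.trim.stripPrefix("{").stripSuffix("}")
      .split(",")
      .map(_.split(":", 2).map(_.trim)) // split only on the first ':' of each pair
      .collect { case Array(k, v) => k -> v }
      .toMap

  // Returns the named field from the first value, or None when there is no value,
  // the field is missing, or the field holds the literal text "null".
  def field(values: Seq[String], name: String): Option[String] =
    values.headOption
      .flatMap(v => parse(v).get(name))
      .filter(_ != "null")
}
```

On the Spark side, mapdf.lookup("Amul") returns every value stored under that key as a Seq[String] (lookup is defined in PairRDDFunctions and is available on any RDD of pairs), so it can be passed straight to MerchantFields.field together with "Display_Name" or "Store_ID_name". Note that lookup still runs a Spark job over the RDD.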

Recommended answer

Given an input DataFrame such as

+-----------------+-----------------+-------------+
|Merchant_Name    |Display_Name     |Store_ID_name|
+-----------------+-----------------+-------------+
|Fitch            |Fitch            |null         |
|Kids             |Kids             |null         |
|Ace Hardware     |Ace Hardware     |null         |
| Fresh Market    |Acme  Market     |Acme Markets |
|Adventure        | Island          |null         |
+-----------------+-----------------+-------------+

You can write a function with a string parameter as

import org.apache.spark.sql.functions._
// df is the input DataFrame shown above; col() comes from the functions import
def filterRowsWithKey(key: String) =
  df.filter(col("Merchant_Name") === key).select("Display_Name", "Store_ID_name")

and call it as

filterRowsWithKey("Fitch").show(false)

which gives you

+------------+-------------+
|Display_Name|Store_ID_name|
+------------+-------------+
|Fitch       |null         |
+------------+-------------+

I hope the answer is helpful.

Update

If you want the function to return the first matching row as a string, you can do

import org.apache.spark.sql.functions._
// first() returns the first matching Row; mkString(",") joins its fields into one string
def filterRowsWithKey(key: String) =
  df.filter(col("Merchant_Name") === key).select("Display_Name", "Store_ID_name").first().mkString(",")

println(filterRowsWithKey("Fitch"))

which should give you

Fitch,null

The function above will throw an exception if the passed key is not found, because first() has no row to return. To be safe, you can use the following function:

import org.apache.spark.sql.functions._

def filterRowsWithKey(key: String) = {
  val filteredDF = df.filter(col("Merchant_Name") === key).select("Display_Name", "Store_ID_name")
  // only call first() when at least one row matched, otherwise return a marker string
  if (filteredDF.count() > 0) filteredDF.first().mkString(",") else "key not found"
}
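A possible refinement, sketched under the assumption that the DataFrame is passed in as a parameter instead of being read from an outer df: the version above runs count(), which scans all matching rows, and then first() as a second job. take(1) fetches at most one row in a single job, and returning an Option lets callers tell a missing key apart from a real row. RowLookup and lookupRow are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object RowLookup {
  // Hypothetical helper: returns "Display_Name,Store_ID_name" for the first row whose
  // Merchant_Name equals key, or None when no row matches.
  def lookupRow(df: DataFrame, key: String): Option[String] =
    df.filter(col("Merchant_Name") === key)
      .select("Display_Name", "Store_ID_name")
      .take(1)              // fetch at most one row in a single Spark job
      .headOption           // empty array becomes None instead of an exception
      .map(_.mkString(",")) // null fields are rendered as the text "null"
}
```

Callers can then write RowLookup.lookupRow(df, "Fitch").getOrElse("key not found") to keep the original behavior.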
