Formatting a nested map in a map in Spark RDD
Problem description
I have a text file that looks like this:
1007|CNSMR_CARD|1|1|1|1|1|1|1
1007|CNSMR_LOCL_IM_CHKG|1|1|1|1|1|1|1
1009|CNSMR_DIRCT_CHKG|4|4|4|4|4|1|1
1009|CNSMR_DIRCT_OTHR|4|4|4|4|4|1|1
1009|CNSMR_DIRCT_SAVG|4|4|4|4|4|1|1
1009|CNSMR_LOCL_IM_CHKG|4|4|4|4|4|1|1
1010|CNSMR_LOCL_IM_CHKG|1|1|1|1|1|1|1
1012|CNSMR_LOCL_IM_CHKG|1|1|1|1|1|1|1
1033|CNSMR_DIRCT_CHKG|1|1|1|1|2|1|1
Then I create an RDD like this:
val custFile = sc.textFile("custInfo.txt").map(line => line.split('|'))
val custPrd = custFile.map(a => (a(0), ((a(1)), Map("PRVCY_MAIL: " -> a(2), "PRVCY_CALL: " -> a(3), "PRVCY_SWP: " -> a(4), "PRVCY_FCRA: " -> a(5), "PRVCY_GLBA: " -> a(6), "PRVCY_PIPE: " -> a(7), "PRVCY_AFIL: " -> a(8)))))
val custGrp = custPrd.groupByKey
val custPrdGrp = custGrp.map{case (k, vals) => {val valsString = vals.mkString(", "); s"'$k' | {$valsString}" }}
which gives me back these results:
res4: Array[String] = Array(
'106' | {(CNSMR_LOCL_IM_CHKG,Map(PRVCY_MAIL: -> 4, PRVCY_GLBA: -> 4, PRVCY_FCRA: -> 4, PRVCY_AFIL: -> 1, PRVCY_PIPE: -> 1, PRVCY_CALL: -> 4, PRVCY_SWP: -> 4))},
'107' | {(CNSMR_DIRCT_CHKG,Map(PRVCY_MAIL: -> 1, PRVCY_GLBA: -> 1, PRVCY_FCRA: -> 1, PRVCY_AFIL: -> 1, PRVCY_PIPE: -> 1, PRVCY_CALL: -> 4, PRVCY_SWP: -> 1)), (CNSMR_DIRCT_SAVG,Map(PRVCY_MAIL: -> 1, PRVCY_GLBA: -> 1, PRVCY_FCRA: -> 1, PRVCY_AFIL: -> 1, PRVCY_PIPE: -> 1, PRVCY_CALL: -> 4, PRVCY_SWP: -> 1))}
but what I want is an array like this:
'106' | {'CNSMR_LOCL_IM_CHKG': {PRVCY_MAIL: 4, PRVCY_GLBA: 4, PRVCY_FCRA: 4, PRVCY_AFIL: 1, PRVCY_PIPE: 1, PRVCY_CALL: 4, PRVCY_SWP: 4}}
'107' | {'CNSMR_DIRCT_CHKG': {PRVCY_MAIL: 1, PRVCY_GLBA: 1, PRVCY_FCRA: 1, PRVCY_AFIL: 1, PRVCY_PIPE: 1, PRVCY_CALL: 4, PRVCY_SWP: 1}}, {'CNSMR_DIRCT_SAVG': {PRVCY_MAIL: 1, PRVCY_GLBA: 1, PRVCY_FCRA: 1, PRVCY_AFIL: 1, PRVCY_PIPE: 1, PRVCY_CALL: 4, PRVCY_SWP: 1}}
To format the second map, I tried something like this, but got an error:
val custPrdGrp = custGrp.map{case (k, vals) => {val valsString = vals map { case (val1, val2, val3, val4, val5, val6, val7) => {val sets = vals.mkString(", "); s"$val1, $val2, $val3, $val4, $val5, $val6, $val7"}}.mkString(", "); s"'$k' | {$valsString}" }}
<console>:27: error: missing parameter type for expanded function
The argument types of an anonymous function must be fully known. (SLS 8.5)
Expected type was: ?
val custPrdGrp = custGrp.map{case (k, vals) => {val valsString = vals map { case (val1, val2, val3, val4, val5, val6, val7) => {val sets = vals.mkString(", "); s"$val1, $val2, $val3, $val4, $val5, $val6, $val7"}}.mkString(", "); s"'$k' | {$valsString}" }}
^
How do you format a nested map in a map in Spark?
Recommended answer
Let's start with a simple Map[String, String]:
val m: Map[String,String] = Map(
"PRVCY_MAIL" -> "1", "PRVCY_GLBA" -> "1",
"PRVCY_FCRA" -> "1", "PRVCY_AFIL" -> "1",
"PRVCY_PIPE" -> "1", "PRVCY_CALL" -> "1",
"PRVCY_SWP" -> "1"
)
Note that I dropped formatting elements like ":" and whitespace from the keys. It is not required, but in my opinion it is much cleaner.
Now we can define a small helper:
def formatMap(sep: String = ": ",
left: String = "{", right: String = "}")(m: Map[String, String]) = {
val items = m.toSeq.map{case (k, v) => s"$k$sep$v"}.mkString(", ")
s"$left$items$right"
}
Let's check how it works:
scala> formatMap()(m)
res50: String = {PRVCY_CALL: 1, PRVCY_SWP: 1, PRVCY_MAIL: 1, PRVCY_AFIL: 1, PRVCY_FCRA: 1, PRVCY_PIPE: 1, PRVCY_GLBA: 1}
scala> formatMap(sep="=")(m)
res51: String = {PRVCY_CALL=1, PRVCY_SWP=1, PRVCY_MAIL=1, PRVCY_AFIL=1, PRVCY_FCRA=1, PRVCY_PIPE=1, PRVCY_GLBA=1}
scala> formatMap(sep="|", left="[", right="]")(m)
res52: String = [PRVCY_CALL|1, PRVCY_SWP|1, PRVCY_MAIL|1, PRVCY_AFIL|1, PRVCY_FCRA|1, PRVCY_PIPE|1, PRVCY_GLBA|1]
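Notice that the entries in the output above are not in the order in which m was defined. An immutable Scala Map with more than four entries is a hash map, so its iteration order is unspecified. If a stable order matters, one option is to sort the entries by key before joining them. formatMapSorted below is a hypothetical variant of formatMap, not part of the answer:

```scala
// Hypothetical variant of formatMap that sorts entries by key,
// so the output does not depend on the hash map's iteration order
def formatMapSorted(sep: String = ": ",
    left: String = "{", right: String = "}")(m: Map[String, String]): String = {
  val items = m.toSeq.sortBy(_._1).map { case (k, v) => s"$k$sep$v" }.mkString(", ")
  s"$left$items$right"
}

// Same map as in the answer (seven entries, so a hash map internally)
val m: Map[String, String] = Map(
  "PRVCY_MAIL" -> "1", "PRVCY_GLBA" -> "1",
  "PRVCY_FCRA" -> "1", "PRVCY_AFIL" -> "1",
  "PRVCY_PIPE" -> "1", "PRVCY_CALL" -> "1",
  "PRVCY_SWP" -> "1"
)

val sorted = formatMapSorted()(m)
```

With this variant, formatMapSorted()(m) always lists PRVCY_AFIL first and PRVCY_SWP last, regardless of how the map stores its entries.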
Now let's clean up what you already have. First, let's extract the names:
val keys = Array(
"PRVCY_MAIL", "PRVCY_CALL", "PRVCY_SWP", "PRVCY_FCRA",
"PRVCY_GLBA", "PRVCY_PIPE", "PRVCY_AFIL"
)
Rewrite the map:
val custPrd = custFile.map(a => (a(0), (a(1), keys.zip(a.drop(2)).toMap)))
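To see what keys.zip(a.drop(2)).toMap does, here it is applied to a single row from the sample file. drop(2) removes the id and the product name, and zip pairs the remaining seven values with the names by position. Two things are worth knowing: zip silently stops at the shorter sequence, so a row with missing fields gives a smaller map rather than an error, and toMap on seven entries does not keep the column order. One option, not used in the answer, is to build a ListMap instead when the order matters:

```scala
import scala.collection.immutable.ListMap

val keys = Array(
  "PRVCY_MAIL", "PRVCY_CALL", "PRVCY_SWP", "PRVCY_FCRA",
  "PRVCY_GLBA", "PRVCY_PIPE", "PRVCY_AFIL"
)

// split with a Char splits on the literal '|'; split("|") would be read as a regex
val a = "1033|CNSMR_DIRCT_CHKG|1|1|1|1|2|1|1".split('|')

// Drop id and product, then pair each remaining value with its column name
val asMap: Map[String, String] = keys.zip(a.drop(2)).toMap

// Same pairs, but ListMap keeps insertion (column) order
val ordered: ListMap[String, String] = ListMap(keys.zip(a.drop(2)).toSeq: _*)

// A short row: zip truncates to the shorter side instead of failing
val short = keys.zip("1|2|3".split('|'))
```

For this row, asMap("PRVCY_GLBA") is "2", since GLBA is the fifth value column.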
Group as before:
val custGrp = custPrd.groupByKey
and map over the groups:
val custPrdGrp = custGrp.map{case (k, vals) => {
val valsString = vals.map{case (id, m) => {
val fmtM = formatMap()(m)
s"'$id': $fmtM"
}}.mkString(", ")
s"'$k' | {$valsString}"
}}
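Compared with the attempt in the question, two things differ here. First, the question writes vals map { case ... }.mkString(", "). Because selecting a method with a dot binds tighter than an infix operator, that parses as vals map ({ case ... }.mkString(", ")), so the pattern-matching function literal has no expected type, which is what "missing parameter type for expanded function" complains about. Second, each element of vals is a (product, map) pair, so it has to be matched as a pair rather than as a 7-tuple. The sketch below shows the corrected shape on a trimmed-down version of the '107' group from the question's output, keeping the question's keys that already end in ": ":

```scala
// A trimmed-down group shaped like one value of custGrp in the question:
// (product, Map(...)) pairs whose keys already end in ": "
val vals: Iterable[(String, Map[String, String])] = Seq(
  ("CNSMR_DIRCT_CHKG", Map("PRVCY_MAIL: " -> "1", "PRVCY_CALL: " -> "4")),
  ("CNSMR_DIRCT_SAVG", Map("PRVCY_MAIL: " -> "1", "PRVCY_CALL: " -> "4"))
)
val k = "107"

// vals.map { ... }.mkString applies mkString to the result of map,
// and each element is matched as a (product, map) pair, not a 7-tuple
val valsString = vals.map { case (product, m) =>
  // Keys already contain ": ", so key and value are simply concatenated
  val inner = m.map { case (key, value) => s"$key$value" }.mkString(", ")
  s"'$product': {$inner}"
}.mkString(", ")

val formatted = s"'$k' | {$valsString}"
```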
A quick check:
scala> custPrdGrp.first
res56: String = '1012' | {'CNSMR_LOCL_IM_CHKG': {PRVCY_CALL: 1, PRVCY_SWP: 1, PRVCY_MAIL: 1, PRVCY_AFIL: 1, PRVCY_FCRA: 1, PRVCY_PIPE: 1, PRVCY_GLBA: 1}}
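To reproduce this check without a cluster, the answer's steps can be run on plain Scala collections. In this sketch a Seq of a few sample rows stands in for sc.textFile, and a local groupBy stands in for groupByKey; both substitutions are assumptions made only to keep it runnable. With a real RDD, neither the order of the groups nor the order of products inside a group is guaranteed.

```scala
def formatMap(sep: String = ": ",
    left: String = "{", right: String = "}")(m: Map[String, String]) = {
  val items = m.toSeq.map{case (k, v) => s"$k$sep$v"}.mkString(", ")
  s"$left$items$right"
}

val keys = Array(
  "PRVCY_MAIL", "PRVCY_CALL", "PRVCY_SWP", "PRVCY_FCRA",
  "PRVCY_GLBA", "PRVCY_PIPE", "PRVCY_AFIL"
)

// A few rows from the sample file, in place of sc.textFile(...)
val custFile = Seq(
  "1009|CNSMR_DIRCT_CHKG|4|4|4|4|4|1|1",
  "1009|CNSMR_DIRCT_SAVG|4|4|4|4|4|1|1",
  "1012|CNSMR_LOCL_IM_CHKG|1|1|1|1|1|1|1"
).map(line => line.split('|'))

val custPrd = custFile.map(a => (a(0), (a(1), keys.zip(a.drop(2)).toMap)))

// Local stand-in for groupByKey: id -> Seq of (product, map) pairs
val custGrp = custPrd.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

// Same formatting step as in the answer
val custPrdGrp = custGrp.map{case (k, vals) => {
  val valsString = vals.map{case (id, m) => {
    val fmtM = formatMap()(m)
    s"'$id': $fmtM"
  }}.mkString(", ")
  s"'$k' | {$valsString}"
}}
```

Because the inner maps have seven entries, the order of the PRVCY_* items within each pair of braces may differ from run to run of different Scala versions, so compare by content rather than by exact string.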
You should probably extract the anonymous function used above, in a similar way to what I've done for formatMap, but I'll leave that as an exercise for you.
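One possible way to do that exercise, as a sketch: pull the two anonymous functions into named methods, here called formatProduct and formatGroup (both hypothetical names). It runs on a plain Seq so that it works without Spark; with the RDD from the answer, the same method could presumably be passed as custGrp.map(formatGroup), since its parameter type matches the grouped elements.

```scala
def formatMap(sep: String = ": ",
    left: String = "{", right: String = "}")(m: Map[String, String]) = {
  val items = m.toSeq.map{case (k, v) => s"$k$sep$v"}.mkString(", ")
  s"$left$items$right"
}

// Hypothetical helper: formats one (product, map) pair as 'product': {...}
def formatProduct(id: String, m: Map[String, String]): String =
  s"'$id': ${formatMap()(m)}"

// Hypothetical helper: formats one grouped record as 'key' | {...}
def formatGroup(kv: (String, Iterable[(String, Map[String, String])])): String = {
  val (k, vals) = kv
  val valsString = vals.map { case (id, m) => formatProduct(id, m) }.mkString(", ")
  s"'$k' | {$valsString}"
}

// A plain Seq in place of the grouped RDD
val grouped = Seq(
  ("1010", Seq(("CNSMR_LOCL_IM_CHKG", Map("PRVCY_MAIL" -> "1"))))
)
val out = grouped.map(formatGroup)
```

Keeping the pieces as named methods also makes them easy to test on their own, as with formatProduct("X", Map("A" -> "1")).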