Format a (k, (v, w)) pair in a Spark RDD
Problem description
I have an RDD like this:
val custFile = sc.textFile("custInfo.txt").map(line => line.split('|'))
val custPrd = custFile.map(a => (a(0), ((a(1)), (a(2), a(3), a(4), a(5), a(6), a(7), a(8)))))
val custGrp = custPrd.groupByKey
custGrp.saveAsTextFile("custinfo2")
That produces this:
(1104,CompactBuffer((S_SAVG,(1,1,1,1,1,1,1)), (CN_SAVG,(4,4,1,1,4,1,1))))
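The CompactBuffer above is how Spark prints the Iterable that groupByKey builds for each key. The same shape can be reproduced on a local collection, which is handy for trying formatting code without a SparkContext. This is a sketch under assumptions: the input lines are taken to look like 1104|S_SAVG|1|1|1|1|1|1|1 (inferred from the output above, not stated in the question), the object name LocalGroupSketch is hypothetical, and Scala's groupBy stands in for groupByKey.

```scala
object LocalGroupSketch {
  type Fields = (String, String, String, String, String, String, String)

  // Hypothetical input lines, reconstructed from the output shown above.
  val lines: Seq[String] = Seq(
    "1104|S_SAVG|1|1|1|1|1|1|1",
    "1104|CN_SAVG|4|4|1|1|4|1|1"
  )

  // Same shape as custPrd: (customerId, (productCode, seven fields)).
  // split with a Char is literal; split("|") would be treated as a regex.
  def toPair(line: String): (String, (String, Fields)) = {
    val a = line.split('|')
    (a(0), (a(1), (a(2), a(3), a(4), a(5), a(6), a(7), a(8))))
  }

  // Local stand-in for groupByKey: each key maps to all values that share it.
  def grouped: Map[String, Seq[(String, Fields)]] =
    lines.map(toPair).groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  def main(args: Array[String]): Unit =
    grouped.foreach(println)
}
```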
How can I use something like this:
custPrdGrp.map{case (k, vals) => {val valsString = vals.mkString(", "); s"{$k:, {$valsString}}" }}
to format a (k, (v, w)) pair? I tried this, but got an error:
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
<console>:27: error: constructor cannot be instantiated to expected type;
found : (T1, T2)
required: Iterable[(String, (String, String, String, String, String, String, String))]
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
^
<console>:27: error: not found: value v
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
^
<console>:27: error: not found: value w
val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
I'd like the output to look like this:
('1104'|{'S_SAVG': {a: '1', b: '1', c: '1', d: '1', e: '1', f: '1', g: '1'}, 'CN_SAVG': {a: '4', b: '4', c: '1', d: '1', e: '4', f: '1', g: '1'}})
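The compiler errors above come from the shape of each element: the value side is an Iterable[(String, (String, ...))], one entry per product code, not a single pair. So the pattern (k, (v, w)) cannot match it, and v and w are never bound. A second problem would appear once that is fixed: w is a Tuple7, and tuples have no mkString. A minimal sketch of one fix, matching (k, vals) first and destructuring each element inside; it uses plain Scala collections, the name PairFormatSketch is hypothetical, and the output format is a simplified example rather than the target format. With the RDD it would be used as custPrdGrp.map(PairFormatSketch.format).

```scala
object PairFormatSketch {
  type Fields = (String, String, String, String, String, String, String)

  // Match (key, groupedValues) first: the value side is an Iterable, not a pair.
  def format(record: (String, Iterable[(String, Fields)])): String = record match {
    case (k, vals) =>
      // Destructure each (v, w) element of the Iterable separately.
      val parts = vals.map { case (v, w) =>
        // w is a Tuple7, which has no mkString; productIterator exposes its fields.
        val fields = w.productIterator.mkString(", ")
        s"$v: [$fields]"
      }
      val joined = parts.mkString("; ")
      s"'$k'| [$joined]"
  }
}
```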
Solution
Well, there are quite a lot of details here, but something like this should work:
val keys = List("a", "b", "c", "d", "e", "f", "g")
custGrp.map{case (k, vals) => {
  val valsString = vals map {
    case (val1, val2) => {
      val pairs = keys
        // Pair each letter with its value, rendered as letter: 'value'
        .zip(val2.productIterator.map{case (x: String) => x}.toSeq)
        .map{case (k, v) => s"$k: '$v'"}
        // Join into a single string
        .mkString(", ")
      // Prefix with the product code, e.g. 'S_SAVG'
      s"'$val1': {$pairs}"
    }
  }
  // Join all product entries
  val valsComb = valsString.mkString(", ")
  // Build the final string for this customer
  s"('$k'|{$valsComb})"
}}
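For testing, the same transformation can be pulled out into a named function and run on local data before calling custGrp.map. The sketch below is a lightly adapted version of the answer, not a drop-in from the original: the object name AnswerAsFunction is hypothetical, and it swaps the case (x: String) => x cast for _.toString, which gives the same result for string fields but does not throw a MatchError if a field is not a String.

```scala
object AnswerAsFunction {
  type Fields = (String, String, String, String, String, String, String)

  // Field labels, as in the answer above.
  val keys: List[String] = List("a", "b", "c", "d", "e", "f", "g")

  // The answer's transformation as a named function over one grouped record.
  def formatRecord(record: (String, Iterable[(String, Fields)])): String = {
    val (k, vals) = record
    val entries = vals.map { case (code, fields) =>
      // productIterator yields Any; toString avoids a MatchError on non-String fields.
      val values = fields.productIterator.map(_.toString).toList
      val pairs = keys.zip(values).map { case (label, value) => s"$label: '$value'" }
      val body = pairs.mkString(", ")
      s"'$code': {$body}"
    }
    val valsComb = entries.mkString(", ")
    s"('$k'|{$valsComb})"
  }
}
```

With the RDD from the question this would be custGrp.map(AnswerAsFunction.formatRecord).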
You could simplify things by building a more suitable data structure in the first place, for example by using Maps instead of tuples:
Map("S_SAVG" -> Map("a" -> "1", "b" -> "1", ...), ...)
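A sketch of that Map-based idea, assuming the same labels a to g; the names MapBasedSketch, toFieldMap, renderFields, and render are hypothetical. One caveat: an immutable Scala Map with more than four entries does not guarantee insertion order, so the formatter walks the label list instead of iterating the map, and the product codes are kept in a Seq.

```scala
object MapBasedSketch {
  // Labels assumed to match the answer above.
  val labels: List[String] = List("a", "b", "c", "d", "e", "f", "g")

  // Build a label -> value map once, instead of carrying an anonymous 7-tuple.
  def toFieldMap(values: Seq[String]): Map[String, String] =
    labels.zip(values).toMap

  // Render fields in label order, skipping labels that have no value.
  def renderFields(fields: Map[String, String]): String =
    labels.flatMap(l => fields.get(l).map(v => s"$l: '$v'")).mkString(", ")

  // One output string per customer; product codes stay in the order given.
  def render(k: String, products: Seq[(String, Map[String, String])]): String = {
    val body = products
      .map { case (code, fields) =>
        val rendered = renderFields(fields)
        s"'$code': {$rendered}"
      }
      .mkString(", ")
    s"('$k'|{$body})"
  }
}
```

In the RDD version, toFieldMap would be applied when building custPrd, e.g. (a(0), (a(1), MapBasedSketch.toFieldMap(a.slice(2, 9).toSeq))), and render inside custGrp.map { case (k, vals) => MapBasedSketch.render(k, vals.toSeq) }.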