Concat columns by joining multiple DataFrames

Problem description

I have multiple DataFrames, and I need to concatenate the address and zip columns based on a condition. I have a SQL query that I need to convert into DataFrame joins. I have written a UDF that works fine for concatenating multiple columns into a single column:

val getConcatenated = udf(
  (first: String, second: String, third: String, fourth: String, five: String, six: String) =>
    first + "," + second + "," + third + "," + fourth + "," + five + "," + six
)
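One thing worth checking with a UDF like this is how it treats nulls. The sketch below is plain Scala with no Spark dependency; it copies the UDF body into an ordinary function (three arguments instead of six, for brevity) to show that `+` on a null `String` yields the literal text "null". The object name, the null-skipping helper, and the sample values are made up for illustration.

```scala
object ConcatNullDemo {
  // Same body as the UDF in the question, written as a plain function
  // (shortened to three arguments for this illustration).
  val concatLikeUdf: (String, String, String) => String =
    (first, second, third) => first + "," + second + "," + third

  // Hypothetical null-skipping variant: Option(null) is None, so nulls drop out.
  def concatSkippingNulls(parts: String*): String =
    parts.flatMap(Option(_)).mkString(",")

  def main(args: Array[String]): Unit = {
    // Made-up sample values with a missing second address line.
    println(concatLikeUdf("12 Main St", null, "560001"))       // prints 12 Main St,null,560001
    println(concatSkippingNulls("12 Main St", null, "560001")) // prints 12 Main St,560001
  }
}
```

Spark passes nulls straight through to a UDF whose parameters are `String`, so the same behaviour is to be expected inside `getConcatenated`.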

MySQL query

select
CONCAT(al.Address1,',',al.Address2,',',al.Zip) AS AtAddress,
CONCAT(rl.Address1,',',rl.Address2,',',rl.Zip) AS RtAddress,
CONCAT(d.Address1,',',d.Address2,',',d.Zip) AS DAddress,
CONCAT(s.Address1,',',s.Address2,',',s.Zip) AS SAGddress,
CONCAT(vl.Address1,',',vl.Address2,',',vl.Zip) AS VAddress,
CONCAT(sg.Address1,',',sg.Address2,',',sg.Zip) AS SAGGddress
FROM
si s inner join 
at a on s.cid = a.cid and s.cid =a.cid
inner join De d on s.cid = d.cid AND d.aid = a.aid 
inner join SGrpM sgm on s.cid = sgm.cid and s.sid =sgm.sid and sgm.status=1
inner join SeG sg on sgm.cid =sg.cid and sgm.gid =sg.gid 
inner join bd bu on s.cid = bu.cid and s.sid =bu.sid
inner join locas al on a.ALId = al.lid
inner join locas rl on a.RLId = rl.lid
inner join locas vl on a.VLId = vl.lid

I am facing an issue when joining the DataFrames: the result gives me null values.

val DS = DS_SI.join(at,Seq("cid","sid"),"inner")
  .join(DS_DE,Seq("cid","aid"),"inner")
  .join(DS_SGrpM,Seq("cid","sid"),"inner")
  .join(DS_SG,Seq("cid","gid"),"inner")
  .join(at,Seq("cid","sid"),"inner")
  .join(DS_BD,Seq("cid","sid"),"inner")
  .join(DS_LOCAS("ALId") <=> DS_LOCATION("lid") && at("RLId") <=> DS_LOCAS("lid")&& at("VLId") <=> DS_LOCAS("lid"),"inner")

I am trying to join my DataFrames as shown above, which does not give me proper results. After that, I want to concatenate by adding columns: .withColumn("AtAddress",getConcatenated()) .withColumn("RtAddress",getConcatenated())....

Can anyone tell me how to achieve this effectively? Am I joining the DataFrames correctly, or is there a better approach?

Recommended answer

You can use concat_ws. Example:

import org.apache.spark.sql.functions._
// concat_ws(separator, columns*) joins the given columns with the separator
df.withColumn("title", concat_ws(", ", DS_DE("Address1"), DS_DE("Address2"), DS_DE("Zip")))
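To connect this back to the SQL in the question: the query joins locas three times under the aliases al, rl and vl, whereas a single join whose condition ANDs all three ID comparisons asks one locas row to match all three IDs at once. A minimal sketch of the aliased version follows, assuming a local SparkSession and toy stand-ins for the at and locas tables. The object name, the row values, and the reduced column set are made up for illustration; only the three location joins are shown, not the full query.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}

object AddressJoinSketch {
  // Builds the three concatenated address columns from toy data.
  def addresses(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Toy stand-ins for the `at` and `locas` tables (made-up rows).
    val at = Seq((1, 10, 20, 30)).toDF("cid", "ALId", "RLId", "VLId")
    val locas = Seq(
      (10, "1 First St", "Apt 1", "11111"),
      (20, "2 Second St", "Apt 2", "22222"),
      (30, "3 Third St", "Apt 3", "33333")
    ).toDF("lid", "Address1", "Address2", "Zip")

    // One alias per role, mirroring `locas al`, `locas rl`, `locas vl` in the SQL.
    at.as("a")
      .join(locas.as("al"), col("a.ALId") === col("al.lid"), "inner")
      .join(locas.as("rl"), col("a.RLId") === col("rl.lid"), "inner")
      .join(locas.as("vl"), col("a.VLId") === col("vl.lid"), "inner")
      .select(
        // concat_ws skips null inputs instead of turning the whole result null.
        concat_ws(",", col("al.Address1"), col("al.Address2"), col("al.Zip")).as("AtAddress"),
        concat_ws(",", col("rl.Address1"), col("rl.Address2"), col("rl.Zip")).as("RtAddress"),
        concat_ws(",", col("vl.Address1"), col("vl.Address2"), col("vl.Zip")).as("VAddress")
      )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("address-join-sketch")
      .getOrCreate()
    addresses(spark).show(false)
    spark.stop()
  }
}
```

The remaining joins from the query (De, SGrpM, SeG, bd) could be chained in the same way. Using built-in functions such as concat_ws instead of a UDF also keeps the expression visible to Spark's optimizer.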
