Join two DataFrames where the join key is different and only select some columns

Problem description

What I would like to do is:

Join two DataFrames A and B using their respective id columns a_id and b_id. I want to select all columns from A and two specific columns from B.

I tried something like the line below, with different kinds of quotation marks, but it still does not work. I feel that in PySpark there should be a simple way to do this.

A_B = A.join(B, A.id == B.id).select(A.*, B.b1, B.b2)

I know you can write

A_B = sqlContext.sql("SELECT A.*, B.b1, B.b2 FROM A JOIN B ON A.a_id = B.b_id")

to do this, but I would like to do it more like the pseudocode above.

Recommended answer

Your pseudocode is basically correct. This slightly modified version would work if the id column existed in both DataFrames:

A_B = A.join(B, on="id").select(A["*"], B["b1"], B["b2"])

From pyspark.sql.DataFrame.join():

If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.

Since the keys are different, you can just use withColumn() (or withColumnRenamed()) to create a column with the same name in both DataFrames:

from pyspark.sql.functions import col

A_B = A.withColumn("id", col("a_id")).join(B.withColumn("id", col("b_id")), on="id")\
    .select(A["*"], B["b1"], B["b2"])

If your DataFrames have long, complicated names, you could also use alias() to make things easier, because the strings in select() then refer to the aliases rather than to the variable names:

A_B = long_data_frame_name1.alias("A").withColumn("id", col("a_id"))\
    .join(long_data_frame_name2.alias("B").withColumn("id", col("b_id")), on="id")\
    .select("A.*", "B.b1", "B.b2")
