Join two DataFrames where the join key is different and only select some columns


Problem description

What I want to do is:

Join two DataFrames A and B using their respective id columns a_id and b_id. I want to select all columns from A and two specific columns from B.

I tried something like the code below, with different kinds of quotation marks, but it still does not work. I feel that in PySpark there should be a simple way to do this.

A_B = A.join(B, A.id == B.id).select(A.*, B.b1, B.b2)

I know you could write

A_B = sqlContext.sql("SELECT A.*, B.b1, B.b2 FROM A JOIN B ON A.a_id = B.b_id")

to do this, but I would like to do it more like the pseudocode above.

Recommended answer

Your pseudocode is basically correct. This slightly modified version would work if the id column existed in both DataFrames:

# alias() gives each DataFrame a name that "A.*" and "B.b1" can refer to
A_B = A.alias("A").join(B.alias("B"), on="id").select("A.*", "B.b1", "B.b2")

From the documentation for pyspark.sql.DataFrame.join():

If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.

Since the keys are different, you can just use withColumn() (or withColumnRenamed()) to create a column with the same name in both DataFrames:

from pyspark.sql.functions import col

A_B = A.alias("A").withColumn("id", col("a_id"))\
    .join(B.alias("B").withColumn("id", col("b_id")), on="id")\
    .select("A.*", "B.b1", "B.b2")

If your DataFrames have long, complicated names, alias() also lets you refer to them by short names:

A_B = long_data_frame_name1.alias("A").withColumn("id", col("a_id"))\
    .join(long_data_frame_name2.alias("B").withColumn("id", col("b_id")), on="id")\
    .select("A.*", "B.b1", "B.b2")
