PySpark: match the values of a DataFrame column against another DataFrame column

Views: 105  Published: 2021/11/12 5:44:05  python apache-spark pyspark


Problem description

In a pandas DataFrame, I can use the .isin() method to match the values of one column against another column.

For example, suppose we have one DataFrame:

df_A = pd.DataFrame({'col1': ['A', 'B', 'C', 'B', 'C', 'D'], 
                     'col2': [1, 2, 3, 4, 5, 6]})
df_A

    col1  col2
0    A     1
1    B     2
2    C     3
3    B     4
4    C     5
5    D     6

and another DataFrame:

df_B = pd.DataFrame({'col1': ['C', 'E', 'D', 'C', 'F', 'G', 'H'], 
                     'col2': [10, 20, 30, 40, 50, 60, 70]})
df_B

    col1  col2
0    C    10
1    E    20
2    D    30
3    C    40
4    F    50
5    G    60
6    H    70

I can use the .isin() method to match the column values of df_B against the column values of df_A.

For example:

df_B[df_B['col1'].isin(df_A['col1'])]

yields:

    col1  col2
0    C    10
2    D    30
3    C    40

What is the equivalent operation for a PySpark DataFrame?

df_A = pd.DataFrame({'col1': ['A', 'B', 'C', 'B', 'C', 'D'], 
                     'col2': [1, 2, 3, 4, 5, 6]})
df_A = sqlContext.createDataFrame(df_A)

df_B = pd.DataFrame({'col1': ['C', 'E', 'D', 'C', 'F', 'G', 'H'], 
                     'col2': [10, 20, 30, 40, 50, 60, 70]})
df_B = sqlContext.createDataFrame(df_B)


df_B[df_B['col1'].isin(df_A['col1'])]

The .isin() code above gives me an error message:

u'resolved attribute(s) col1#9007 missing from 
col1#9012,col2#9013L in operator !Filter col1#9012 IN 
(col1#9007);;\n!Filter col1#9012 IN (col1#9007)\n+- 
LogicalRDD [col1#9012, col2#9013L]\n'
