Spark Multiple Conditions Join


Problem Description


I am using Spark SQL to join three tables, but I get an error when the join uses multiple column conditions.

test_table = (T1.join(T2, T1.dtm == T2.kids_dtm, "inner")
              .join(T3, T3.kids_dtm == T1.dtm
                    and T2.room_id == T3.room_id
                    and T2.book_id == T3.book_id, "inner"))

Error:

  Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark/python/pyspark/sql/column.py", line 447, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.


Instead of specifying `and`, I have tried `&` and `&&`, but neither works. Any help would be appreciated.

Recommended Answer


Never mind, the following works, using `&` with parentheses around each condition:

test_table = (T1.join(T2, T1.dtm == T2.kids_dtm, "inner")
              .join(T3, (T3.kids_dtm == T1.dtm)
                    & (T2.room_id == T3.room_id)
                    & (T2.book_id == T3.book_id), "inner"))
