Apache Spark dealing with case statements


Problem description

I am dealing with transforming SQL code to PySpark code and came across some SQL statements. I don't know how to approach case statements in PySpark. I am planning on creating an RDD, then using rdd.map, and then doing some logic checks. Is that the right approach? Please help!

Basically I need to go through each row in the RDD or DF and, based on some logic, edit one of the column values.

     case
          when (e."a" like 'a%' or e."b" like 'b%')
               and e."aa" = 'BW' and cast(e."abc" as decimal(10,4)) = 75.0 then 'callitA'

          when (e."a" like 'b%' or e."b" like 'a%')
               and e."aa" = 'AW' and cast(e."abc" as decimal(10,4)) = 75.0 then 'callitB'

          else 'CallitC'
     end

Answer

I'm not good with Python, but I'll try to give some pointers based on what I have done in Scala.

Question: rdd.map and then do some logic checks. Is that the right approach?

That is one approach.
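For instance, a rough rdd.map sketch of the question's logic might look like this (the column names a, b, aa, and abc are taken from the question's SQL; the helper classify and the label column name are made up):

def classify(row):
    # Mirror the SQL case expression row by row.
    # Note: float comparison only approximates the decimal(10,4) cast.
    abc = float(row["abc"])
    if (row["a"].startswith("a") or row["b"].startswith("b")) \
            and row["aa"] == "BW" and abc == 75.0:
        label = "callitA"
    elif (row["a"].startswith("b") or row["b"].startswith("a")) \
            and row["aa"] == "AW" and abc == 75.0:
        label = "callitB"
    else:
        label = "CallitC"
    # Row is a tuple subclass, so this appends the new value.
    return row + (label,)

labeled = df.rdd.map(classify).toDF(df.columns + ["label"])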

withColumn is another approach.

The DataFrame.withColumn method in PySpark supports adding a new column or replacing an existing column of the same name.
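As a minimal sketch of that, assuming a DataFrame df that already has an age column (the flag column name is made up):

from pyspark.sql import functions as F

# Adds "flag" if it does not exist, or replaces it if it does.
df = df.withColumn("flag", F.when(F.col("age") > 3, 1).otherwise(0))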

In this context you have to deal with the Column API, either via a Spark udf or with the when/otherwise syntax.

For example:

from pyspark.sql import functions as F
df.select(df.name, F.when(df.age > 4, 1).when(df.age < 3, -1).otherwise(0)).show()


+-----+--------------------------------------------------------+
| name|CASE WHEN (age > 4) THEN 1 WHEN (age < 3) THEN -1 ELSE 0|
+-----+--------------------------------------------------------+
|Alice|                                                      -1|
|  Bob|                                                       1|
+-----+--------------------------------------------------------+


from pyspark.sql import functions as F
df.select(df.name, F.when(df.age > 3, 1).otherwise(0)).show()

+-----+---------------------------------+
| name|CASE WHEN (age > 3) THEN 1 ELSE 0|
+-----+---------------------------------+
|Alice|                                0|
|  Bob|                                1|
+-----+---------------------------------+
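Putting it together, the case statement from the question could be written roughly as follows with when/otherwise. This is a sketch assuming the DataFrame columns are literally named a, b, aa, and abc, and the output column name label is made up:

from pyspark.sql import functions as F

# Apply the same cast as the SQL before comparing.
abc = F.col("abc").cast("decimal(10,4)")

cond_a = ((F.col("a").like("a%") | F.col("b").like("b%"))
          & (F.col("aa") == "BW") & (abc == 75.0))
cond_b = ((F.col("a").like("b%") | F.col("b").like("a%"))
          & (F.col("aa") == "AW") & (abc == 75.0))

# Adds or replaces the "label" column with the case result.
df = df.withColumn(
    "label",
    F.when(cond_a, "callitA").when(cond_b, "callitB").otherwise("CallitC"),
)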

You can also use a udf instead of when/otherwise.
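A minimal udf sketch of the same idea (the function name classify and the output column name label are made up):

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def classify(a, b, aa, abc):
    # Mirror the SQL case logic; float() approximates the decimal cast.
    if (a.startswith("a") or b.startswith("b")) and aa == "BW" and float(abc) == 75.0:
        return "callitA"
    if (a.startswith("b") or b.startswith("a")) and aa == "AW" and float(abc) == 75.0:
        return "callitB"
    return "CallitC"

df = df.withColumn("label", classify("a", "b", "aa", "abc"))

Note that built-in Column expressions such as when/otherwise are generally faster than Python udfs, since a udf pushes each row through the Python interpreter.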

