Apache Spark: dealing with case statements
Problem description
I am transforming SQL code to PySpark code and came across some SQL case statements. I don't know how to approach them in PySpark. I am planning on creating an RDD, then using rdd.map, and then doing some logic checks. Is that the right approach? Please help!
Basically, I need to go through each row in the RDD or DataFrame and, based on some logic, edit one of the column values.
case
  when (e."a" like 'a%' or e."b" like 'b%')
       and e."aa" = 'BW' and cast(e."abc" as decimal(10,4)) = 75.0 then 'callitA'
  when (e."a" like 'b%' or e."b" like 'a%')
       and e."aa" = 'AW' and cast(e."abc" as decimal(10,4)) = 75.0 then 'callitB'
  else 'CallitC'
end
Recommended answer
I'm not well versed in Python, but I will try to give some pointers based on what I have done in Scala.
Question: "using rdd.map and then doing some logic checks — is that the right approach?"

That is one approach, but withColumn is another.
The DataFrame.withColumn method in PySpark supports adding a new column or replacing an existing column of the same name.
In this context you have to work with Column expressions, either via a Spark UDF or the when/otherwise syntax:
from pyspark.sql import functions as F
df.select(df.name, F.when(df.age > 4, 1).when(df.age < 3, -1).otherwise(0)).show()
+-----+---------------------------------------------------------+
| name|CASE WHEN (age > 4) THEN 1 WHEN (age < 3) THEN -1 ELSE 0|
+-----+---------------------------------------------------------+
|Alice|                                                       -1|
|  Bob|                                                        1|
+-----+---------------------------------------------------------+
from pyspark.sql import functions as F
df.select(df.name, F.when(df.age > 3, 1).otherwise(0)).show()
+-----+---------------------------------+
| name|CASE WHEN (age > 3) THEN 1 ELSE 0|
+-----+---------------------------------+
|Alice|                                0|
|  Bob|                                1|
+-----+---------------------------------+
You can use a UDF instead of when/otherwise as well.