Apache Spark dealing with case statements


Problem description

I am dealing with transforming SQL code to PySpark code and came across some SQL statements. I don't know how to approach case statements in PySpark. I am planning on creating an RDD, then using rdd.map, and then doing some logic checks. Is that the right approach? Please help!

Basically I need to go through each row in the RDD or DF and, based on some logic, edit one of the column values.

     case
          when (e."a" like 'a%' or e."b" like 'b%')
               and e."aa" = 'BW' and cast(e."abc" as decimal(10,4)) = 75.0 then 'callitA'

          when (e."a" like 'b%' or e."b" like 'a%')
               and e."aa" = 'AW' and cast(e."abc" as decimal(10,4)) = 75.0 then 'callitB'

          else 'CallitC'
     end

Answer

I'm not good with Python, but I'll try to give some pointers based on what I have done in Scala.

Question: rdd.map and then do some logic checks. Is that the right approach?

That is one approach.
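For instance, a rough rdd.map sketch of the question's logic might look like this (the column names a, b, aa, and abc are taken from the question's SQL; the helper classify and the label column name are made up):

def classify(row):
    # Mirror the SQL case expression row by row.
    # Note: float comparison only approximates the decimal(10,4) cast.
    abc = float(row["abc"])
    if (row["a"].startswith("a") or row["b"].startswith("b")) \
            and row["aa"] == "BW" and abc == 75.0:
        label = "callitA"
    elif (row["a"].startswith("b") or row["b"].startswith("a")) \
            and row["aa"] == "AW" and abc == 75.0:
        label = "callitB"
    else:
        label = "CallitC"
    # Row is a tuple subclass, so this appends the new value.
    return row + (label,)

labeled = df.rdd.map(classify).toDF(df.columns + ["label"])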

withColumn is another approach.

The DataFrame.withColumn method in PySpark supports adding a new column or replacing an existing column of the same name.
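As a minimal sketch of that, assuming a DataFrame df that already has an age column (the flag column name is made up):

from pyspark.sql import functions as F

# Adds "flag" if it does not exist, or replaces it if it does.
df = df.withColumn("flag", F.when(F.col("age") > 3, 1).otherwise(0))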

In this context you have to deal with the Column API, either via a Spark udf or with the when/otherwise syntax.

For example:

from pyspark.sql import functions as F
df.select(df.name, F.when(df.age > 4, 1).when(df.age < 3, -1).otherwise(0)).show()


+-----+--------------------------------------------------------+
| name|CASE WHEN (age > 4) THEN 1 WHEN (age < 3) THEN -1 ELSE 0|
+-----+--------------------------------------------------------+
|Alice|                                                      -1|
|  Bob|                                                       1|
+-----+--------------------------------------------------------+


from pyspark.sql import functions as F
df.select(df.name, F.when(df.age > 3, 1).otherwise(0)).show()

+-----+---------------------------------+
| name|CASE WHEN (age > 3) THEN 1 ELSE 0|
+-----+---------------------------------+
|Alice|                                0|
|  Bob|                                1|
+-----+---------------------------------+
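Putting it together, the case statement from the question could be written roughly as follows with when/otherwise. This is a sketch assuming the DataFrame columns are literally named a, b, aa, and abc, and the output column name label is made up:

from pyspark.sql import functions as F

# Apply the same cast as the SQL before comparing.
abc = F.col("abc").cast("decimal(10,4)")

cond_a = ((F.col("a").like("a%") | F.col("b").like("b%"))
          & (F.col("aa") == "BW") & (abc == 75.0))
cond_b = ((F.col("a").like("b%") | F.col("b").like("a%"))
          & (F.col("aa") == "AW") & (abc == 75.0))

# Adds or replaces the "label" column with the case result.
df = df.withColumn(
    "label",
    F.when(cond_a, "callitA").when(cond_b, "callitB").otherwise("CallitC"),
)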

You can also use a udf instead of when/otherwise.
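A minimal udf sketch of the same idea (the function name classify and the output column name label are made up):

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def classify(a, b, aa, abc):
    # Mirror the SQL case logic; float() approximates the decimal cast.
    if (a.startswith("a") or b.startswith("b")) and aa == "BW" and float(abc) == 75.0:
        return "callitA"
    if (a.startswith("b") or b.startswith("a")) and aa == "AW" and float(abc) == 75.0:
        return "callitB"
    return "CallitC"

df = df.withColumn("label", classify("a", "b", "aa", "abc"))

Note that built-in Column expressions such as when/otherwise are generally faster than Python udfs, since a udf pushes each row through the Python interpreter.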

