Pyspark: how to round up or down (round to the nearest)


Problem description

I have a df that looks like this:

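from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date
from pyspark.sql.types import FloatType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()  # reuses an existing session if there is one

TEST_schema = StructType([StructField("date", StringType(), True),
                          StructField("col1", FloatType(), True)])
TEST_data = [('2020-08-01', 1.22), ('2020-08-02', 1.15), ('2020-08-03', 5.4), ('2020-08-04', 2.6), ('2020-08-05', 3.5),
             ('2020-08-06', 2.2), ('2020-08-07', 2.7), ('2020-08-08', -1.6), ('2020-08-09', 1.3)]
TEST_df = spark.createDataFrame(TEST_data, TEST_schema)
# Convert the string column to a proper date column.
TEST_df = TEST_df.withColumn("date", to_date("date", "yyyy-MM-dd"))
TEST_df.show()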

+----------+-----+
|      date| col1|
+----------+-----+
|2020-08-01| 1.22|
|2020-08-02| 1.15|
|2020-08-03|  5.4|
|2020-08-04|  2.6|
|2020-08-05|  3.5|
|2020-08-06|  2.2|
|2020-08-07|  2.7|
|2020-08-08| -1.6|
|2020-08-09|  1.3|
+----------+-----+

Logic: round col1 to the nearest integer and return it as an integer, then take max(rounded value, 0).

The resulting df should look like this:

+----------+----+----+
|      date|col1|want|
+----------+----+----+
|2020-08-01| 1.2|   1|
|2020-08-02| 1.1|   1|
|2020-08-03| 5.4|   5|
|2020-08-04| 2.6|   3|
|2020-08-05| 3.5|   4|
|2020-08-06| 2.2|   2|
|2020-08-07| 2.7|   3|
|2020-08-08|-1.6|   0|
|2020-08-09| 1.3|   1|
+----------+----+----+
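
As a reference for the logic described above, here is a minimal plain-Python sketch that reproduces the `want` column for the sample values. It assumes ties such as 3.5 round up (which is what the expected table shows), and `want` is a hypothetical helper name used only for illustration; it is not part of PySpark.

```python
import math

def want(value: float) -> int:
    """Round to the nearest integer (ties round up), then clamp at zero."""
    rounded = math.floor(value + 0.5)  # e.g. 2.6 -> 3, 3.5 -> 4, -1.6 -> -2
    return max(rounded, 0)             # negative results become 0

col1 = [1.22, 1.15, 5.4, 2.6, 3.5, 2.2, 2.7, -1.6, 1.3]
result = [want(v) for v in col1]
print(result)
```

This is only a check of the intended arithmetic on a handful of values; for a real DataFrame the same steps would be expressed with column functions rather than a Python loop.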

Recommended answer

First, check whether the value is less than zero. We use the when function from pyspark.sql.functions: if the value in the column is less than zero, replace it with zero; otherwise keep the actual value. Then round the result with bround and cast it to int.

from pyspark.sql import functions as F

TEST_df.withColumn("want", F.bround(F.when(TEST_df["col1"] < 0, 0).otherwise(TEST_df["col1"])).cast("int")).show()
+----------+----+----+
|      date|col1|want|
+----------+----+----+
|2020-08-01|1.22|   1|
|2020-08-02|1.15|   1|
|2020-08-03| 5.4|   5|
|2020-08-04| 2.6|   3|
|2020-08-05| 3.5|   4|
|2020-08-06| 2.2|   2|
|2020-08-07| 2.7|   3|
|2020-08-08|-1.6|   0|
|2020-08-09| 1.3|   1|
+----------+----+----+
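
One caveat about the expression above: `F.bround` resolves ties with HALF_EVEN (banker's rounding), whereas `F.round` uses HALF_UP. On the sample data the two agree (3.5 becomes 4 either way), but a value such as 2.5 would become 2 with `bround` and 3 with `round`. The sketch below reproduces both modes in plain Python with the standard `decimal` module, so it runs without a Spark session; `round_with` is a hypothetical helper name, not a PySpark function.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_with(value: str, mode: str) -> int:
    """Round a decimal string to an integer using the given rounding mode."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=mode))

ties = ["0.5", "1.5", "2.5", "3.5"]

# HALF_EVEN: ties go to the nearest even integer (the mode bround uses).
half_even = [round_with(v, ROUND_HALF_EVEN) for v in ties]
# HALF_UP: ties go away from zero (the mode round uses).
half_up = [round_with(v, ROUND_HALF_UP) for v in ties]

print(half_even)
print(half_up)
```

If ties should always round up, swapping `F.bround` for `F.round` in the answer's expression should give that behavior. The clamp could also be written as `F.greatest(F.round("col1"), F.lit(0)).cast("int")`; note that `greatest` skips nulls, so a null `col1` would become 0 there instead of staying null.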
