Trouble With Pyspark Round Function


Problem Description


Having some trouble getting the round function in pyspark to work - I have the below block of code, where I'm trying to round the new_bid column to 2 decimal places and rename the column as bid afterwards - I'm importing pyspark.sql.functions as func for reference, and using the round function contained within it:

from pyspark.sql.functions import col
import pyspark.sql.functions as func

output = output.select(col("ad").alias("ad_id"),
                       col("part").alias("part_id"),
                       func.round(col("new_bid"), 2).alias("bid"))


the new_bid column here is of type float - the resulting dataframe does not have the newly named bid column rounded to 2 decimal places as I am trying to do, rather it is still 8 or 9 decimal places out.


I've tried various things but can't seem to get the resulting dataframe to have the rounded value - any pointers would be greatly appreciated! Thanks!

Recommended Answer


Here are a couple of ways to do it with some toy data:

spark.version
# u'2.2.0'

import pyspark.sql.functions as func

df = spark.createDataFrame(
        [(0.0, 0.2, 3.45631),
         (0.4, 1.4, 2.82945),
         (0.5, 1.9, 7.76261),
         (0.6, 0.9, 2.76790),
         (1.2, 1.0, 9.87984)],
         ["col1", "col2", "col3"])

df.show()
# +----+----+-------+ 
# |col1|col2|   col3|
# +----+----+-------+
# | 0.0| 0.2|3.45631| 
# | 0.4| 1.4|2.82945|
# | 0.5| 1.9|7.76261| 
# | 0.6| 0.9| 2.7679| 
# | 1.2| 1.0|9.87984| 
# +----+----+-------+

# round 'col3' in a new column:
df2 = df.withColumn("col4", func.round(df["col3"], 2)).withColumnRenamed("col4","new_col3")
df2.show()
# +----+----+-------+--------+ 
# |col1|col2|   col3|new_col3|
# +----+----+-------+--------+
# | 0.0| 0.2|3.45631|    3.46|
# | 0.4| 1.4|2.82945|    2.83|
# | 0.5| 1.9|7.76261|    7.76|
# | 0.6| 0.9| 2.7679|    2.77|
# | 1.2| 1.0|9.87984|    9.88|
# +----+----+-------+--------+

# round & replace existing 'col3':
df3 = df.withColumn("col3", func.round(df["col3"], 2))
df3.show()
# +----+----+----+ 
# |col1|col2|col3| 
# +----+----+----+ 
# | 0.0| 0.2|3.46| 
# | 0.4| 1.4|2.83| 
# | 0.5| 1.9|7.76| 
# | 0.6| 0.9|2.77| 
# | 1.2| 1.0|9.88| 
# +----+----+----+ 


It's a matter of personal taste, but I am not a great fan of either col or alias - I prefer withColumn and withColumnRenamed instead. Nevertheless, if you would like to stick with select and col, here is how you should adapt your own code snippet:

from pyspark.sql.functions import col

df4 = df.select(col("col1").alias("new_col1"), 
                col("col2").alias("new_col2"), 
                func.round(df["col3"],2).alias("new_col3"))
df4.show()
# +--------+--------+--------+ 
# |new_col1|new_col2|new_col3| 
# +--------+--------+--------+
# |     0.0|     0.2|    3.46|
# |     0.4|     1.4|    2.83|
# |     0.5|     1.9|    7.76|
# |     0.6|     0.9|    2.77|
# |     1.2|     1.0|    9.88|
# +--------+--------+--------+
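
Using the withColumn approach preferred above, the rounding step from the question might look roughly like the following sketch (assuming, as in the question, a DataFrame named output with columns ad, part, and new_bid):

from pyspark.sql.functions import col
import pyspark.sql.functions as func

# rename the id columns, round new_bid to 2 decimal places, and expose it as bid
output = (output.withColumnRenamed("ad", "ad_id")
                .withColumnRenamed("part", "part_id")
                .withColumn("bid", func.round(col("new_bid"), 2))
                .drop("new_bid"))
output.show()

The result should contain ad_id, part_id, and a bid column rounded to 2 decimal places, equivalent to what the select/col version in the question produces.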


PS It is always a good idea to provide some sample data and a desired outcome with your question, as well as any relevant imports - see How do I ask a good question?.

