Spark SQL: change the format of a number
Question
After the show command, Spark prints the following:
+-----------------------+---------------------------+
|NameColumn |NumberColumn |
+-----------------------+---------------------------+
|name |4.3E-5 |
+-----------------------+---------------------------+
Is there a way to change the NumberColumn format to something like 0.000043?
Recommended answer
You can use the format_number function as follows:
import org.apache.spark.sql.functions.format_number
// The $"..." column syntax needs spark.implicits._ in scope (spark being your SparkSession).
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))
Here 5 is the number of decimal places to show. Note that 4.3E-5 rounded to 5 decimal places is displayed as 0.00004; pass 6 to get 0.000043.
As the API documentation for format_number states, the function returns a string column:
format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
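To see what the '#,###,###.##' style of output looks like for the value in the question, here is a minimal plain-Scala sketch that approximates it with java.text.DecimalFormat, without needing a Spark session. The object FormatNumberSketch and the helper formatNumberLike are hypothetical names made up for illustration; the exact pattern Spark uses internally is an assumption here, so treat this as an approximation rather than Spark's implementation.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

object FormatNumberSketch {
  // Hypothetical helper: approximates format_number(x, d) outside Spark.
  // Groups the integer part in thousands and always shows d decimal places.
  def formatNumberLike(x: Double, d: Int): String = {
    val pattern = if (d > 0) "#,##0." + ("0" * d) else "#,##0"
    // Locale.US pins "," as the grouping separator and "." as the decimal point.
    new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US)).format(x)
  }

  val sixPlaces  = formatNumberLike(4.3e-5, 6)       // "0.000043"
  val fivePlaces = formatNumberLike(4.3e-5, 5)       // "0.00004"
  val large      = formatNumberLike(1234567.891, 2)  // "1,234,567.89"
}
```

The result is a string in every case, which is why sorting or arithmetic on the formatted column no longer behaves numerically.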
If you don't want the thousands separator (,), you can call the regexp_replace function, which is defined as
regexp_replace(Column e, String pattern, String replacement)
Replaces all substrings of the specified string value that match the pattern with the replacement.
and use it as
import org.apache.spark.sql.functions.regexp_replace
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))
This way the commas (,) are removed from large numbers.
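The effect of the comma removal can be sketched on a single value in plain Scala, using String.replaceAll with the same pattern and replacement. The object name StripCommasSketch and the sample string are illustrative assumptions, chosen to have the shape that format_number with 5 decimal places would produce for a large number.

```scala
object StripCommasSketch {
  // Illustrative input: a large number already formatted with grouping commas.
  val formatted = "1,234,567.89100"

  // Same pattern and replacement as in the regexp_replace call: drop every comma.
  val plain = formatted.replaceAll(",", "")  // "1234567.89100"

  // Without the commas the string parses back into a number.
  val parsed = BigDecimal(plain)
}
```

In the DataFrame itself the column is still a string after regexp_replace; it would need a cast if a numeric type is required downstream.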