In a Scala notebook on Apache Spark Databricks, how do you correctly cast an array to type decimal(30,0)?
Problem description
I am trying to cast an array as Decimal(30,0) for dynamic use in a select, as:
WHERE array_contains(myArrayUDF(), someTable.someColumn)
However, when casting with:
val arrIds = someData.select("id").withColumn("id", col("id")
.cast(DecimalType(30, 0))).collect().map(_.getDecimal(0))
Databricks accepts that, but the resulting signature already looks wrong: intArrSurrIds: Array[java.math.BigDecimal] = Array(2181890000000, ...) // i.e., a BigDecimal
This results in the following error:
Error in SQL statement: AnalysisException: cannot resolve ... due to data type mismatch: Input to function array_contains should have been array followed by a value with same element type, but it's [array<decimal(38,18)>, decimal(30,0)]
How do you correctly cast as decimal(30,0) instead of decimal(38,18) in a Spark Databricks Scala notebook?
Any help appreciated!
Answer
You can make arrIds an Array[Decimal] using the code below:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Decimal, DecimalType}
val arrIds = someData.select("id")
.withColumn("id", col("id").cast(DecimalType(30, 0)))
.collect()
.map(row => Decimal(row.getDecimal(0), 30, 0))
However, it will not solve your problem, because you lose the precision and scale once you create your user-defined function, as I explain in this answer.
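As an aside, the scale-sensitivity that trips up array_contains can be seen with plain java.math.BigDecimal, since a BigDecimal's scale is part of its identity, much as decimal(38,18) and decimal(30,0) are distinct element types to Spark. A minimal sketch (no Spark needed; the value is illustrative):

```scala
// A java.math.BigDecimal carries its scale as part of its identity,
// which mirrors why Spark refuses to match decimal(38,18) array elements
// against a decimal(30,0) value in array_contains.
val scaled   = new java.math.BigDecimal("2181890000000").setScale(18) // like decimal(38,18)
val unscaled = new java.math.BigDecimal("2181890000000")              // scale 0, like decimal(30,0)

println(scaled.scale)   // 18
println(unscaled.scale) // 0

// equals() is scale-sensitive, compareTo() is not:
println(scaled.equals(unscaled))    // false
println(scaled.compareTo(unscaled)) // 0
```

The same numeric value compares unequal under equals() purely because the scales differ, which is the flavor of mismatch the AnalysisException is reporting.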
To solve your problem, you need to cast the column someTable.someColumn to a Decimal with the same precision and scale as the UDF's return type. So your WHERE clause should be:
WHERE array_contains(myArray(), cast(someTable.someColumn as Decimal(38, 18)))