Create a Spark udf function to iterate over an Array of bytes and convert it to numeric

Problem description

I have a DataFrame containing an array of bytes in Spark (Python):

DF.select(DF.myfield).show(1, False)
+----------------+                                                              
|myfield         |
+----------------+
|[00 8F 2B 9C 80]|
+----------------+

I'm trying to convert this array to a string:

'008F2B9C80'

and then to a numeric value:

int('008F2B9C80',16)/1000000
> 2402.0

I have found some udf samples, so I can already extract one element of the array like this:

u = f.udf(lambda a: format(a[1],'x'))
DF.select(u(DF['myfield'])).show()
+------------------+                                                            
|<lambda>(myfield) |
+------------------+
|                8f|
+------------------+

Now, how do I iterate over the whole array? Is it possible to do all the operations I need inside the udf function?

Maybe there is a better way to do the cast?

Thanks for your help.

Recommended answer

Here is a Scala DataFrame solution. You need to import scala.math.BigInt:

scala> val df = Seq((Array("00","8F","2B","9C","80"))).toDF("id")
df: org.apache.spark.sql.DataFrame = [id: array<string>]

scala> df.withColumn("idstr",concat_ws("",'id)).show
+--------------------+----------+
|                  id|     idstr|
+--------------------+----------+
|[00, 8F, 2B, 9C, 80]|008F2B9C80|
+--------------------+----------+


scala> import scala.math.BigInt
import scala.math.BigInt

scala> def convertBig(x:String):String = BigInt(x.sliding(2,2).map( x=> Integer.parseInt(x,16)).map(_.toByte).toArray).toString
convertBig: (x: String)String

scala> val udf_convertBig =  udf( convertBig(_:String):String )
udf_convertBig: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))

scala> df.withColumn("idstr",concat_ws("",'id)).withColumn("idBig",udf_convertBig('idstr)).show(false)
+--------------------+----------+----------+
|id                  |idstr     |idBig     |
+--------------------+----------+----------+
|[00, 8F, 2B, 9C, 80]|008F2B9C80|2402000000|
+--------------------+----------+----------+


scala>

There is no Spark equivalent for Scala's BigInt, so I'm converting the udf() result to a string.
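
For the original PySpark question, the same idea can be written as a single Python udf. The following is a minimal sketch, assuming DF.myfield is a binary (bytearray-like) column, so that iterating over it yields integer byte values just as format(a[1],'x') did in the question; the DF and myfield names are taken from the question.

from pyspark.sql import functions as f
from pyspark.sql.types import DoubleType

# Sketch (assumes myfield is BinaryType / bytearray-like):
# build the full hex string from every byte, parse it as a base-16 integer,
# then divide by 1,000,000 as in the question.
def bytes_to_number(a):
    if a is None:
        return None
    hexstr = ''.join(format(b, '02x') for b in a)   # e.g. '008f2b9c80'
    return int(hexstr, 16) / 1000000.0              # e.g. 2402.0

u = f.udf(bytes_to_number, DoubleType())
DF.select(u(DF['myfield']).alias('myfield_num')).show(1, False)

If myfield is instead an array of two-character hex strings (as in the Scala example above), the hex string can simply be built with ''.join(a).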
