How to evaluate expressions that are the column values?

Question

I have a big DataFrame with millions of rows, as follows:

A    B    C    Eqn
12   3    4    A+B
32   8    9    B*C
56   12   2    A+B*C

How can I evaluate the expressions in the Eqn column?

Recommended answer

You could create a custom UDF that evaluates these arithmetic expressions:

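import org.apache.spark.sql.functions.udf

// Substitute the column values into the expression, then evaluate it strictly left to right.
val evalUDF = udf((a: Int, b: Int, c: Int, eqn: String) => {
  // e.g. "A+B*C" becomes "56+12*2", which splits at word boundaries
  // into List("56", "+", "12", "*", "2")
  val eqnParts = eqn
    .replace("A", a.toString)
    .replace("B", b.toString)
    .replace("C", c.toString)
    .split("""\b""")
    .toList

  // The accumulator is (running total, pending operator).
  val (sum, _) = eqnParts.tail.foldLeft((eqnParts.head.toInt, "")) {
    case ((runningTotal, "+"), num) => (runningTotal + num.toInt, "")
    case ((runningTotal, "-"), num) => (runningTotal - num.toInt, "")
    case ((runningTotal, "*"), num) => (runningTotal * num.toInt, "")
    case ((runningTotal, _), op)    => (runningTotal, op) // remember the operator for the next number
  }

  sum
})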

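// evalDf is the DataFrame from the question. The 'A symbol syntax needs
// import spark.implicits._ (assuming the SparkSession is named spark).
evalDf
  .withColumn("eval", evalUDF('A, 'B, 'C, 'Eqn))
  .show()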

Output:

+---+---+---+-----+----+
|  A|  B|  C|  Eqn|eval|
+---+---+---+-----+----+
| 12|  3|  4|  A+B|  15|
| 32|  8|  9|  B*C|  72|
| 56| 12|  2|A+B*C| 136|
+---+---+---+-----+----+

As you can see, this works, but it is very fragile (spaces, unknown operators, etc. will break the code) and it does not respect operator precedence: the fold evaluates strictly left to right, so A+B*C yields (56+12)*2 = 136 rather than 56+12*2 = 80.
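For illustration, a precedence-aware evaluator can be fairly small. The following is only a minimal sketch, not the answer's code: ExprEval and eval are hypothetical names, and it assumes integer operands and only the operators +, -, * plus parentheses. It uses plain Scala with no Spark dependency, so it can be tried on its own.

```scala
// Hypothetical helper (not from the original answer): a small recursive-descent
// evaluator for integer expressions with +, -, * and parentheses.
object ExprEval {
  // Tokens are integer literals, identifiers, operators and parentheses.
  private val Token = """\d+|[A-Za-z_]\w*|[-+*()]""".r

  def eval(eqn: String, vars: Map[String, Int]): Int = {
    // Reject characters outside the supported grammar instead of silently skipping them.
    require(eqn.matches("""[\w\s+*()-]*"""), s"unsupported character in: $eqn")
    val tokens = Token.findAllIn(eqn).toVector
    var pos = 0

    def peek: Option[String] = tokens.lift(pos)
    def next(): String = {
      val t = peek.getOrElse(throw new IllegalArgumentException("unexpected end of expression"))
      pos += 1
      t
    }

    // factor := number | variable | "(" expr ")"
    def factor(): Int = next() match {
      case "(" =>
        val v = expr()
        require(peek.contains(")"), "missing closing parenthesis")
        pos += 1
        v
      case t if t.head.isDigit => t.toInt
      case t => vars.getOrElse(t, throw new IllegalArgumentException(s"unknown token: $t"))
    }

    // term := factor ("*" factor)*   (binds tighter than + and -)
    def term(): Int = {
      var acc = factor()
      while (peek.contains("*")) { pos += 1; acc *= factor() }
      acc
    }

    // expr := term (("+" | "-") term)*
    def expr(): Int = {
      var acc = term()
      while (peek.exists(op => op == "+" || op == "-")) {
        if (next() == "+") acc += term() else acc -= term()
      }
      acc
    }

    val result = expr()
    require(pos == tokens.length, s"unexpected token: ${tokens(pos)}")
    result
  }
}
```

With the third row of the question, ExprEval.eval("A+B*C", Map("A" -> 56, "B" -> 12, "C" -> 2)) returns 80. To apply something like this per row, the body of the UDF could presumably be reduced to a call such as ExprEval.eval(eqn, Map("A" -> a, "B" -> b, "C" -> c)); malformed expressions throw an exception here, so a real job would need to decide how to handle bad rows.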

So you could write all of that yourself, or perhaps find a library that already does it (for example https://gist.github.com/daixque/1610753).

The performance overhead may be significant (especially if you start using recursive parsers), but at least you can run the evaluation on the DataFrame itself instead of collecting it to the driver first.
