How to evaluate expressions that are the column values?

Problem description

I have a large DataFrame with millions of rows, like this:

A    B    C    Eqn
12   3    4    A+B
32   8    9    B*C
56   12   2    A+B*C

How can I evaluate the expressions in the Eqn column?

Recommended answer

You could create a custom UDF that evaluates these arithmetic expressions:

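import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

// Substitutes the column values into Eqn, then folds over the tokens left to right
val evalUDF = udf((a: Int, b: Int, c: Int, eqn: String) => {
  // e.g. "A+B*C" with A=56, B=12, C=2 becomes List("56", "+", "12", "*", "2")
  val eqnParts = eqn
    .replace("A", a.toString)
    .replace("B", b.toString)
    .replace("C", c.toString)
    .split("""\b""")
    .toList

  // Carry (runningTotal, pendingOperator); no operator precedence is applied
  val (sum, _) = eqnParts.tail.foldLeft((eqnParts.head.toInt, "")) {
    case ((runningTotal, "+"), num) => (runningTotal + num.toInt, "")
    case ((runningTotal, "-"), num) => (runningTotal - num.toInt, "")
    case ((runningTotal, "*"), num) => (runningTotal * num.toInt, "")
    case ((runningTotal, _), op)    => (runningTotal, op) // remember the operator
  }

  sum
})

// The sample data from the question
val evalDf = Seq(
  (12, 3, 4, "A+B"),
  (32, 8, 9, "B*C"),
  (56, 12, 2, "A+B*C")
).toDF("A", "B", "C", "Eqn")

evalDf
  .withColumn("eval", evalUDF(col("A"), col("B"), col("C"), col("Eqn")))
  .show()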

Output:

+---+---+---+-----+----+
|  A|  B|  C|  Eqn|eval|
+---+---+---+-----+----+
| 12|  3|  4|  A+B|  15|
| 32|  8|  9|  B*C|  72|
| 56| 12|  2|A+B*C| 136|
+---+---+---+-----+----+

As you can see, this works, but it is very fragile (spaces, unknown operators, etc. will break the code) and it doesn't respect the order of operations: with normal precedence the last row would be 56 + 12 * 2 = 80, not 136.
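Both weaknesses can be addressed without a full parser if the expressions are limited to integers, variable names, +, - and *. The following is a minimal sketch under that assumption; `PrecedenceEval` and `eval` are hypothetical names, not an existing library. It tokenizes with a regex, rejects any unsupported character instead of skipping it, and folds over (operator, operand) pairs while keeping a separate running product, so that * binds tighter than + and -.

```scala
// Hypothetical helper (not from the original answer): evaluates integer
// expressions over +, - and *, giving * higher precedence than + and -.
// Whitespace is allowed; any other unsupported character is rejected
// instead of being silently skipped.
object PrecedenceEval {
  // Operands are integer literals or variable names; operators are + - *
  private val TokenRe = """\d+|[A-Za-z]\w*|[-+*]""".r

  def eval(eqn: String, vars: Map[String, Int]): Int = {
    val tokens = TokenRe.findAllIn(eqn).toList
    // If the tokens do not cover every non-whitespace character, something
    // unsupported (e.g. '/' or '(') was present
    require(tokens.nonEmpty && tokens.mkString == eqn.filterNot(_.isWhitespace),
      s"unsupported expression: $eqn")

    def value(t: String): Int =
      if (t.head.isDigit) t.toInt
      else vars.getOrElse(t, throw new IllegalArgumentException(s"unknown operand: $t"))

    // State: (sum of finished terms, product for the current term).
    // '*' extends the current term; '+' and '-' close it and start a new one.
    val (sum, term) = tokens.tail.grouped(2).foldLeft((0, value(tokens.head))) {
      case ((acc, cur), List("*", n)) => (acc, cur * value(n))
      case ((acc, cur), List("+", n)) => (acc + cur, value(n))
      case ((acc, cur), List("-", n)) => (acc + cur, -value(n))
      case (_, other) =>
        throw new IllegalArgumentException(s"malformed expression near '${other.mkString}' in: $eqn")
    }
    sum + term
  }
}
```

Inside the UDF, the body could then become `PrecedenceEval.eval(eqn, Map("A" -> a, "B" -> b, "C" -> c))`, which gives 80 for the last row.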

So you could either write all of that yourself or find a library that already does it (perhaps something like https://gist.github.com/daixque/1610753).

The performance overhead may be very large (especially if you start using recursive parsers), but at least you can run this on the DataFrame instead of collecting it first.
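If the expressions may also contain parentheses, a small recursive-descent parser is the usual next step, and that is where the parsing overhead comes from. One way to keep it in check, sketched here under the assumption that many rows share a few distinct Eqn strings, is to parse each distinct string once into a function and cache it. `RecursiveEval`, `compile` and `eval` are hypothetical names, and the grammar supports only integers, variable names, +, -, * and parentheses. In Spark, each executor JVM would hold its own copy of the cache.

```scala
import scala.annotation.tailrec
import scala.collection.concurrent.TrieMap

// Hypothetical sketch (not from the original answer): a recursive-descent
// parser for the grammar
//   expr   := term (('+' | '-') term)*
//   term   := factor ('*' factor)*
//   factor := integer | variable | '(' expr ')'
// Each distinct Eqn string is compiled once into a function and cached.
object RecursiveEval {
  type Compiled = Map[String, Int] => Int

  // Unbounded cache: assumes the number of distinct equations is small
  private val cache = TrieMap.empty[String, Compiled]

  def eval(eqn: String, vars: Map[String, Int]): Int =
    cache.getOrElseUpdate(eqn, compile(eqn))(vars)

  def compile(eqn: String): Compiled = {
    val tokens = """\d+|[A-Za-z]\w*|[-+*()]""".r.findAllIn(eqn).toVector
    require(tokens.mkString == eqn.filterNot(_.isWhitespace),
      s"unsupported characters in: $eqn")

    var pos = 0
    def peek: Option[String] = tokens.lift(pos)
    def next(): String = {
      require(pos < tokens.length, s"unexpected end of: $eqn")
      pos += 1
      tokens(pos - 1)
    }

    // Left-associative sums and differences of terms
    def expr(): Compiled = {
      @tailrec
      def loop(left: Compiled): Compiled = peek match {
        case Some("+") => next(); val r = term(); loop(vs => left(vs) + r(vs))
        case Some("-") => next(); val r = term(); loop(vs => left(vs) - r(vs))
        case _         => left
      }
      loop(term())
    }

    // Products bind tighter, so they are parsed one level down
    def term(): Compiled = {
      @tailrec
      def loop(left: Compiled): Compiled = peek match {
        case Some("*") => next(); val r = factor(); loop(vs => left(vs) * r(vs))
        case _         => left
      }
      loop(factor())
    }

    // A number, a variable lookup, or a parenthesised sub-expression
    def factor(): Compiled = next() match {
      case "(" =>
        val inner = expr()
        require(next() == ")", s"missing ')' in: $eqn")
        inner
      case n if n.head.isDigit =>
        val v = n.toInt
        (_: Map[String, Int]) => v
      case name if name.head.isLetter =>
        (vs: Map[String, Int]) =>
          vs.getOrElse(name, throw new IllegalArgumentException(s"unknown variable: $name"))
      case t => throw new IllegalArgumentException(s"unexpected token '$t' in: $eqn")
    }

    val compiled = expr()
    require(pos == tokens.length, s"unexpected trailing input in: $eqn")
    compiled
  }
}
```

Inside the UDF, the call would be `RecursiveEval.eval(eqn, Map("A" -> a, "B" -> b, "C" -> c))`; for the third sample row, `A+B*C` yields 80 and `(A+B)*C` yields 136.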
