Spark SQL sql("<some aggregate query>").first().getDouble(0) gives me inconsistent results


Problem description

I have the query below, which is supposed to compute the average of a column's values and return a single number as the result.

val avgVal = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'").first().getDouble(0)

This statement behaves inconsistently: it often fails with the error below, yet the same query returns a non-NULL result when executed through Hive.

18/05/10 11:01:12 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException: Value at index 0 in null
java.lang.NullPointerException: Value at index 0 in null
    at org.apache.spark.sql.Row$class.getAnyValAs(Row.scala:475)
    at org.apache.spark.sql.Row$class.getDouble(Row.scala:243)
    at org.apache.spark.sql.catalyst.expressions.GenericRow.getDouble(rows.scala:192)

The reason I use HiveContext instead of SQLContext is that the latter doesn't support some of the aggregation functions I use extensively in my code.

Could you please help me understand why this problem occurs and how to solve it?

Recommended answer

When no rows match the WHERE clause, avg(amount) evaluates to SQL NULL, and calling getDouble(0) on that cell throws the NullPointerException shown above. You need to split the query and the value retrieval into two steps:

val result = hiveContext.sql("select round(avg(amount), 4) from users.payment where dt between '2018-05-09' and '2018-05-09'")
val first = result.first()
if (first != null && !first.isNullAt(0)) {
  val avgVal = first.getDouble(0)
}

This avoids the NPE. The same null checks are needed when you read values out of Lists and arrays of Rows.
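The same check can also be expressed with Scala's Option. Here is a minimal sketch of the pattern; FakeRow is a hypothetical stand-in for Spark's Row (not part of any Spark API) so the snippet carries no Spark dependency:

```scala
// Hypothetical stand-in for org.apache.spark.sql.Row: the aggregate
// cell is None when avg() was computed over zero matching rows.
case class FakeRow(cell: Option[Double]) {
  def isNullAt(i: Int): Boolean = cell.isEmpty
  def getDouble(i: Int): Double = cell.get // throws on None, like Row does on NULL
}

// Null-safe extraction: returns None instead of throwing.
def safeAvg(first: FakeRow): Option[Double] =
  Option(first).filter(r => !r.isNullAt(0)).map(_.getDouble(0))

println(safeAvg(FakeRow(Some(1.2345)))) // Some(1.2345)
println(safeAvg(FakeRow(None)))         // None
```

With a real DataFrame you would apply the same filter/map chain to the row returned by first().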

For an insert or update query, you additionally need to wrap the call in a try...catch block to handle runtime exceptions.
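In Scala, scala.util.Try gives a compact way to express that wrapping. A minimal sketch, assuming you simply want a default value when the read fails (the helper name safeGet and the 0.0 default are illustrative, not from the original answer):

```scala
import scala.util.Try

// Evaluate the read lazily (call-by-name) and fall back to a default
// if it throws, e.g. the NullPointerException from getDouble on NULL.
def safeGet(read: => Double, default: Double): Double =
  Try(read).getOrElse(default)

val avg = safeGet(throw new NullPointerException("Value at index 0 in null"), 0.0)
println(avg) // 0.0
```

Whether you prefer a default value or an Option depends on whether "no matching rows" is a legitimate outcome for your job or an error to surface.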
