如何过滤掉 Teradata SQL 中的 NaN FLOAT 值? [英] How do I filter out NaN FLOAT values in Teradata SQL?

查看:83
本文介绍了如何过滤掉 Teradata SQL 中的 NaN FLOAT 值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用 Teradata 数据库,可以通过 Java 将 NaN、-Inf 和 +Inf 的值加载到 FLOAT 列中.不幸的是,一旦这些值进入表中,在编写需要过滤掉它们的 SQL 时就会变得很困难.没有 IsNaN() 函数,也不能CAST ('NaN' as FLOAT)"并使用相等比较.

With the Teradata database, it is possible to load values of NaN, -Inf, and +Inf into FLOAT columns through Java. Unfortunately, once those values get into the tables, they make life difficult when writing SQL that needs to filter them out. There is no IsNaN() function, nor can you "CAST ('NaN' as FLOAT)" and use an equality comparison.

我想做的是,

SELECT
  SUM(VAL**2)
FROM
  DTM
WHERE
  NOT ABS(VAL) > 1e+21 AND
  NOT VAL = CAST ('NaN' AS FLOAT)

但失败并出现错误 2620,格式或数据包含错误字符.",特别是在 CAST 上.我已经尝试过简单的... AND NOT VAL = 'NaN'",它也因类似的原因而失败(3535,字符串无法转换为数值.").我似乎无法弄清楚如何在 SQL 语句中表示 NaN.即使我可以在 SQL 语句中成功表示 NaN,我也会担心比较会失败.根据 IEEE 754 规范,NaN = NaN 应评估为假.我真正需要的是 IsNaN() 函数.然而这个功能似乎并不存在.

but that fails with error 2620, "The format or data contains a bad character.", specifically on the CAST. I've tried simply "... AND NOT VAL = 'NaN'", which also fails for a similar reason (3535, "A character string failed conversion to a numeric value."). I cannot seem to figure out how to represent NaN within the SQL statement. Even if I could represent NaN successfully in an SQL statement, I would be concerned that the comparison would fail. According to the IEEE 754 spec, NaN = NaN should evaluate to false. What I really seem to need is an IsNaN() function. Yet that function does not seem to exist.

推荐答案

我想出了一个解决方法,我会与你们中的那些正在寻找解决方案的人分享.但首先,对我来说很明显 Teradata 对 NaN 浮点值的处理是不完整的,我偶然发现的任何行为都可能是无意的,并且在不同版本之间不一致.所以我提供以下免费建议,没有任何形式的保证、承诺或责任.警告清空者.

I have figured out a work-around, and I'll share that with those of you searching for a solution. But first, it's obvious to me that Teradata's handling of NaN floating point values is incomplete and any behavior I've stumbled into figuring out is likely unintentional and inconsistent across different versions. So I offer the following free advice, with no guarantees, promises, or liabilities of any kind. Caveat emptor.

深入研究数据后,我发现如果我将 FLOAT 值转换为 VARCHAR(50),NaN 值会以 22 个星号的字符串形式出现 (**********************).我可以改为转换为 VARCHAR(1),而 NaN 以单个星号 (*) 的形式出现.这种比较并没有那么糟糕.

After drilling down into the data, I find that if I CAST the FLOAT value to a VARCHAR(50), the NaN values come out as a string of 22 asterisks (**********************). I can cast to a VARCHAR(1) instead, and the NaN comes out as a single asterisk (*). That comparison isn't that bad.

SELECT
  SUM(VAL**2)
FROM
  DTM
WHERE
  NOT CAST (VAL AS VARCHAR(1)) = '*' AND
  ABS(VAL) < 1.0e+21

我要注意两件事.

首先,我对 Teradata 版本的NOT ABS(VAL) > 1.0e+21"的最初愿望似乎在幕后转换为ABS(VAL) <= 1.0e+21".由于比较的相等性,这(有时)会失败 - 错误 [2651],涉及 VAL 的运算错误计算表达式".我假设从NOT >"到<="的转换正在发生,因为NOT ABS(VAL) >= 1.0e+21"工作正常(但看起来很难看).使用 "ABS(VAL) <1.0e+21" 可以很好地满足需求.

First, my original desire of "NOT ABS(VAL) > 1.0e+21" on my version of Teradata seems to get converted to "ABS(VAL) <= 1.0e+21" under the covers. This (sometimes) fails because of the equality potion of the comparison -- error [2651], "Operation Error computing expression involving VAL." I'm assuming that the conversion from "NOT >" to "<=" is happening, because "NOT ABS(VAL) >= 1.0e+21" works fine (but looks ugly). Using "ABS(VAL) < 1.0e+21" works just fine and captures the need.

第二,虽然我无法始终如一地重现它,但我在历史上有一些例子,其中ABS(VAL) <= 1.0e+21"确实有效地屏蔽了 NaN,有些地方没有,有些地方它因 [2651](见上文)而失败,但 SQL 和表中的数据是相同的.我能得出的唯一结论是,当涉及 NaN 时(或者它认为可能是),Teradata 在评估比较的方式上不一致.不同的放大器可能会以不同的方式处理它,但我不确定.也就是说,where 子句中的上述两个比较一致且有效地筛选出了 Inf 和 NaN 值.

Second, while I can't reproduce it consistently, I have examples in my history where "ABS(VAL) <= 1.0e+21" does effectively screen out NaN, some where it doesn't, and some where it fails with [2651] (see above), yet the SQL and the data in the tables were identical. The only conclusion I can come to is that Teradata is inconsistent in how it evaluates the comparison when NaN is involved (or it thinks it might be). It's possible that different amps are handling it differently, but I don't know for sure. That said, the above two comparisons in the where clause consistently and effectively screen out the Inf and NaN values.

这篇关于如何过滤掉 Teradata SQL 中的 NaN FLOAT 值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆