NULL values behaviour in BigQuery Queries/Results

Question

I would like to ask about how NULL values behave in BigQuery.

I have noticed that filtering out a real value in a NULLABLE column filters out both the requested value and the NULL values.

For example, take the following query:

select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
--WHERE some_nullable_col != 3

All rows are returned, as expected.

Then:

select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE some_nullable_col != 3

omits 2 rows: the one with the value 3 and the one with NULL.

I guess this happens because BigQuery doesn't index NULL values, or doesn't scan them in the WHERE clause for efficiency, but it also causes trouble:

Each time I filter on a nullable column, the filter has to look like WHERE some_nullable_col != 3 OR some_nullable_col IS NULL

This is obviously not very convenient.

I just wanted to get an explanation. Does BigQuery's roadmap offer a fix for this issue?

Recommended answer

Yes, you are right that NULL is not matched by a comparison such as some_nullable_col != 3. Let me explain why.

Google uses a key-value store as the underlying data storage for BigQuery. Unlike in a traditional relational database, data is fragmented by row and field and stored in many different locations. If a value is NULL, BigQuery considers the data not to exist, so nothing is written to the data storage. As such, that field will never match any comparator except "IS NULL". This is by design, and Google doesn't have any plans to change the way it works at the moment.

The workaround is to set a special value for those fields. For example, if the field is a string, you can use the empty string "" instead of NULL. If the field is a non-negative integer, you can use -1 as the special value. I understand that this is not really optimal, and in many situations it may be better to add an "IS NULL" condition to your query. This is just to give you another option.
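
As a rough sketch of these options, the snippet below uses Python's standard-library sqlite3 module as a local stand-in for BigQuery (an assumption made only so the example can be run without a BigQuery project; the table name t is hypothetical). It compares the plain filter, the explicit IS NULL filter, and a variant of the special-value idea that substitutes -1 at query time with IFNULL instead of storing it in the data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")],
)


def names(where_clause):
    """Return the names of the rows matching the WHERE clause, sorted."""
    rows = conn.execute(f"SELECT name FROM t WHERE {where_clause} ORDER BY name")
    return [row[0] for row in rows]


# Plain filter: drops the row with 3 and also the row with NULL.
plain = names("some_nullable_col != 3")

# Explicit NULL handling, as in the question.
explicit = names("some_nullable_col != 3 OR some_nullable_col IS NULL")

# Sentinel applied at query time: NULL is treated as -1 before comparing.
sentinel = names("IFNULL(some_nullable_col, -1) != 3")

print(plain)
print(explicit)
print(sentinel)
```

The IFNULL form only works when the sentinel (-1 here) can never be a real value in the column, which is the same caveat as storing the special value directly.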

By the way, I tried a similar thing on my MySQL instance, and it behaves the same way as BigQuery. Namely, the query does not return NULL records with the "!=" comparator.

For example,

mysql> select * from test1;
+------+------------+
| id   | num        |
+------+------------+
|    0 | aaa        |
|    1 | bbb        |
|    8 | sdfsdfgsdf |
|    9 | NULL       |
| NULL | sdfsdfsfsf |
+------+------------+
5 rows in set (0.19 sec)

mysql> select * from test1 where id != 8;
+------+------+
| id   | num  |
+------+------+
|    0 | aaa  |
|    1 | bbb  |
|    9 | NULL |
+------+------+
3 rows in set (0.18 sec)

So I think this is standard behavior in the SQL world.
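
The rule behind this is SQL's three-valued logic: a comparison with NULL evaluates to NULL rather than TRUE, and WHERE keeps only the rows for which the condition is TRUE. The sketch below illustrates it with Python's standard-library sqlite3 module, used here as an assumption-level stand-in for MySQL and BigQuery. The test1 table mirrors the MySQL example above; the scalar helper is a hypothetical name introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(expr):
    """Evaluate a single SQL expression and return its value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


# Comparisons involving NULL yield NULL (None in Python), not TRUE or FALSE.
neq = scalar("NULL != 3")
eq = scalar("NULL = NULL")

# IS NULL is the one test that returns a definite truth value (1 in SQLite).
is_null = scalar("NULL IS NULL")

# Recreate the test1 table from the MySQL session above.
conn.execute("CREATE TABLE test1 (id INTEGER, num TEXT)")
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?)",
    [(0, "aaa"), (1, "bbb"), (8, "sdfsdfgsdf"), (9, None), (None, "sdfsdfsfsf")],
)

# The row whose id is NULL is not returned, because NULL != 8 is not TRUE.
kept = [row[0] for row in conn.execute(
    "SELECT id FROM test1 WHERE id != 8 ORDER BY id"
)]

print(neq, eq, is_null)
print(kept)
```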
