所有列中的雪花计数为空 [英] Snowflake count nulls in all columns

查看:12
本文介绍了所有列中的雪花计数为空的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我见过几个这样的问题-Count NULL Values from multiple columns with SQL

但是,是否真的没有一种方法来计算一个表中的空值(比方说,超过30列)?好像我不想把它们都按名称指定一样?

推荐答案

但是,是否真的没有一种方法来计算一个表中的空值(比方说,超过30列)?好像我不想把它们都按名称指定一样?

是的,正是如此。我不明白为什么这么难--就像 pandas 的一行字一样?

这里的关键点是,如果没有随附电池提供某些内容,则需要编写您自己的版本。它并不像看起来那么难。

假设输入表如下:

CREATE OR REPLACE TABLE t AS SELECT $1 AS col1, $2 AS col2, $3 AS col3, $4 AS col4 
FROM VALUES (1,2,3,10),(NULL,2,3,10),(NULL,NULL,4,10),(NULL,NULL,NULL,10);

SELECT * FROM t;
/*
+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 |
+------+------+------+------+
| 1    | 2    | 3    |   10 |
| NULL | 2    | 3    |   10 |
| NULL | NULL | 4    |   10 |
| NULL | NULL | NULL |   10 |
+------+------+------+------+
*/

您可能知道如何编写提供所需输出的查询,但由于问题中没有提供,我将使用我自己的版本:

WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
      ,total_rows - COUNT(col1) AS col1
      ,total_rows - COUNT(col2) AS col2
      ,total_rows - COUNT(col3) AS col3
      ,total_rows - COUNT(col4) AS col4
  FROM t
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (col1,col2,col3, col4))
ORDER BY COLUMN_NAME;
/*
+-------------+--------------------+-------------------+
| COLUMN_NAME | NULLS_COLUMN_COUNT | NULLS_TOTAL_COUNT |
+-------------+--------------------+-------------------+
| COL1        |                  3 |                 6 |
| COL2        |                  2 |                 6 |
| COL3        |                  1 |                 6 |
| COL4        |                  0 |                 6 |
+-------------+--------------------+-------------------+
*/
在这里我们可以看到,查询本质上是静电,几乎没有移动的parts(column_count_list/table_name/column_list):

WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
      <column_count_list>
  FROM <table_name>
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (<column_list>))
ORDER BY COLUMN_NAME;

现在使用元数据和变量:

-- input
SET sch_name = 'my_schema';
SET tab_name = 't';

SELECT
     LISTAGG(c.COLUMN_NAME, ', ') WITHIN GROUP(ORDER BY c.COLUMN_NAME) AS column_list
    ,ANY_VALUE(c.TABLE_SCHEMA || '.' || c.TABLE_NAME) AS full_table_name
    ,LISTAGG(REPLACE(SPACE(6) || ',total_rows - COUNT(<col_name>) AS <col_name>' 
                              || CHAR(13)
             , '<col_name>', c.COLUMN_NAME), '') 
     WITHIN GROUP(ORDER BY COLUMN_NAME) AS column_count_list

    ,REPLACE(REPLACE(REPLACE(
'WITH cte AS (
  SELECT
      COUNT(*) AS total_rows
<column_count_list>
  FROM <table_name>
)
SELECT COLUMN_NAME, NULLS_COLUMN_COUNT,SUM(NULLS_COLUMN_COUNT) OVER() AS NULLS_TOTAL_COUNT
FROM cte
UNPIVOT (NULLS_COLUMN_COUNT FOR COLUMN_NAME IN (<column_list>))
ORDER BY COLUMN_NAME;'
    ,'<column_count_list>',     column_count_list)
    ,'<table_name>',            full_table_name)
    ,'<column_list>',           column_list) AS query_to_run

FROM INFORMATION_SCHEMA.COLUMNS c
WHERE TABLE_SCHEMA = UPPER($sch_name)
  AND TABLE_NAME = UPPER($tab_name);

运行代码将生成要运行的查询:

复制输出并运行它将生成输出。如果需要,可以使用存储过程进一步细化和包装此模板(但我将其留作练习)。

这篇关于所有列中的雪花计数为空的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆