Transform table to one-hot encoding for many rows

Question

I have a SQL table of the following format:

ID  Cat
1   A   
1   B
1   D
1   F
2   B
2   C
2   D
3   A
3   F

Now I want to create a table with one row per ID and one 0/1 column per Cat value. My desired output looks as follows:

ID  A  B  C  D  E  F
1   1  1  0  1  0  1
2   0  1  1  1  0  0
3   1  0  0  0  0  1

I have found:

Transform table to one-hot-encoding of single column value

However, I have more than 1000 Cat values, so I am looking for code that writes this automatically rather than by hand. Who can help me with this?

Solution

First let me transform the data you pasted into an actual table:

WITH data AS (
  SELECT REGEXP_EXTRACT(data2, '[0-9]') id, REGEXP_EXTRACT(data2, '[A-Z]') cat
  FROM (
    SELECT SPLIT("""1   A   
    1   B
    1   D
    1   F
    2   B
    2   C
    2   D
    3   A
    3   F""", '\n') AS data1
  ), UNNEST(data1) data2
)

SELECT * FROM data

(try sharing a table next time)

Now we can do some manual 1-hot encoding:

SELECT id 
 , MAX(IF(cat='A',1,0)) cat_A
 , MAX(IF(cat='B',1,0)) cat_B
 , MAX(IF(cat='C',1,0)) cat_C
FROM data
GROUP BY id

Now we want to write a script that will automatically create the columns we want:

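-- ORDER BY inside STRING_AGG fixes the column order;
-- an ORDER BY in the subquery is not guaranteed to carry through the aggregation.
SELECT STRING_AGG(FORMAT("MAX(IF(cat='%s',1,0))cat_%s", cat, cat), ', ' ORDER BY cat)
FROM (
  SELECT DISTINCT cat
  FROM data
)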

That generates a string you can copy and paste into a new query, which one-hot encodes your rows:

SELECT id
 , MAX(IF(cat='A',1,0)) cat_A
 , MAX(IF(cat='B',1,0)) cat_B
 , MAX(IF(cat='C',1,0)) cat_C
 , MAX(IF(cat='D',1,0)) cat_D
 , MAX(IF(cat='F',1,0)) cat_F
FROM data
GROUP BY id

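And that's exactly what the question asked for. You can generate SQL with SQL, but the result is only a string: you still need to run it as a second query. Note that only categories present in the data get a column, so the E column from the desired output will not appear.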
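
The two steps above (build the column list, then run a second query with it) can also be driven from client code. The following is only a minimal sketch and not the answer's method. It assumes Python with the standard-library sqlite3 module and an in-memory table named data, and it uses the portable CASE WHEN form in place of IF(...). The helper names build_one_hot_query and one_hot are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (id, cat) pairs.
ROWS = [
    (1, "A"), (1, "B"), (1, "D"), (1, "F"),
    (2, "B"), (2, "C"), (2, "D"),
    (3, "A"), (3, "F"),
]


def build_one_hot_query(conn):
    """Step 1: read the distinct categories and build the pivot query text."""
    cats = [row[0] for row in conn.execute("SELECT DISTINCT cat FROM data ORDER BY cat")]
    for cat in cats:
        # Category values become column names, so accept only alphanumeric ones.
        if not cat.isalnum():
            raise ValueError(f"unsafe category value: {cat!r}")
    columns = ", ".join(
        f"MAX(CASE WHEN cat = '{cat}' THEN 1 ELSE 0 END) AS cat_{cat}" for cat in cats
    )
    return f"SELECT id, {columns} FROM data GROUP BY id ORDER BY id"


def one_hot(conn):
    """Step 2: run the generated query and return (column names, rows)."""
    cursor = conn.execute(build_one_hot_query(conn))
    header = [col[0] for col in cursor.description]
    return header, cursor.fetchall()


# Assumed setup: an in-memory SQLite table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, cat TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", ROWS)

header, rows = one_hot(conn)
print(header)
for row in rows:
    print(row)
```

Because the category values are spliced into the query text as column names, the sketch rejects anything that is not purely alphanumeric. With more than 1000 real categories, some similar sanitizing of names would likely be needed.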
