连接同一表中以定界符分隔的列 [英] join columns separated by delimiter in same table

查看:89
本文介绍了连接同一表中以定界符分隔的列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下数据集

color_code   fav_color_code    color_code_name    fav_color_name 
1|2          5                 blue|white         black
3|4          7|9               green|red          pink|yellow

我需要的第一值 color_code 加入到 color_code_name 的第一值和 color_code color_code_name 等的第二个值。

I need to join first value of color_code to first value of color_code_name and second value of color_code to second value of color_code_name etc..

code                color
1                   blue
2                   white
5                   black
3                   green
4                   red
7                   pink
9                   yellow

我正在使用以下代码,但是由于没有<$,所以它正在交叉连接c $ c> id 到 join 之后。如果我映射2列而不是多列,则此代码有效。

I am using the below code, but it is doing cross join since I dont have an id to join upon. This code work if I am mapping 2 columns but not multiple columns.

有人可以帮助我获得预期的结果吗?

Could someone help me to get the expected result?

SELECT 
        t1.code AS code, 
        t2.color AS color, 
      FROM 
        (
          SELECT 
            c.value :: varchar AS code, 
            row_number() over(
              order by 
                code
            ) AS rownum 
          FROM 
            table, 
            lateral flatten (
              input => split(color_code, '|')
            ) c  
          UNION 
          SELECT 
            d.value :: varchar AS code, 
            row_number() OVER(
              ORDER BY 
                code
            ) AS rownum
            FROM 
            table, 
            lateral flatten (
              input => split(fav_color_code, '|')
            ) d 
        ) t1 
        JOIN (
          SELECT 
            f.value :: varchar AS color, 
            row_number() OVER(
              ORDER BY 
                color
            ) AS rownum 
          FROM 
            table, 
            lateral flatten (
              input => split(color_code_name, '|')
            ) f 
          UNION 
          SELECT 
            g.value :: varchar AS color, 
            row_number() OVER(
              ORDER BY 
                color
            ) AS rownum 
          FROM 
            table, 
            lateral flatten (
              input => split(fav_color_name, '|')
            ) g 
        ) t2 ON (t1.rownum = t2.rownum) 
      ORDER BY 
        t1.color


推荐答案

您可以按照以下步骤进行此方法以进行解释,因为我认为一步之遥就是一团糟。

You can follow this approach in several steps for explanation purposes, because I think in one step is a mess.

注意:
代码解决方案是在 Hive 帖子没有指定任何 sql-query-engine ),但是在 Hive 中,几乎所有内容都 sql-standard

NOTE: The code solution is done in Hive(the post doesn't specify any sql-query-engine), but in Hive almost everything is sql-standard.

原始数据

+--------------------+------------------------+-------------------------+------------------------+--+
| colors.color_code  | colors.fav_color_code  | colors.color_code_name  | colors.fav_color_name  |
+--------------------+------------------------+-------------------------+------------------------+--+
| 1|2                | 5                      | blue|white              | black                  |
| 3|4                | 7|9                    | green|red               | pink|yellow            |
+--------------------+------------------------+-------------------------+------------------------+--

首先,我们创建一个 temp表,其中颜色ID ,其中我们将 code列 split 列连接在一起放入数组,然后爆炸 数组并用行号

First we create a temp table with color ids where we concatenate the code columns, split the column into an array and then explode the array with a rownumber

CREATE TABLE tc1 AS
SELECT ROW_NUMBER() OVER() AS rownum, CAST(color_id AS INT) as color_id
FROM colors
LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code,'|', fav_color_code),'\\|')) a1 AS color_id;

我们用 temp表 >颜色名称,我们按照以前的方法进行操作,但是现在我们将 color_name 列,分割将列放入数组,然后展开 数组使用行数

We create a second temp table with color names and we follow the approach as before but now we concatenate the color_name columns, split the column into an array and then explode the array with a rownumber

CREATE TABLE tc2 AS
SELECT ROW_NUMBER() OVER() AS rownum, color_name
FROM colors
LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code_name,'|', fav_color_name),'\\|')) a1 AS color_name;

我们通过 rownum join 临时表

we join temp tables by rownum

SELECT color_id, color_name
FROM tc1
JOIN tc2 ON(tc1.rownum = tc2.rownum)
ORDER BY color_id;

预期输出

+-----------+-------------+--+
| color_id  | color_name  |
+-----------+-------------+--+
| 1         | blue        |
| 2         | white       |
| 3         | green       |
| 4         | red         |
| 5         | black       |
| 7         | pink        |
| 9         | yellow      |
+-----------+-------------+--+

虽然不是轻量级查询,但一次执行相同的操作

Doing the same thing in one shot, though is not a light-weight query

SELECT tc1.color_id, tc2.color_name
FROM (SELECT ROW_NUMBER() OVER() AS rownum, CAST(color_id AS INT) as color_id
      FROM colors
      LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code,'|', fav_color_code),'\\|')) a1 AS color_id) AS tc1
JOIN (SELECT ROW_NUMBER() OVER() AS rownum, color_name
      FROM colors
      LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code_name,'|', fav_color_name),'\\|')) a1 AS color_name) AS tc2
ON(tc1.rownum = tc2.rownum)
ORDER BY tc1.color_id;

预期输出

+---------------+-----------------+--+
| tc1.color_id  | tc2.color_name  |
+---------------+-----------------+--+
| 1             | blue            |
| 2             | white           |
| 3             | green           |
| 4             | red             |
| 5             | black           |
| 7             | pink            |
| 9             | yellow          |
+---------------+-----------------+--+

这篇关于连接同一表中以定界符分隔的列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆