Hive Query: Matching Column Values from an Array of Strings to Make Flags

Question

I have records where each row belongs to one or more categories (data type: array of string), and a separate list of unique categories (data type: string). I need to match every row against the unique list and create a flag column for each category.

Input:
------
ID   Category
1    ["Physics","Math"]
2    ["Math"]
3    ["Math","Chemistry"]
4    ["Physics","Computer"]

I also have a separate list of the unique categories in a local Excel file, like this:

Unique Category
["Physics"]
["Math"]
["Chemistry"]
["Computer"]

The final output should look like this:

ID   Category                  Math_F  Physics_F  Computer_F  Chemistry_F
1    ["Physics","Math"]          1         1          0           0
2    ["Math"]                    1         0          0           0
3    ["Math","Chemistry"]        1         0          0           1
4    ["Physics","Computer"]      0         1          1           0

Can someone please help with the query, the steps, and an explanation? I am new to Hive.

Answer

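Use array_contains(), which returns true when the array contains the given value, inside one CASE expression per category: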

SELECT ID,
       Category,
       CASE
           WHEN array_contains(Category, 'Math') THEN 1
           ELSE 0
       END Math_F,
       CASE
           WHEN array_contains(Category, 'Physics') THEN 1
           ELSE 0
       END Physics_F,
       CASE
           WHEN array_contains(Category, 'Computer') THEN 1
           ELSE 0
       END Computer_F,
       CASE
           WHEN array_contains(Category, 'Chemistry') THEN 1
           ELSE 0
       END Chemistry_F
-- my_table is a placeholder for your table name
FROM my_table t;

If you want the flag columns to be generated dynamically from your list of unique categories, generate the query with another tool, because the set of columns in a Hive query is fixed when the query is written. A shell script, for example, can do this.

Here is an example that builds the SQL from a predefined array. It is easy to change it so that the array is read from a file instead:

#!/bin/bash

#define array
array=( Physics Math Computer Chemistry )

#initial sql
sql="SELECT ID,
       Category,"

#get length of array
arraylength=${#array[@]}

#get first flag column
columns="       CASE
           WHEN array_contains(Category, '${array[0]}') THEN 1
           ELSE 0
       END ${array[0]}_F"

#attach all other flags, each preceded by a comma
for (( i=1; i<arraylength; i++ ))
do
    columns="$columns,
       CASE
           WHEN array_contains(Category, '${array[$i]}') THEN 1
           ELSE 0
       END ${array[$i]}_F"
done

#final SQL (my_table is a placeholder for your Hive table name)
sql="$sql
$columns
FROM my_table t;
"
#print result
echo "$sql"

Result:

SELECT ID,
       Category,
       CASE
           WHEN array_contains(Category, 'Physics') THEN 1
           ELSE 0
       END Physics_F,
       CASE
           WHEN array_contains(Category, 'Math') THEN 1
           ELSE 0
       END Math_F,
       CASE
           WHEN array_contains(Category, 'Computer') THEN 1
           ELSE 0
       END Computer_F,
       CASE
           WHEN array_contains(Category, 'Chemistry') THEN 1
           ELSE 0
       END Chemistry_F
FROM my_table t;

To execute the query, add a Hive call such as hive -e "$sql" at the end of the script above, or save the query to a file instead.
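A minimal sketch of the "read the array from a file" variant, assuming the Excel list has been saved as a plain text file with one category per line, written either as Physics or as ["Physics"]. The function names read_categories and build_flag_query, the file names categories.txt and flags.hql, and the table name my_table are hypothetical. Category names are assumed to be single words without quotes, because each one is placed inside a SQL string literal and also becomes part of a column alias.

```shell
#!/bin/bash
# Minimal sketch: build the flag query from a local category file.
# Assumptions (hypothetical, not from the original answer):
#   - one category per line, plain (Physics) or as exported (["Physics"])
#   - category names are single words without quotes
#   - my_table is a placeholder for your Hive table name

# Print the categories in file $1, one per line, after removing
# whitespace (including Windows \r), double quotes and square brackets.
read_categories() {
    local line name
    # "|| [[ -n $line ]]" also keeps a last line that has no trailing newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        name=${line//[[:space:]]/}
        name=${name//\"/}
        name=${name//\[/}
        name=${name//\]/}
        if [[ -n "$name" ]]; then
            printf '%s\n' "$name"
        fi
    done < "$1"
}

# Print a SELECT with one 0/1 flag column per category listed in file $1.
build_flag_query() {
    local categories=() c i last sep
    while IFS= read -r c; do
        categories+=("$c")
    done < <(read_categories "$1")

    if (( ${#categories[@]} == 0 )); then
        echo "no categories found in $1" >&2
        return 1
    fi

    last=$(( ${#categories[@]} - 1 ))
    echo "SELECT ID,"
    echo "       Category,"
    for i in "${!categories[@]}"; do
        c=${categories[$i]}
        # every flag column except the last one is followed by a comma
        if (( i < last )); then sep=","; else sep=""; fi
        echo "       CASE WHEN array_contains(Category, '$c') THEN 1 ELSE 0 END ${c}_F$sep"
    done
    echo "FROM my_table t;"
}

# Example usage (hypothetical file names):
#   build_flag_query categories.txt > flags.hql
#   hive -f flags.hql
```

Writing the query to a file and running it with hive -f flags.hql is the "save it to a file" option; alternatively, hive -e "$(build_flag_query categories.txt)" runs it directly. Splitting the parsing into its own function keeps the bracket and quote cleanup separate from the SQL generation, so a differently formatted export only requires changing read_categories.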
