Add a column to a table in HIVE QL


Question


I'm writing code in Hive to create a table of 1300 rows and 7 columns:

create table test1 as SELECT cd_screen_function,
     SUM(access_count) AS max_count,
     MIN(response_time_min) as response_time_min,
     AVG(response_time_avg) as response_time_avg,
     MAX(response_time_max) as response_time_max,
     SUM(response_time_tot) as response_time_tot,
     COUNT(*) as row_count
     FROM sheet WHERE  ts_update BETWEEN unix_timestamp('2012-11-01 00:00:00') AND 
     unix_timestamp('2012-11-30 00:00:00') and cd_office = '016'
     GROUP BY cd_screen_function ORDER BY max_count DESC, cd_screen_function;

Now I want to add another column, access_count1, that holds the same single value in all 1300 rows: sum(max_count), where max_count is a column in my existing table. How can I do that? I tried to alter the table with this code: ALTER TABLE test1 ADD COLUMNS (access_count1 int) set default sum(max_count);

Solution

You cannot add a column with a default value in Hive. The statement ALTER TABLE test1 ADD COLUMNS (access_count1 int); is the correct syntax for adding the column; you just need to drop the set default sum(max_count) part. Adding the column does not change the files backing your table. Hive handles the "missing" data by interpreting NULL as the value of every cell in that column.
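Here is a minimal sketch of the corrected statement, followed by a query that confirms existing rows read back as NULL. It assumes BIGINT instead of the question's int, because Hive's SUM over integer columns returns BIGINT and a grand total could outgrow an INT.

```sql
-- Add the column with no default clause (Hive does not accept one here).
-- BIGINT is an assumption: SUM() over integer columns returns BIGINT in Hive.
ALTER TABLE test1 ADD COLUMNS (access_count1 BIGINT);

-- The underlying data files are untouched, so access_count1 is NULL on every row.
SELECT cd_screen_function, max_count, access_count1
FROM test1
LIMIT 5;
```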

So now you need to populate the column. Unfortunately, in Hive you essentially have to rewrite the whole table, this time with the column populated. It may be easier to rerun your original query with the new column added. Alternatively, you could add the column to the table you have now and then overwrite the table by selecting all of its existing columns plus a value for the new column.
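One way to implement the second approach is sketched below. It assumes access_count1 has already been added with ALTER TABLE test1 ADD COLUMNS and that test1 is a plain, non-partitioned table. The query overwrites test1 with its seven original columns plus the grand total, which is computed once in a one-row subquery and attached with a CROSS JOIN. The alias total_count is hypothetical. If your cluster runs in strict mode, Hive may reject the cartesian product. Because the statement replaces the table's contents, try it on a copy of the table first.

```sql
-- Assumes: ALTER TABLE test1 ADD COLUMNS (access_count1 BIGINT); has already run.
-- Rewrites test1 in place, filling access_count1 with SUM(max_count) on every row.
INSERT OVERWRITE TABLE test1
SELECT t.cd_screen_function,
       t.max_count,
       t.response_time_min,
       t.response_time_avg,
       t.response_time_max,
       t.response_time_tot,
       t.row_count,
       s.total_count AS access_count1   -- same value repeated on all rows
FROM test1 t
CROSS JOIN (SELECT SUM(max_count) AS total_count FROM test1) s;  -- one-row subquery
```

The columns are listed explicitly rather than with t.*. After the ALTER, t.* would also include the NULL access_count1 and produce one column too many.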

You also have the option to leave the column NULL for now and always COALESCE it to your desired default when you read it. This option fails when you want NULL to have a meaning distinct from your desired default. It also depends on you always remembering to COALESCE.
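In this question the "default" is the table-wide total rather than a constant, so a COALESCE-based read has to compute that total. The sketch below assumes the column has been added but left NULL, and it uses the hypothetical alias total_count. For a constant default you would simply write COALESCE(access_count1, 0).

```sql
-- access_count1 stays NULL on disk; each query supplies the default itself.
SELECT t.cd_screen_function,
       t.max_count,
       COALESCE(t.access_count1, s.total_count) AS access_count1
FROM test1 t
CROSS JOIN (SELECT SUM(max_count) AS total_count FROM test1) s;
```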

If you are very confident in your ability to work with the files backing Hive, you could also modify them directly to add your default. In general I would recommend against this, because most of the time it will be slower and more dangerous. There might be some cases where it makes sense, though, so I've included this option for completeness.
