Add some lines at the top of a Hive table


Problem description


I have a Hive table of this form (before):

AB_dimp|SF_0060H00000nhSrmQAE|EBA Order 1127735|Execute|New From
AB_dimp|SF_0060H00000nhSwkQAE|EBA Order 1127725|Execute|New From
AB_Dimp|SF_0060H00000nhSyDQAU|EBA Order 1127728|Execute|New From

I want these 3 lines to appear at the top of that table, in this form (after):

[Yellow]
Cat ID|AN_Net|
[network]
AB_dimp|SF_0060H00000nhSkPQAU|EBA Order 1127708|Execute|New From
AB_DIMP|SF_0060H00000nhSl8QAE|EBA Order 1127709|Execute|New From
AB_DIMP|SF_0060H00000nhSrmQAE|EBA Order 1127735|Execute|New From

How can I achieve this in Hive?

Solution

Use union all:

select '[Yellow]' as col_name union all
select 'Cat ID|AN_Net|'       union all
select '[network]'            union all
select col_name from your_table;

If you want to store these lines in the table rather than just select them, you can do so without an intermediate table:

insert overwrite table your_table
select * from
(
    select '[Yellow]' as col_name union all
    select 'Cat ID|AN_Net|'       union all
    select '[network]'            union all
    select col_name from your_table
) s;

Bear in mind, though, that rows in a table are not ordered. When you select from a table without ORDER BY, the query runs in parallel on many mappers. The underlying files are split, and each mapper reads its own split. The mappers run in isolation from one another and return their results independently, so whichever finishes first has its rows returned first. This means that the next time you select from this table, the added rows may not come back first. Only ORDER BY guarantees the order of the returned rows, and for that you need a column to sort on, such as an id. If the table is small, it may well be read by a single mapper, in which case the rows come back in their original order, as in the underlying file.

To preserve the order of the rows, add a row_order column and use it in the ORDER BY of the outer query:

select DRM_Pln_Parent, opportunityid, opportunity_name
from
(
    select 1 as row_order, '[hier]' as DRM_Pln_Parent, '' as opportunityid, '' as opportunity_name
    union all
    select 2 as row_order, 'Opportunity ID|SF_AllOpportunities|' as DRM_Pln_Parent, '' as opportunityid, '' as opportunity_name
    union all
    select 3 as row_order, '[relation]' as DRM_Pln_Parent, '' as opportunityid, '' as opportunity_name
    union all
    select distinct 4 as row_order, 'SF_AllOpportunities' as DRM_Pln_Parent,
           concat('SF_', opportunityid) as opportunityid,
           opportunity_name
    from ...
) s
order by row_order;
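
The same row_order idea can be applied to the single-column example from the question. The following is only a sketch under stated assumptions: your_table and col_name are the placeholder names used earlier in this answer, your_table_ordered is a hypothetical new table, and a Hive version that accepts SELECT without FROM is assumed. It stores the sort key next to the data so that later reads can ask for a deterministic order.

```sql
-- Hypothetical target table: keeps an explicit sort key beside each line.
CREATE TABLE your_table_ordered (row_order INT, col_name STRING);

-- Header lines get keys 1..3; every existing data row gets key 4.
INSERT OVERWRITE TABLE your_table_ordered
SELECT row_order, col_name
FROM (
    SELECT 1 AS row_order, '[Yellow]' AS col_name
    UNION ALL
    SELECT 2 AS row_order, 'Cat ID|AN_Net|' AS col_name
    UNION ALL
    SELECT 3 AS row_order, '[network]' AS col_name
    UNION ALL
    SELECT 4 AS row_order, col_name FROM your_table
) s;

-- Header lines come back first because of the explicit ORDER BY.
-- row_order is kept in the select list so the sort key is always available.
SELECT row_order, col_name
FROM your_table_ordered
ORDER BY row_order;
```

All data rows share the key 4, so their order relative to one another is still not guaranteed. If that matters, the data rows would need a finer-grained key of their own.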

For a fuller explanation, see also this answer: https://stackoverflow.com/a/43368113/2700344
