Add some lines at the top of a Hive table
I have a table in Hive in this form (before):
AB_dimp|SF_0060H00000nhSrmQAE|EBA Order 1127735|Execute|New From
AB_dimp|SF_0060H00000nhSwkQAE|EBA Order 1127725|Execute|New From
AB_Dimp|SF_0060H00000nhSyDQAU|EBA Order 1127728|Execute|New From
I want these 3 lines to appear at the top of that table in Hive, in this form (after):
[Yellow]
Cat ID|AN_Net|
[network]
AB_dimp|SF_0060H00000nhSkPQAU|EBA Order 1127708|Execute|New From
AB_DIMP|SF_0060H00000nhSl8QAE|EBA Order 1127709|Execute|New From
AB_DIMP|SF_0060H00000nhSrmQAE|EBA Order 1127735|Execute|New From
How can I achieve that in Hive please?
Use UNION ALL:
select '[Yellow]' as col_name union all
select 'ID|AN_Net|' union all
select '[network]' union all
select col_name from your_table;
If you want to add these lines to the table itself, not just select them, you can do it without an intermediate table:
insert overwrite table your_table
select * from
(
select '[Yellow]' as col_name union all
select 'ID|AN_Net|' union all
select '[network]' union all
select col_name from your_table
)s;
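One thing to watch with this statement: running it a second time would add the three header lines again, since the inner select then reads them back from the table. A minimal sketch of a re-runnable variant, under the same assumptions (a table your_table with a single string column col_name), filters out any header lines already stored before the new ones are prepended:

```sql
-- Sketch only: your_table and col_name are the placeholder names used above.
insert overwrite table your_table
select col_name from
(
select '[Yellow]' as col_name union all
select 'ID|AN_Net|' as col_name union all
select '[network]' as col_name union all
-- Keep every existing row except header lines written by an earlier run.
-- The "is null" check keeps rows with a NULL col_name, which a bare NOT IN would drop.
select col_name from your_table
where col_name is null
   or col_name not in ('[Yellow]', 'ID|AN_Net|', '[network]')
)s;
```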
But bear in mind that rows in a table are not ordered. When you select from a table without ORDER BY, the select is executed in parallel on many mappers. The underlying file(s) are split, and each mapper reads its own splits. The mappers run in isolation from each other and return their results independently, so whichever finishes first returns its rows first. This means that the next time you select from this table, the additional rows may not come back as the first ones. Only ORDER BY can guarantee the order of the rows returned, and for that you need some column you can order by, such as an id, or your existing column if it can be used in the ORDER BY.
If the table is small, there is a chance that it will be read by a single mapper and the rows will be returned in their original order, as in the underlying file.
To preserve the order of rows, you can add a row_order column and use it in the ORDER BY of the outer query:
select DRM_Pln_Parent, opportunityid, opportunity_name
from
(
SELECT 1 as row_order, '[hier]' as DRM_Pln_Parent, '' as opportunityid, '' as opportunity_name
UNION ALL
SELECT 2 as row_order, 'Opportunity ID|SF_AllOpportunities|' as DRM_Pln_Parent, '' as opportunityid, '' as opportunity_name
UNION ALL
SELECT 3 as row_order, '[relation]' as DRM_Pln_Parent, '' as opportunityid, '' as opportunity_name
UNION ALL
SELECT DISTINCT 4 as row_order, 'SF_AllOpportunities' AS DRM_Pln_Parent,
CONCAT('SF_',opportunityid) as opportunityid,
opportunity_name
from ...
)s
order by row_order
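Applied to the single-column example from the start of the answer, the same idea might look like the sketch below. It assumes, as before, that the table is called your_table and has one string column col_name; row_order exists only inside the subquery and is not stored anywhere.

```sql
-- Sketch only: your_table and col_name are the placeholder names used above.
select col_name
from
(
select 1 as row_order, '[Yellow]' as col_name union all
select 2 as row_order, 'ID|AN_Net|' as col_name union all
select 3 as row_order, '[network]' as col_name union all
-- Every row read from the table gets the same row_order value.
select 4 as row_order, col_name from your_table
)s
order by row_order;
```

All rows read from the table share row_order 4, so the three header lines sort first, but the data rows still have no guaranteed order among themselves unless a second ordering column is added.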
For a better understanding, see also this answer: https://stackoverflow.com/a/43368113/2700344