在hiveQL中嵌套选择 [英] Nested select in hiveQL

查看:964
本文介绍了在hiveQL中嵌套选择的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的一个用例中,我有两个表,即flow和conf。流量表包含所有航班数据的列表。它有列creationdate,datafilename,aircraftid。 conf表包含配置信息。它有列configdate,aircraftid,configurationame。对于一种飞机类型创建了多种配置版本。所以,当我们处理一个数据文件名时,我们需要从流表中识别出aircraftid,并从数据文件名创建之前创建的conf表中选取配置。所以,我试过了,

  FROM(
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
来自T_FLOW f)x左连接

选择c.config_date,c.aircraft_id,c.configuration从t_conf c
)y on y.aircraft_id = x.K_AIRCRAFT
选择
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y .config_date,
y.aircraft_id,
y.configuration;

这个选项为飞机创建了所有配置,这是显而易见的,因为没有条件检查 conf.config_date< flow.f_file_creation_date

  FROM(
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
T_FLOW中的K_AIRCRAFT
f)x加入

)从t_conf中选择c.config_date,c.aircraft_id,c.FILEFILTER c
)y on y.aircraft_id = x.K_AIRCRAFT其中y.config_date< x.f_file_creation_date
选择
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.filefilter;

这次因错误而失败



<$ p

$ b $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $

有人可以给我一两个提示,让我知道怎么解决这个问题吗?

解决方案

  select f.f_file_creation_date 
,f.f_file_archived_relative_path
,f.f_file_archived_name
,f.k_aircraft

,c。 config_date
,c.aircraft_id
,c.filefilter

from t_flow as f

join(select config_date $ b $,aircraft_id
,filefilter

,lead(config_date,1,date'3000-01-01')over

partition by aircraft_id
by config_date
)as next_config_date

from t_conf
)c

on c.aircraft_id =
f.k_aircraft

where f。 f_file_creation_date> = c.config_date
和f.f_file_creation_date< c.next_config_date






请仔细阅读



发布问题



发布数据相关问题时 -


  1. 提供数据样本:源数据+所需结果。

    这比您给出的任何解释都要清楚。
    它也将为进一步讨论提供一个共同背景,并为您和其他人提供验证给定解决方案正确性的方法。

  2. 提供表格的大小属性(记录/音量)。
    对于性能考虑很重要,可能会影响给定的解决方案。

SQL


  1. 不支持除equijoin之外的任何JOIN条件类型(例如 t1.X = t2.X和t1.Y = t2.Y )。这就是为什么你会得到一个错误。

    如果你正在做一个内部连接(而不是外部连接),那么你可以把非等同条件移动到WHERE子句中。

  2. 坚持ISO SQL标准。有一个传统的SQL子句的顺序:SELECT-FROM-WHERE ...

    除了深奥的错误消息外,您从深奥的语法中没有获得任何东西。

  3. 没有这是为什么使用子查询来缩小列列表的原因。
    只是为了让它完全清楚 - 没有任何性能上的改进。更重要的是,如果它能够按照您的设想工作(并且不会),那么表现可能会 ,而不是更好。


In one of my use case, i have two tables namely flow and conf. The flow table contains list of all flight data. It has columns creationdate,datafilename,aircraftid. The conf table contains configuration information. It has columns configdate, aircraftid, configurationame. There are multiple versions of configurations created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from conf table that was created just before the datafilename was created. So, i tried this,

FROM (
SELECT  
F_FILE_CREATION_DATE,    
F_FILE_ARCHIVED_RELATIVE_PATH,   
F_FILE_ARCHIVED_NAME,   
K_AIRCRAFT   
from T_FLOW f )x left join 
(
  select c.config_date, c.aircraft_id, c.configurationfrom t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT 
select    
x.F_FILE_CREATION_DATE,  
x.F_FILE_ARCHIVED_RELATIVE_PATH,   
x.F_FILE_ARCHIVED_NAME,   
x.K_AIRCRAFT,   
y.config_date,
y.aircraft_id,
y.configuration;

This picks up all the configurations created for the aircraft which is obvious as there is no condition to check conf.config_date < flow.f_file_creation_date. I tried to include this condition like this,

 FROM (
SELECT  
F_FILE_CREATION_DATE,    
F_FILE_ARCHIVED_RELATIVE_PATH,   
F_FILE_ARCHIVED_NAME,   
K_AIRCRAFT   
from T_FLOW f )x join 
(
  select c.config_date, c.aircraft_id, c.FILEFILTER from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT  where y.config_date < x.f_file_creation_date    
select    
x.F_FILE_CREATION_DATE,  
x.F_FILE_ARCHIVED_RELATIVE_PATH,   
x.F_FILE_ARCHIVED_NAME,   
x.K_AIRCRAFT,   
y.config_date,
y.aircraft_id,
y.filefilter;

This time failed with the error

 required (...)+ loop did not match anything at input 'where' in statement

Can someone give me a hint or two where i am going wrong and on how to fix this?

解决方案

select  f.f_file_creation_date
       ,f.f_file_archived_relative_path   
       ,f.f_file_archived_name
       ,f.k_aircraft

       ,c.config_date
       ,c.aircraft_id
       ,c.filefilter

from            t_flow  as f

        join   (select  config_date
                       ,aircraft_id
                       ,filefilter

                       ,lead (config_date,1,date '3000-01-01') over 
                        (
                            partition by    aircraft_id
                            order by        config_date  
                        )   as next_config_date                        

                from    t_conf
                ) c

        on      c.aircraft_id = 
                f.k_aircraft

where   f.f_file_creation_date  >=  c.config_date 
    and f.f_file_creation_date  <   c.next_config_date   


Please read carefully

Posting a question

When you post a data related question -

  1. Supply a data sample: source data + required results.
    It is going to be more clear than any explanation you give. It will also supply a common background for further discussions and a way for you and others to verify the correctness of the given solutions.
  2. Supply the size properties (records/volume) of the tables.
    It is important for performance considerations ans might impact the given solution.

SQL

  1. Hive currently does not support any JOIN condition type other than equijoin (e.g. t1.X = t2.X and t1.Y = t2.Y). This is why you get an error.
    If you are doing an inner join (and not outer join) then you can move the non-equijoin conditions to the WHERE clause.
  2. Stick to ISO SQL standard. There is a conventional order for SQL clauses: SELECT-FROM-WHERE...
    You gain nothing from esoteric syntax except for esoteric error messages.
  3. There is no reason what so ever to use sub-queries in order to narrow the columns list.
    Just to make it perfectly clear - There isn't any performance gain doing that. More than that, if it would have work as you assume (and it does not) the performance would have been worse, not better.

这篇关于在hiveQL中嵌套选择的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆