NULL column names in Hive query result

Problem description

I have downloaded the weather .txt files from NOAA, which look like:

WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit,DryBulbFarenheitFlag,DryBulbCelsius,DryBulbCelsiusFlag,WetBulbFarenheit,WetBulbFarenheitFlag,WetBulbCelsius,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag
00102,20150101,0001,0,OVC043,,,10.00,,,,,,27,,-2.8,,26,,-3.1,,25,,-3.9,,92,,0,,000,3,0., , , , , 30.36, ,AA, , , , 30.23,
00102,20150101,0101,0,OVC045,,,10.00,,,,,,27,,-2.8,,26,,-3.1,,,25,,-3.9,,92,,6,,080,3,0, , , , , 30.37, ,AA, , , , 30.25,
00102,20150101,0201,0,OVC047,,,10.00,,,,,,26,,-3.3,,25,,-3.7,,24,,-4.4,,92,,6,,090,3,0, , , , , 30.39, ,AA, , , , 30.26,
00102,20150101,0301,0,OVC049,,,10.00,,,,,,26,,-3.3,,25,,-3.7,,,24,,-4.4,,92,,7,,100,3,0, , , , , 30.40, ,AA, , , , 30.27,

Then I created the following table:

CREATE EXTERNAL TABLE weather(WBAN STRING, `Date` STRING, Time STRING, StationType INT, SkyCondition STRING, SkyConditionFlag STRING, Visibility INT, VisibilityFlag STRING, WeatherType STRING, WeatherTypeFlag STRING, DryBulbFarenheit INT, DryBulbFarenheitFlag STRING, DryBulbCelsius DECIMAL, DryBulbCelsiusFlag INT, WetBulbFarenheit INT, WetBulbFarenheitFlag INT, WetBulbCelsius DECIMAL, WetBulbCelsiusFlag INT, DewPointFarenheit INT, DewPointFarenheitFlag INT, DewPointCelsius DECIMAL, DewPointCelsiusFlag INT, RelativeHumidity INT, RelativeHumidityFlag INT, WindSpeed INT, WindSpeedFlag INT, WindDirection INT, WindDirectionFlag INT, ValueForWindCharacter INT, ValueForWindCharacterFlag INT, StationPressure DECIMAL, StationPressureFlag INT, PressureTendency INT, PressureTendencyFlag INT, PressureChange INT, PressureChangeFlag INT, SeaLevelPressure DECIMAL, SeaLevelPressureFlag INT, RecordType STRING, RecordTypeFlag STRING, HourlyPrecip DECIMAL, HourlyPrecipFlag INT, Altimeter DECIMAL, AltimeterFlag INT) 
    COMMENT 'Our weather table in HIVE!' 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     LINES TERMINATED BY '\n'
     LOCATION '/data/Weather';

Now if I try a simple query like:

hive> select * from weather limit 10;

I get a result like the following, with NULL replacing some of the column names!

WBAN    Date    Time    NULL    SkyCondition    SkyConditionFlag    NULL    VisibilityFlag  WeatherType WeatherTypeFlag NULL    DryBulbFarenheitFlag    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    RecordType  RecordTypeFlag  NULL    NULL    NULL    NULL
00102   20150101    0001    0   OVC043      10              27      -3  NULL    26  NULL    -3  NULL    25  NULL    -4  NULL    NULL    NULL    NULL    NULL    0   NULL    NULL    NULL    30  NULL    NULL    NULL    NULL    NULL    30  NULL    AA      NULL    NULL    30  NULL
00102   20150101    0101    0   OVC045      10              27      -3  NULL    26  NULL    -3  NULL    25  NULL    -4  NULL    NULL    NULL    NULL    NULL    80  NULL    NULL    NULL    30  NULL    NULL    NULL    NULL    NULL    30  NULL    AA      NULL    NULL    30  NULL
00102   20150101    0201    0   OVC047      10  

As you may have noticed, the fourth and seventh columns (and many after those) are titled NULL when they should be StationType, Visibility, and so on!

Even if I try:

hive> select Visibility from weather limit 10;

I get the correct result, but with a NULL column title/name!

Why are the column names/titles NULL?!

Recommended answer

Interesting question. It took me a minute to realize what is going on, but with the right knowledge of Hive it is actually obvious!

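  1. The first thing to notice is that the NULL values occur in columns of a non-string type.
  2. The second thing to realize is that hive (unlike Beeline, for instance) does not normally print column headers above your selection.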

So, putting 1 and 2 together:

  • The column names are fine, as you will see from a query like DESCRIBE weather.
  • The file that you use as the data source appears to have the column names on its first row. These now make up the first row of your Hive table. The columns of type STRING have no problem holding this data, but columns of type INT show NULL when they are asked to handle strings that cannot be cast to INT.
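
The two points above can be checked from the hive prompt. The statements below are a minimal sketch, assuming the weather table from the question and a Hive CLI session; hive.cli.print.header is a standard CLI setting, and the CAST query only illustrates how a header string behaves in an INT column.

```sql
-- Confirm that the table's real column names are intact.
DESCRIBE weather;

-- Ask the Hive CLI to print the actual column headers above query results.
SET hive.cli.print.header=true;

-- A header string such as 'StationType' cannot be converted to INT,
-- so Hive yields NULL for it instead of raising an error.
SELECT CAST('StationType' AS INT) FROM weather LIMIT 1;
```

With header printing enabled, the real column names should appear above the data, followed by the NULL-riddled row that comes from the file's header line.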

Suggestion:

Try to get rid of the first row, preferably before creating the external table.
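
If editing the source files is not practical, the header row can also be skipped on the Hive side. This is a sketch under the assumption that the Hive version in use supports the skip.header.line.count table property (Hive 0.13 or later); the WHERE filter is only a fallback for older versions and relies on the header's first field being the literal string 'WBAN'.

```sql
-- Tell Hive to ignore the first line of each file backing the table.
ALTER TABLE weather SET TBLPROPERTIES ('skip.header.line.count'='1');

-- Fallback for older versions: filter out the header row at query time.
SELECT * FROM weather WHERE WBAN <> 'WBAN' LIMIT 10;
```

The table property leaves the files under /data/Weather untouched, which matters for an external table whose data may be read by other tools.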
