NULL column names in Hive query result


Question

I have downloaded weather .txt files from NOAA, which look like this:

WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit,DryBulbFarenheitFlag,DryBulbCelsius,DryBulbCelsiusFlag,WetBulbFarenheit,WetBulbFarenheitFlag,WetBulbCelsius,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag
00102,20150101,0001,0,OVC043, ,10.00, , , ,27, ,-2.8, ,26, ,-3.1, ,25, ,-3.9, , 92, , 0, ,000, , , ,30.05, , , , , ,30.36, ,AA, , , ,30.23, 
00102,20150101,0101,0,OVC045, ,10.00, , , ,27, ,-2.8, ,26, ,-3.1, ,25, ,-3.9, , 92, , 6, ,080, , , ,30.07, , , , , ,30.37, ,AA, , , ,30.25, 
00102,20150101,0201,0,OVC047, ,10.00, , , ,26, ,-3.3, ,25, ,-3.7, ,24, ,-4.4, , 92, , 6, ,090, , , ,30.08, , , , , ,30.39, ,AA, , , ,30.26, 
00102,20150101,0301,0,OVC049, ,10.00, , , ,26, ,-3.3, ,25, ,-3.7, ,24, ,-4.4, , 92, , 7, ,100, , , ,30.09, , , , , ,30.40, ,AA, , , ,30.27, 

I then created the following table:

CREATE EXTERNAL TABLE weather(WBAN STRING, `Date` STRING, Time STRING, StationType INT, SkyCondition STRING, SkyConditionFlag STRING, Visibility INT, VisibilityFlag STRING, WeatherType STRING, WeatherTypeFlag STRING, DryBulbFarenheit INT, DryBulbFarenheitFlag STRING, DryBulbCelsius DECIMAL, DryBulbCelsiusFlag INT, WetBulbFarenheit INT, WetBulbFarenheitFlag INT, WetBulbCelsius DECIMAL, WetBulbCelsiusFlag INT, DewPointFarenheit INT, DewPointFarenheitFlag INT, DewPointCelsius DECIMAL, DewPointCelsiusFlag INT, RelativeHumidity INT, RelativeHumidityFlag INT, WindSpeed INT, WindSpeedFlag INT, WindDirection INT, WindDirectionFlag INT, ValueForWindCharacter INT, ValueForWindCharacterFlag INT, StationPressure DECIMAL, StationPressureFlag INT, PressureTendency INT, PressureTendencyFlag INT, PressureChange INT, PressureChangeFlag INT, SeaLevelPressure DECIMAL, SeaLevelPressureFlag INT, RecordType STRING, RecordTypeFlag STRING, HourlyPrecip DECIMAL, HourlyPrecipFlag INT, Altimeter DECIMAL, AltimeterFlag INT) 
    COMMENT 'Our weather table in HIVE!' 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     LINES TERMINATED BY '\n' 
     LOCATION '/data/Weather';

Now if I try a simple query like:

hive> select * from weather limit 10;

I get a result like the following, with NULL replacing some of the column names!

WBAN    Date    Time    NULL    SkyCondition    SkyConditionFlag    NULL    VisibilityFlag  WeatherType WeatherTypeFlag NULL    DryBulbFarenheitFlag    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULLNULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    RecordType  RecordTypeFlag  NULL    NULL    NULL    NULL
00102   20150101    0001    0   OVC043      10              27      -3  NULL    26  NULL    -3  NULL25  NULL    -4  NULL    NULL    NULL    NULL    NULL    0   NULL    NULL    NULL    30  NULL    NULL    NULL    NULL    NULL    30  NULL    AA      NULL    NULL    30  NULL
00102   20150101    0101    0   OVC045      10              27      -3  NULL    26  NULL    -3  NULL25  NULL    -4  NULL    NULL    NULL    NULL    NULL    80  NULL    NULL    NULL    30  NULL    NULL    NULL    NULL    NULL    30  NULL    AA      NULL    NULL    30  NULL
00102   20150101    0201    0   OVC047      10  

As you may have noticed, the fourth and seventh columns (and many after those) are titled NULL when they should be StationType, Visibility, and so on.

Even if I try:

hive> select Visibility from weather limit 10;

I get the correct values, but with a NULL column title/name!

Why are the column names/titles NULL?

Recommended answer

Interesting question. It took me a minute to realize what is going on, but with the right knowledge of Hive it is actually obvious!

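  1. The first thing to notice is that the NULL values appear in columns that have a non-string type.
  2. The second thing to know is that the hive CLI (unlike beeline, for example) normally does not print column headers above the selected rows.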

So, putting 1 and 2 together:

  • The column names are fine, as you will see from a query like DESCRIBE weather.
  • The file that you use as the data source appears to have the column names in its first row. That line now makes up the first row of your Hive table, so what looks like a header in the output is really a data row. Columns of type STRING have no problem with this data, but columns of a numeric type such as INT show NULL when they are asked to handle strings that cannot be cast to that type.
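The effect described in the bullets above can be imitated outside Hive. The following is only a rough sketch, not Hive's actual parsing code: it takes the first five columns and their types from the CREATE TABLE statement and applies one simplified rule (an INT column yields NULL when its text is not an integer). The helper name `as_row` is made up for this illustration.

```shell
# Types of the first five columns, copied from the CREATE TABLE statement.
types='STRING,STRING,STRING,INT,STRING'
header='WBAN,Date,Time,StationType,SkyCondition'

# as_row is a hypothetical helper that imitates the read path (it is not Hive):
# STRING columns keep the text as-is, INT columns become NULL when the
# text is not an integer.
as_row() {
  printf '%s\n' "$1" | awk -F',' -v types="$types" '
    BEGIN { n = split(types, t, ",") }
    {
      out = ""
      for (i = 1; i <= n; i++) {
        v = $i
        if (t[i] == "INT" && v !~ /^-?[0-9]+$/) v = "NULL"
        out = out (i > 1 ? "\t" : "") v
      }
      print out
    }'
}

as_row "$header"                       # the header line read as a data row
as_row '00102,20150101,0001,0,OVC043'  # a real data row
```

In this sketch the header line comes out with NULL in the StationType position while the string columns keep their names, which is the same pattern as the first row of the query output in the question.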

Suggestion:

Try to get rid of the first row, preferably before creating the external table.
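One possible way to drop the header line before loading is sketched below. The file names and the temporary directory are assumptions for illustration, and the sample rows are shortened versions of the ones in the question.

```shell
# Hypothetical working directory and file names, for illustration only.
workdir="$(mktemp -d)"
printf '%s\n' \
  'WBAN,Date,Time,StationType' \
  '00102,20150101,0001,0' \
  '00102,20150101,0101,0' > "$workdir/weather.txt"

# tail -n +2 prints the file starting at line 2, which drops the header row.
tail -n +2 "$workdir/weather.txt" > "$workdir/weather_noheader.txt"

# The first line of the new file is now a data row.
head -n 1 "$workdir/weather_noheader.txt"
```

The stripped file is the one that would then go under the table's LOCATION (/data/Weather in the question). If editing the files is not an option, Hive also has a `skip.header.line.count` table property for ignoring leading lines; whether it is available depends on your Hive version.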
