NULL column names in Hive query result
Problem description
I have downloaded the weather .txt files from NOAA, which look like:
WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit,DryBulbFarenheitFlag,DryBulbCelsius,DryBulbCelsiusFlag,WetBulbFarenheit,WetBulbFarenheitFlag,WetBulbCelsius,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag
00102,20150101,0001,0,OVC043, ,10.00, , , ,27, ,-2.8, ,26, ,-3.1, ,25, ,-3.9, , 92, , 0, ,000, , , ,30.05, , , , , ,30.36, ,AA, , , ,30.23,
00102,20150101,0101,0,OVC045, ,10.00, , , ,27, ,-2.8, ,26, ,-3.1, ,25, ,-3.9, , 92, , 6, ,080, , , ,30.07, , , , , ,30.37, ,AA, , , ,30.25,
00102,20150101,0201,0,OVC047, ,10.00, , , ,26, ,-3.3, ,25, ,-3.7, ,24, ,-4.4, , 92, , 6, ,090, , , ,30.08, , , , , ,30.39, ,AA, , , ,30.26,
00102,20150101,0301,0,OVC049, ,10.00, , , ,26, ,-3.3, ,25, ,-3.7, ,24, ,-4.4, , 92, , 7, ,100, , , ,30.09, , , , , ,30.40, ,AA, , , ,30.27,
Then I created the following table:
CREATE EXTERNAL TABLE weather(WBAN STRING, `Date` STRING, Time STRING, StationType INT, SkyCondition STRING, SkyConditionFlag STRING, Visibility INT, VisibilityFlag STRING, WeatherType STRING, WeatherTypeFlag STRING, DryBulbFarenheit INT, DryBulbFarenheitFlag STRING, DryBulbCelsius DECIMAL, DryBulbCelsiusFlag INT, WetBulbFarenheit INT, WetBulbFarenheitFlag INT, WetBulbCelsius DECIMAL, WetBulbCelsiusFlag INT, DewPointFarenheit INT, DewPointFarenheitFlag INT, DewPointCelsius DECIMAL, DewPointCelsiusFlag INT, RelativeHumidity INT, RelativeHumidityFlag INT, WindSpeed INT, WindSpeedFlag INT, WindDirection INT, WindDirectionFlag INT, ValueForWindCharacter INT, ValueForWindCharacterFlag INT, StationPressure DECIMAL, StationPressureFlag INT, PressureTendency INT, PressureTendencyFlag INT, PressureChange INT, PressureChangeFlag INT, SeaLevelPressure DECIMAL, SeaLevelPressureFlag INT, RecordType STRING, RecordTypeFlag STRING, HourlyPrecip DECIMAL, HourlyPrecipFlag INT, Altimeter DECIMAL, AltimeterFlag INT)
COMMENT 'Our weather table in HIVE!'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/data/Weather';
Now if I try a simple query like:
hive> select * from weather limit 10;
I get a result like the following, with NULL replacing some of the column names!
WBAN Date Time NULL SkyCondition SkyConditionFlag NULL VisibilityFlag WeatherType WeatherTypeFlag NULL DryBulbFarenheitFlag NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL RecordType RecordTypeFlag NULL NULL NULL NULL
00102 20150101 0001 0 OVC043 10 27 -3 NULL 26 NULL -3 NULL25 NULL -4 NULL NULL NULL NULL NULL 0 NULL NULL NULL 30 NULL NULL NULL NULL NULL 30 NULL AA NULL NULL 30 NULL
00102 20150101 0101 0 OVC045 10 27 -3 NULL 26 NULL -3 NULL25 NULL -4 NULL NULL NULL NULL NULL 80 NULL NULL NULL 30 NULL NULL NULL NULL NULL 30 NULL AA NULL NULL 30 NULL
00102 20150101 0201 0 OVC047 10
As you may have noticed, the fourth and seventh columns (and many after those) are titled NULL when they should be StationType, Visibility, etc., respectively!
Even if I try:
hive> select Visibility from weather limit 10;
I get the correct result, but with a NULL column title/name!
Why are the column names/titles NULL?!
Recommended answer
Interesting question. It took me a minute to realize what is going on, but with the right knowledge of Hive it is actually obvious!
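- The first thing to notice here is that the NULL values occur in columns of a non-string type.
- The second thing to know is that the hive CLI (unlike beeline, for example) normally does not print column headers above the selection.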
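To check the second point, the Hive CLI can be asked to print headers. This is a minimal sketch, assuming the classic hive> prompt used in the question; the LIMIT value is arbitrary.

```sql
-- Header printing is off by default in the Hive CLI; turn it on for this session.
SET hive.cli.print.header=true;

-- The real column names are now printed above the rows.
SELECT * FROM weather LIMIT 3;
```

With headers on, the line of names and NULLs from the question should appear below the real header, which shows that it is a data row rather than a header.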
So, putting 1 and 2 together:
- The column names are fine, as you will see from a query like
DESCRIBE weather;
- The file that you use as the data source appears to have the column names on its first row. These now make up the first row of your Hive table. The columns of type string have no problem with this data, but columns of a numeric type such as int show NULL when they are asked to handle strings that cannot be cast to that type.
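The NULLs in that first row can be reproduced in isolation. This is a small sketch, assuming Hive 0.13 or later (where SELECT without a FROM clause is allowed); an explicit CAST is used here to illustrate the same outcome as reading a text field into a numeric column.

```sql
-- A header word is not a number, so the result is NULL rather than an error.
SELECT CAST('StationType' AS INT);

-- A numeric string converts normally.
SELECT CAST('27' AS INT);
```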
Suggestion:
Try to get rid of the first row, preferably before creating the external table.
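If editing the file is inconvenient, an alternative that the answer does not mention is to let Hive skip the header line itself. This is a sketch, assuming Hive 0.13 or later, where the skip.header.line.count table property is supported for text files.

```sql
-- Tell Hive to ignore the first line of each file under the table location.
ALTER TABLE weather SET TBLPROPERTIES ("skip.header.line.count"="1");

-- The first row returned should now be real data, not the header.
SELECT * FROM weather LIMIT 3;
```

The same property can also be set in a TBLPROPERTIES clause of the CREATE EXTERNAL TABLE statement.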