When creating a Hive table against a CSV saved in S3, do I absolutely have to order fields in the order of the comma-separated values in the CSV rows?
Question
When creating a Hive table against a CSV saved in S3, do I absolutely have to declare the fields in the same order as the comma-separated values in the CSV rows? The CSV has a header as its first row. I understand that CSV is row-based, not columnar, but I was wondering whether there is a way to match the header values to the field names of the Hive table and order the columns differently.
Recommended answer
Yes, the columns in the table definition (DDL) should be in the same order as in the underlying CSV files. Hive maps the fields of a text file to table columns by position, not by header name, so the header row is not used for matching. You can keep the header row from being returned by queries by setting tblproperties("skip.header.line.count"="1").
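As a rough illustration of the answer, the sketch below declares the columns in file order, skips the header row, and then layers a view on top to present the columns in a different order. The view is only one possible workaround and is not part of the original answer. The table name orders_raw, the view name orders, the column names and types, and the S3 path are all hypothetical; the URI scheme (s3:// versus s3a://) depends on your environment.

```sql
-- Hypothetical example: names, types, and the S3 location are assumptions.
-- Assumed CSV header line: order_id,customer,amount

-- Columns are declared in the same order as the fields in the CSV file.
CREATE EXTERNAL TABLE orders_raw (
  order_id STRING,   -- 1st CSV field
  customer STRING,   -- 2nd CSV field
  amount   DOUBLE    -- 3rd CSV field
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/orders/'
TBLPROPERTIES ("skip.header.line.count"="1");  -- do not return the header row

-- A view can expose the same columns in whatever order is preferred.
CREATE VIEW orders AS
SELECT amount, customer, order_id
FROM orders_raw;
```

Queries against the hypothetical orders view would then see amount first, while the underlying table definition still matches the physical field order of the file.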
See also the Create Table manual.