How to skip headers when reading data from a CSV file in S3 and creating a table in AWS Athena
Question
I am trying to read CSV data from an S3 bucket and create a table in AWS Athena. The table, once created, fails to skip the header row of my CSV file.
Sample query:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
  `event_type_id` string,
  `customer_id` string,
  `date` string,
  `email` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"")
LOCATION 's3://location/'
TBLPROPERTIES ("skip.header.line.count"="1");
skip.header.line.count doesn't seem to work, and I suspect AWS has an issue with it. Is there any other way I could get around this?
Answer
This is what works in Redshift:
You want to use the table property ('skip.header.line.count'='1'), along with any other properties you need, e.g. 'numRows'='100'.

Here's a sample:
CREATE EXTERNAL TABLE exreddb1.test_table (
  ID BIGINT,
  NAME VARCHAR
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://mybucket/myfolder/'
TABLE PROPERTIES ('numRows'='100', 'skip.header.line.count'='1');
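The same approach can be applied in Athena itself: Athena honors skip.header.line.count for tables declared with ROW FORMAT DELIMITED (LazySimpleSerDe). Below is a minimal sketch reusing the table, column, and bucket names from the question; adjust them to your data.

```sql
-- Sketch: Athena DDL that skips the CSV header row.
-- Table/column names and the S3 path are taken from the question above.
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
  `event_type_id` string,
  `customer_id` string,
  `date` string,
  `email` string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'       -- matches the separatorChar in the question
LOCATION 's3://location/'
TBLPROPERTIES ('skip.header.line.count'='1');
```

One trade-off to be aware of: LazySimpleSerDe does not understand quoted fields, so if your data contains quoted values with embedded "|" characters, you would need to stay with OpenCSVSerde instead.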