Redshift add column when importing with COPY
Problem description
In Amazon Redshift I have a table where I need to load data from multiple CSV files:
create table my_table (
    id integer,
    name varchar(50) NULL,
    email varchar(50) NULL,
    processed_file varchar(256) NULL
);
The first three columns hold data from the files. The last column, processed_file, indicates which file the record was imported from.
I have the files in Amazon S3 and I want to import them with the COPY command. Something like:
COPY {table_name} FROM 's3://file-key'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;
Is there a way to populate the fourth column, processed_file, automatically with the COPY command, so that it holds the name of the file?
I can do an UPDATE statement after the COPY, but I am dealing with huge amounts of data, so ideally I would like to avoid that if possible.
Recommended answer
Actually, it is possible. I create the table and load the data without the extra processed_file_name column (called processed_file in the question), and afterwards add the column with a default value, so every row already loaded receives that value. Here is the full process:
create table my_table (
    id integer,
    name varchar(50) NULL,
    email varchar(50) NULL
);
COPY {table_name} FROM 's3://file-key'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;
ALTER TABLE my_table ADD COLUMN processed_file_name varchar(256) NOT NULL DEFAULT '{file-name}';
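The ALTER TABLE step above can only run once per table, so it covers a single file. For several files, one possible variation is to load each file into a staging table and copy the rows across with the file name as a literal. This is only a sketch under assumptions: my_table_staging is a hypothetical name not in the original, my_table is assumed to already have the processed_file_name column, and '{file-name}' remains a placeholder as in the statements above.

```sql
-- Hypothetical staging table with only the three columns present in the files.
CREATE TEMP TABLE my_table_staging (
    id integer,
    name varchar(50) NULL,
    email varchar(50) NULL
);

-- Load one file into the staging table (same options as the COPY above).
COPY my_table_staging FROM 's3://file-key'
WITH CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxxx'
DATEFORMAT 'auto' TIMEFORMAT 'auto' MAXERROR 0 ACCEPTINVCHARS '*' DELIMITER '\t' GZIP;

-- Move the rows into the target table, tagging each with the file name.
-- Assumes my_table already contains the processed_file_name column.
INSERT INTO my_table (id, name, email, processed_file_name)
SELECT id, name, email, '{file-name}'
FROM my_table_staging;

-- Clear the staging table before the next file.
DROP TABLE my_table_staging;
```

Repeating these statements per file avoids an UPDATE on the large target table, at the cost of one extra INSERT ... SELECT per load.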