“错误:最后期望列之后的额外数据”当使用PostgreSQL COPY [英] "ERROR: extra data after last expected column" when using PostgreSQL COPY

查看:4580
本文介绍了“错误:最后期望列之后的额外数据”当使用PostgreSQL COPY的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请耐心等待,因为这是我的第一篇文章。



我试图运行 COPY 命令在PostgreSQL-9.2中添加一个从.txt文件中的制表符分隔表到PostgreSQL数据库,例如:

  COPY raw_data FROM'/home/Projects/TestData/raw_data.txt'WITH(DELIMITER''); 

我已经使用SQL命令在数据库中创建了一个名为raw_data的空表: / p>

  CREATE TABLE raw_data(); 

尝试运行 COPY时,我会收到以下错误消息。 code>命令:

 错误:最后一个预期列后的额外数据
CONTEXT:COPY raw_data,第1行:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 ...

(这里的数字应该是列标题)



m不知道是不是因为我在创建数据库表时没有指定表列,但我试图避免手动输入800或列。



任何关于如何解决这个问题的建议?



这是一个.txt文件的例子:

  1 2 3 4 5 6 7 8 9 
binary1 1 1 0 1 1 1 1 1 1
binary2 1 0 0 1 0 1 1 0 0
binary3 1 0 1 1 1 0 0 1 0
binary4 1 1 1 1 0 1 0 1 0


  创建表raw_data ();   

您需要与导入数据结构匹配的表。像

  CREATE TABLE raw_data(
col1 int
,col2 int
...
);

您不需要声明选项卡作为 DELIMITER ,因为它是默认值:

  COPY raw_data FROM' /home/Projects/TestData/raw_data.txt'; 

800列你说?在10个案例中的9个案例中,这将表明您的设计有问题。通常你不会有那么多列。无论如何,肯定有办法半自动创建 CREATE TABLE 脚本。



自动化



假设简化原始数据

  1 2 3 4  - 第一行是指作为列名
1 1 0 1 - 标签分隔
1 0 0 1
1 0 1 1

定义不同的 DELIMITER (一个不会在导入数据中出现的),并导入临时临时表使用单个文本列:

  CREATE TEMP TABLE tmp_data(raw text ); 

COPY tmp_data FROM'/home/Projects/TestData/raw_data.txt'WITH(DELIMITER'§');

此查询创建CREATE TABLE脚本:

  SELECT'CREATE TABLE tbl(col'|| replace(raw,E'\t','bool,col')||'bool)'
FROM tmp_data
WHERE raw ~~ E'1\t2\t3\t4%'; - 标识行名的标准

返回:

  CREATE TABLE tbl(col1 bool,col2 bool,col3 bool,col4 bool)

执行(验证有效性后)。

然后 INSERT 该查询的数据:

  INSERT INTO tbl 
SELECT(('('|| replace)(替换(替换(
raw
, 'b',''')$ t $ b,'0','f')
,E'\t' )*
FROM tmp_data
WHERE raw!~~ E'1\t2\t3\t4%'; - 排除排名的条件

或者,使用 translate()

  INSERT INTO tbl 
SELECT(('('|| translate(raw,E'10 \t','tf,')||')'):: tbl)*
FROM tmp_data
WHERE raw!~~ E'1\t2\t3\t4% ';

将字符串转换为兼容格式,将其转换为新创建的表类型,然后将其爆炸与(行)。*



全部完成。



你可以将所有这些都放在一个plpgsql函数中,但是你需要防止SQL注入。我在这里发布了一些类似的解决方案。尝试搜索。

我必须离开四个你要解决的事情。



- > SQLfiddle演示


Please bear with me as this is my first post.

I'm trying to run the COPY command in PostgreSQL-9.2 to add a tab delimited table from a .txt file to a PostgreSQL database such as:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER ' ');

I've already created an empty table called "raw_data" in the database using the SQL command:

CREATE TABLE raw_data ();

I keep getting the following error message when trying to run the COPY command:

ERROR:  extra data after last expected column
CONTEXT:  COPY raw_data, line 1: "  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  ..."

(The numbers here are supposed to be the column headings)

I'm not sure if its because I didn't specify table columns when creating the db table but I'm trying to avoid having to manually enter in 800 or columns.

Any suggestions on how to fix this?

Here's an example of what the .txt file looks like:

        1   2   3   4   5   6   7   8   9
binary1 1   1   0   1   1   1   1   1   1
binary2 1   0   0   1   0   1   1   0   0
binary3 1   0   1   1   1   0   0   1   0
binary4 1   1   1   1   0   1   0   1   0

解决方案

An empty table won't do.

CREATE TABLE raw_data ();

You need table that matches the the structure of the import data. Something like

CREATE TABLE raw_data (
  col1 int
 ,col2 int
 ...
 );

You don't need to declare tab as DELIMITER, since it is the default:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';

800 columns you say? In 9 out of 10 cases this would indicate a problem with your design. Normally you wouldn't have that many columns. Anyway, there certainly are ways to half-automate the creation of the CREATE TABLE script.

Automation

Assuming simplified raw data

1   2   3   4  -- first row is meant as "column name"
1   1   0   1  -- tab separated
1   0   0   1
1   0   1   1

Define a different DELIMITER (one that does not occur in the import data at all), and import to a temporary staging table with a single text column:

CREATE TEMP TABLE tmp_data (raw text);

COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER '§');

This query creates the CREATE TABLE script:

SELECT 'CREATE TABLE tbl (col' || replace (raw, E'\t',' bool, col') || ' bool)'
FROM   tmp_data
WHERE  raw ~~ E'1\t2\t3\t4%';   -- criteria to identify row with col names

Returns:

CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool)

Execute that (after verifying validity).
Then INSERT the data with this query:

INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
                  raw
                , '1',   't')
                , '0',   'f')
                , E'\t', ',')
             || ')')::tbl).*
FROM   tmp_data
WHERE  raw !~~ E'1\t2\t3\t4%';  -- criteria to exclude row with col names

Or, shorter / faster with translate():

INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM   tmp_data
WHERE  raw !~~ E'1\t2\t3\t4%';

This converts the string into a compatible format to cast it to the newly created table type and then explodes it with (row).*.

All done.

You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. I've posted a number of similar solutions here on SO. Try a search.
I'll have to leave something four you to solve ...

->SQLfiddle demo

这篇关于“错误:最后期望列之后的额外数据”当使用PostgreSQL COPY的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆