LOAD DATA from CSV file where doublequote was used as the escape character


Problem description


I have a bunch of CSV data that I need to load into a MySQL database. Well, CSV-ish, perhaps. (edit: actually, it looks like the stuff described in RFC 4180)

Each row is a list of comma-separated doublequoted strings. To escape any doublequotes that appear within a column value, double doublequotes are used. Backslashes are allowed to represent themselves.

For example, the line:

"", "\wave\", """hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""

if parsed into JSON should be:

[ "", "\\wave\\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]
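As a cross-check on the expected result, here is a minimal Python sketch that parses the example line with the standard csv module. It assumes the spaces after the commas are only separators (hence skipinitialspace=True) and that backslash has no special meaning (escapechar=None). The variable names are illustrative only.

```python
import csv
import io
import json

# The example line, with every field enclosed in double quotes. A raw string
# keeps Python itself from interpreting the backslashes.
line = r'''"", "\wave\", """hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""'''

reader = csv.reader(
    io.StringIO(line),
    delimiter=",",
    quotechar='"',
    doublequote=True,       # "" inside a quoted field means one literal "
    escapechar=None,        # backslash is an ordinary character
    skipinitialspace=True,  # assumption: ignore the space after each comma
)
row = next(reader)

# Serialising the parsed fields as JSON makes the comparison easy to eyeball.
print(json.dumps(row))
```

The printed list should contain the same five strings as the JSON array above.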

I'm trying to use LOAD DATA to read the CSV in, but I'm running into some weird behaviour.


As an example, suppose I have a simple two-column table:

shell% mysql exampledb -e "describe person"
+-------+-----------+------+-----+---------+-------+
| Field | Type      | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| ID    | int(11)   | YES  |     | NULL    |       |
| UID   | char(255) | YES  |     | NULL    |       |
+-------+-----------+------+-----+---------+-------+
shell%

If the first non-header line of my input file ends on "":

shell% cat temp-1.csv
"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"

I can either load every non-header line but the first:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-1.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 1 LINES
       ;
Query OK, 9 rows affected (0.00 sec)
Records: 9  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)

Or I can load all lines including the header:

mysql> DELETE FROM person;
Query OK, 9 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-1.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 0 LINES
       ;
Query OK, 11 rows affected, 1 warning (0.01 sec)
Records: 11  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    9 |                        |
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
11 rows in set (0.00 sec)

If no lines of my input file end on "":

shell% cat temp-2.csv
"ID","UID"
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"5","Simon"
"6","Sonny"
"7","Wat\"

then I can either load no lines:

mysql> DELETE FROM person;
Query OK, 11 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-2.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 1 LINES
       ;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person;
Empty set (0.00 sec)

Or I can load all the lines including the header:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-2.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 0 LINES
       ;
Query OK, 9 rows affected, 1 warning (0.03 sec)
Records: 9  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    0 | Steve the Pirate       |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)

So now that I've discovered many ways to do it wrong, how can I use LOAD DATA to import the data from these files into my database?

Solution

According to the documentation for LOAD DATA, treating doubled double quotes as a double quote is the default:

If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss
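The rule in that passage can be illustrated with a small Python sketch. This is not MySQL's parser: unquote_field is a hypothetical helper that handles one already-isolated field value and ignores terminators and escape characters entirely. It only shows the documented distinction between enclosed and unenclosed values.

```python
def unquote_field(raw: str, enclosed_by: str = '"') -> str:
    """Apply the documented ENCLOSED BY rule to a single isolated field.

    Simplification: the field is assumed to be split out already, so the
    check for a following TERMINATED BY sequence is not modelled here.
    """
    is_enclosed = (
        len(raw) >= 2
        and raw.startswith(enclosed_by)
        and raw.endswith(enclosed_by)
    )
    if not is_enclosed:
        # Values that do not begin with the enclosure character are kept as-is,
        # including any doubled quotes they contain.
        return raw
    inner = raw[1:-1]
    # Inside an enclosed value, a doubled enclosure character stands for one.
    return inner.replace(enclosed_by * 2, enclosed_by)


examples = [
    '"The ""BIG"" boss"',
    'The "BIG" boss',
    'The ""BIG"" boss',
]
results = [unquote_field(value) for value in examples]
for source, parsed in zip(examples, results):
    print(f"{source!r} -> {parsed!r}")
```

Only the first value is enclosed, so only there are the doubled quotes collapsed. The third keeps its doubled quotes, matching the table in the documentation.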

So all I need to do is disable interpreting \ as an escape character, by using ESCAPED BY ''.

LOAD DATA
  LOCAL INFILE 'temp-1.csv'
  INTO TABLE person
  FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
    ESCAPED BY ''
  LINES
    TERMINATED BY '\n'
  IGNORE 1 LINES
;
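To check the result of the import, it can help to know what an RFC 4180 style reader extracts from the same file. The sketch below embeds the contents of temp-1.csv in a string and parses it with Python's csv module. It describes the file only and makes no claim about MySQL's output. The names temp_1_csv and rows are illustrative.

```python
import csv
import io

# Contents of temp-1.csv as shown in the question (raw string, so the
# backslashes reach the parser unchanged).
temp_1_csv = r'''"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"
'''

# The default dialect doubles quotes and has no escape character,
# which is the convention the file follows.
reader = csv.reader(io.StringIO(temp_1_csv))
header = next(reader)  # skip the header, like IGNORE 1 LINES
rows = [(int(id_), uid) for id_, uid in reader]

print(len(rows), "data rows")
for id_, uid in rows:
    print(id_, repr(uid))
```

The file holds ten data rows, including the two with an empty UID, so a SELECT that returns nine or eleven rows indicates that a line was swallowed or the header was loaded.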
