Importing text to MySQL: strange format
Question
I'm importing some data from a .txt file into a MySQL database table, using mysqlimport. It seems to import OK (no error messages) but looks very odd when displayed, and can't be searched as expected.
Here are the details. The original text file is saved in UTF-8, with records that look (in a text editor) like this. The second field includes line breaks:
WAR-16,52 ~~~~~ Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
~~~~~ ENDOFRECORD
WAR-16,53~~~~~Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
Lorem ipsum dolor sit.
~~~~~ ENDOFRECORD
The database table into which I am importing is very simple:
+-------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+-------+
| id | varchar(100) | YES | | NULL | |
| text | varchar(5000) | YES | | NULL | |
+-------+---------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
When I import the file, here's the command I use:
$ mysqlimport -u root db textfile.txt --fields-terminated-by="~~~~~" --lines-terminated-by="ENDOFTHELINE" --default-character-set='utf8'
db.records_list: Records: 18778 Deleted: 0 Skipped: 0 Warnings: 18787
Here's what I see if I then ask MySQL to display the records:
mysql> select * from textfile;
|
W A R - 1 6 , 5 2 | L o r e m i p s u m d o l o r s i t .
L o r e m i p s u m d o l o r s i t .
(etc)
So, it looks as though spaces, or some strange encoding extras, are being added to the text.
And here's the problem with the database query:
mysql> select * from textfile where id like "%WAR%";
returns nothing; nor does adding spaces:
mysql> select * from textfile where id like "%W A R%";
Only this command returns anything:
mysql> select * from textfile where id like "%W%";
Can anyone guess what might be happening? I feel like it must be an encoding problem, but I can't work it out.
------ UPDATE --------
OK, I've checked the database and connection encoding.
mysql> show variables like "character_set_%";
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.01 sec)
show table status says the table is latin1_swedish_ci.
I have re-saved the text file in "Western (Windows Latin 1)" (using TextEdit on Snow Leopard) and tried to import it using the same command as above. However I still have the same encoding problem.
I also tried, again with no luck:
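- Creating a new table with UTF-8 and importing the existing file.
- Copying and pasting the text into another text file that I had previously imported without problems, then trying to import that.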
Still totally baffled :(((
Recommended answer
As noted in your duplicate question, make sure that your table and connection are both using UTF-8. That is one of the common sources of this issue.
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
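Before changing the table or connection settings, it may be worth confirming what encoding the file really has. The letter-by-letter spacing shown in the question is the pattern one would expect if the file were stored as UTF-16 and read as a single-byte encoding. That is an assumption, not something this thread confirms. The Python sketch below inspects the raw bytes for a byte-order mark or interleaved NUL bytes and, if needed, rewrites the file as UTF-8. The helper names guess_encoding and convert_to_utf8 are hypothetical, and the NUL-byte threshold is a rough heuristic.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Return a rough guess at the encoding of some raw file bytes.

    Heuristic only: it looks for a byte-order mark first, then for the
    interleaved NUL bytes that mostly-ASCII text has when stored as UTF-16.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"

    # No BOM: count NUL bytes at odd and even offsets.
    # ASCII text in UTF-16-LE looks like b"W\x00A\x00R\x00" (NULs at odd offsets);
    # in UTF-16-BE it looks like b"\x00W\x00A\x00R" (NULs at even offsets).
    odd_nuls = raw[1::2].count(b"\x00")
    even_nuls = raw[0::2].count(b"\x00")
    threshold = len(raw) // 4  # assumed cut-off, not a standard value
    if odd_nuls > threshold:
        return "utf-16-le"
    if even_nuls > threshold:
        return "utf-16-be"

    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


def convert_to_utf8(src_path: str, dst_path: str, src_encoding: str) -> None:
    """Re-encode a text file as UTF-8 (no BOM), keeping line endings as they are."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

To try it, read the whole file in binary mode (open("textfile.txt", "rb").read()) and pass the bytes to guess_encoding. If the result is one of the UTF-16 variants, convert_to_utf8("textfile.txt", "textfile-utf8.txt", result) writes a UTF-8 copy that can then be imported with the same mysqlimport command.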