Converting MySQL table with incorrectly encoded data to UTF-8


Question

I've got a big ol' MySQL 5.1 database, and for a variety of stupid reasons, I've been storing, I believe, UTF8 characters encoded as LATIN1 in a UTF8 table. It's... strange. And I'd like to fix it.

The MySQL - Convert latin1 characters on a UTF8 table into UTF8 question seems to work -- one column at a time. But I have 24 tables and dozens of columns to convert. I'm really looking for a solution that'll convert at least a whole table at once.

For reference, the single-column solution that works for me is:

UPDATE foo SET col1 = CONVERT(CAST(CONVERT(col1 USING latin1) AS binary) USING utf8);
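
As a sanity check before running that UPDATE, something like the following (a minimal sketch, reusing the same foo/col1 names) shows which rows would actually change and what the round-tripped values look like:

SELECT col1,
       HEX(col1) AS stored_bytes,
       CONVERT(CAST(CONVERT(col1 USING latin1) AS binary) USING utf8) AS fixed_value
FROM foo
WHERE col1 <> CONVERT(CAST(CONVERT(col1 USING latin1) AS binary) USING utf8)
LIMIT 20;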

For tables, I can do:

ALTER TABLE foo CONVERT TO CHARACTER SET latin1;
ALTER TABLE foo CONVERT TO CHARACTER SET binary;
ALTER TABLE foo CHARACTER SET utf8  COLLATE utf8_unicode_ci;

which gets me very close -- however, the CONVERT TO CHARACTER SET binary step turns all my VARCHAR columns into VARBINARY and my TEXT columns into BLOBs in one fell swoop. I can go through and change them back and all appears to be well... but then I'm back in the "let's modify all the columns individually" world -- in which case, I may just as well convert the data one column at a time.
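
For reference, undoing that side effect means re-declaring each affected column by hand; a rough sketch (the column names and sizes here are illustrative, not my actual schema) would be:

ALTER TABLE foo
  MODIFY col1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  MODIFY col2 TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

which is exactly the per-column busywork I was hoping to avoid.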

I've tried about 50 variations on those SQL statements, but I can't find one that both leaves my columns in character data types and encodes the data properly.

Any suggestions?

Update: Deciding to just fix the columns rather than waiting for a database- or table-wise solution, I came up with:

#!/usr/bin/env ruby
require 'rubygems'
require 'mysql2'

CONNECT_OPTS = {} # whatever you want
Mysql2::Client.default_query_options.merge!(:as => :array)
conn = Mysql2::Client.new(CONNECT_OPTS)

tables = conn.query("SHOW TABLES").map {|row| row[0] }

# See http://dev.mysql.com/doc/refman/5.0/en/charset-column.html
# One might want to include enum and set columns; I don't have them
TYPES_TO_CONVERT = %w(char varchar text)
tables.each do |table|
  puts "converting #{table}"
  # Get all the columns and we'll filter for the ones we want
  columns = conn.query("DESCRIBE #{table}")
  columns_to_convert = columns.find_all {|row|
    TYPES_TO_CONVERT.include? row[1].gsub(/\(\d+\)/, '')
  }.map {|row| row[0]}
  next if columns_to_convert.empty?

  query = "UPDATE `#{table}` SET "
  query += columns_to_convert.map {|col|
    "`#{col}` = convert(cast(convert(`#{col}` using latin1) as binary) using utf8)"
  }.join ", "
  puts query
  conn.query query
end

... which gets the job done. Amusingly, this runs on my database in 36 seconds, rather than the ALTER TABLE route which took 13 minutes (and had the VARBINARY problem) or the mysqldump solutions which would take upwards of twenty assuming I could get them to run.

I'll still accept an answer if someone knows an elegant way to do this for a whole database or table in one step.

Solution

The method below looks really promising and, better yet, is beautiful in its simplicity. The idea is that you mysqldump your entire database as latin1, then import it re-encoded as UTF-8.

Export:

mysqldump -u [user] -p --opt --quote-names --skip-set-charset --default-character-set=latin1 [database] > dump.sql

Import:

mysql -u [user] -p --default-character-set=utf8 [database] < dump.sql

I take no credit for this solution; it's completely from Gareth Price's blog. It has worked for everyone who has left him a comment so far: "Wow man you just saved my life. I did not spent 2 hours on this, but 2 days" caught my attention.

Update #1: Looks like Gareth wasn't the first to discover this.

Update #2: I just tried this & it worked beautifully for my UTF8-stored-as-latin1 database. Just make sure you switch the default charset on your database to utf8 before importing, or else you'll end up with plain question marks where the special characters were. Of course this might have plenty of other ramifications so test like hell first.

ALTER SCHEMA [database] DEFAULT CHARACTER SET utf8;

And if you have any tables that aren't set to the schema default:

ALTER TABLE [table] CHARACTER SET = DEFAULT;

(Same idea if you have any column-specific charset settings: you'll have to do an ALTER TABLE [table] CHANGE COLUMN [settings] without specifying CHARACTER SET so it goes back to the table default.)
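
For example, a rough sketch of that last step (my_database, my_table, name, and the VARCHAR(100) definition are placeholders, not taken from the question): first find the columns that carry their own character set, then re-declare each one without a CHARACTER SET clause so it falls back to the table default.

-- Locate columns with an explicit non-utf8 character set
SELECT table_name, column_name, character_set_name, column_type
FROM information_schema.columns
WHERE table_schema = 'my_database'
  AND character_set_name IS NOT NULL
  AND character_set_name <> 'utf8';

-- Repeat the column's full existing definition, minus the CHARACTER SET clause,
-- and it picks up the table's default character set
ALTER TABLE my_table CHANGE COLUMN name name VARCHAR(100) NOT NULL;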
