How can I process data to avoid the MySQL "incorrect string value" error?
Problem description
I am trying to use a Rake task to migrate some legacy data from MS Access to MySQL. I'm working on Windows XP, using Ruby 1.8.6.
I have the encoding for Rails set to "utf8" in database.yml.
Also, the default character set for MySQL is utf8.
99% of the data comes in fine, but every now and then I get a column value that produces an error like this:
Mysql::Error: Incorrect string value: '\x92 Comm...' for column 'name'
at row 1:
INSERT INTO `organizations` ( [...] )
VALUES('Lawyers’ Committee', [...] )
It looks as though the thing that's giving MySQL trouble is the apostrophe immediately after the "s" in the word "Lawyers".
Here is another one:
Mysql::Error: Incorrect string value: '\x99 aoc' for column 'department'
at row 1:
INSERT INTO `addresses`
[...]
'TRInfo™ aoc'
[....]
It looks like it's choking on the "TM" symbol after "TRInfo".
Is there a Ruby or Rails method I can run the data through to strip out any characters that MySQL will choke on?
Ideally, I would replace them with more palatable characters: the curly apostrophe with a plain single quote, and the TM symbol with the string "(TM)".
Or, if I could somehow configure MySQL to store those characters as-is without errors, that would be great too.
Recommended answer
It looks like your input data is not in UTF-8.
I did a little investigating: the curly apostrophe in "Lawyers’" is encoded as \x92 in Windows-1252, but a lone \x92 byte is not valid UTF-8. When I decoded it as Windows-1252 and re-encoded it as UTF-8, I got \xe2\x80\x99.
So you will need to convert the input strings from Windows-1252 to UTF-8 (or to Unicode) before inserting them.
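As a rough illustration of that conversion, here is a minimal sketch. It assumes Ruby 1.9 or later, where `String#encode` is available; on Ruby 1.8.6, as used in the question, the standard library's `Iconv.conv('UTF-8', 'WINDOWS-1252', value)` should play the same role. The helper names `cp1252_to_utf8` and `asciify` and the `ASCII_SUBSTITUTES` table are made up for this example, and the substitution table covers only the two characters mentioned in the question.

```ruby
# Hypothetical helper: treat the raw bytes as Windows-1252 and transcode to UTF-8.
def cp1252_to_utf8(raw)
  raw.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
end

# Optional, hypothetical mapping for the "more palatable characters" idea.
# Only the two characters from the question are listed; extend as needed.
ASCII_SUBSTITUTES = {
  "\u2019" => "'",    # right single quotation mark
  "\u2122" => "(TM)"  # trade mark sign
}.freeze

# Replace the listed characters in an already-converted UTF-8 string.
def asciify(text)
  text.gsub(Regexp.union(ASCII_SUBSTITUTES.keys), ASCII_SUBSTITUTES)
end

# Bytes roughly as they might arrive from the Access export.
name       = "Lawyers\x92 Committee".b
department = "TRInfo\x99 aoc".b

puts cp1252_to_utf8(name)
puts cp1252_to_utf8(department)
puts asciify(cp1252_to_utf8(department))
```

Running each column value through the conversion before the INSERT should let MySQL store the characters as-is in a utf8 column. The `asciify` step is only needed if plain ASCII replacements are actually preferred.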