Replace garbage characters within mysql


Problem description

My db is in latin1 and is full of â€" or '��"' (depending on whether my terminal is set to latin1 or unicode, respectively). From context, I think they should be em dashes. They appear to be causing nasty bugs when rendered (or not rendered) in IE. I'd like to find and replace them. The problem is that neither the â nor the � character matches with replace. Running the query:

    update TABLE set COLUMN = replace(COLUMN,'��"','---');

Executes without error but doesn't do anything (0 rows changed). It's clear to me that the "question mark in the diamond" character is not being matched when I copy it in the terminal. Is there a way to find out its code and match it by that or something? The mysql console is tantalizingly close to being able to do this in one line so I'd rather not script it outside the terminal if I can avoid it.

The db is hosted on Amazon RDS, so I can't install the regexp UDF that I've seen referenced in other questions here. In the long term, I'm going to have to properly convert the whole db to utf8, but I need to fix this rendering problem right away.

EDIT:

I've isolated the bad character with hexdump: it's e2 80 (I don't think this corresponds to any Unicode character on its own). How can I feed that to the replace function?

    update TABLE set COLUMN = replace(COLUMN, char(0xe2,0x80),'---');

does not do anything.
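
One possible reading of the e2 80 bytes, offered as an assumption rather than something the question confirms: they are the first two bytes of the three-byte UTF-8 encoding of an em dash (U+2014), and the â and € seen on screen are what those bytes look like when decoded as Windows-1252. The Python sketch below only illustrates that first step. The stored value in this case ends in an HTML entity rather than a curly quote, so some further conversion must have happened that the sketch does not model.

```python
# Assumption: the original text contained an em dash (U+2014) encoded as UTF-8.
em_dash = "\u2014"
utf8 = em_dash.encode("utf-8")
print(utf8.hex(" "))  # e2 80 94

# Decoding the same bytes as Windows-1252 yields three separate characters:
# U+00E2 (a with circumflex), U+20AC (euro sign), U+201D (right double quote).
mojibake = utf8.decode("cp1252")
print([f"U+{ord(ch):04X}" for ch in mojibake])

# Re-encoding that mojibake as UTF-8 gives a longer byte sequence, so a search
# for the two bytes e2 80 would no longer match what is actually stored.
print(mojibake.encode("utf-8").hex().upper())
```

If this is what happened, it would explain why `char(0xe2,0x80)` matches nothing: the column holds the re-encoded bytes, not the original ones.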

Solution

I figured it out. I used mysql's builtin hex function to dump an entry that I knew was bad.

    select hex(COLUMN) from TABLE where id=666;

Then picked out the words (those numbers sandwiched between "20"s) and discovered that my offending set of bytes was in fact x'C3A2E282AC2671756F743B'. How this corresponds to the way I saw it encoded in PHP and by my system (as e2 80) I don't know and at this point, I don't really care.
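
The "pick out the words between the 20s" step can also be done mechanically. The following is a minimal Python sketch, not part of the original answer: `split_hex_words` is a hypothetical helper, and `sample` is a made-up hex dump built around the byte sequence quoted above. It converts the hex to bytes before splitting, so that a "20" straddling two neighbouring bytes is not mistaken for a space.

```python
def split_hex_words(hex_dump: str) -> list[str]:
    """Split the output of MySQL's HEX() into per-word hex strings at space bytes (0x20)."""
    raw = bytes.fromhex(hex_dump)
    return [word.hex().upper() for word in raw.split(b" ") if word]


# Hypothetical dump: "foo", the offending bytes from the answer, then "bar".
sample = "666F6F20C3A2E282AC2671756F743B20626172"
words = split_hex_words(sample)
print(words)

# Decoding the suspicious word as UTF-8 shows what is really stored:
# U+00E2, U+20AC, followed by the literal HTML entity text "&quot;".
junk = bytes.fromhex(words[1]).decode("utf-8")
print(junk)
```

Under the UTF-8 assumption, the stored value is three things in a row: â, €, and the six ASCII characters of `&quot;`. That is why it looks so different from the e2 80 seen in the hexdump.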

To verify, before destroying the data, you plug that back in to mysql:

    select x'C3A2E282AC2671756F743B';
    +---------------------------+
    | x'C3A2E282AC2671756F743B' |
    +---------------------------+
    | â€"               |
    +---------------------------+
    1 row in set (0.00 sec)

So, using the replace query like above, I was able to get rid of all the bad data at once.

For the record it was:

    update TABLE set COLUMN = replace(COLUMN, x'C3A2E282AC2671756F743B','--');
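
For anyone repeating this with a different bad sequence, a small Python sketch like the one below could build the hex literal and preview the replacement offline before touching real rows. Both helper names are hypothetical, the connection encoding is assumed to be UTF-8, and `bytes.replace` only approximates what the `replace` query does on the server.

```python
def mysql_hex_literal(text: str, encoding: str = "utf-8") -> str:
    """Return text as a MySQL hexadecimal literal, e.g. x'C3A2...'."""
    return "x'" + text.encode(encoding).hex().upper() + "'"


def preview_replace(stored: bytes, bad: bytes, good: bytes) -> bytes:
    """Mimic a byte-wise replace on one stored value."""
    return stored.replace(bad, good)


# The offending sequence from the answer: U+00E2, U+20AC, then "&quot;".
bad_text = "\u00e2\u20ac&quot;"
print(mysql_hex_literal(bad_text))  # x'C3A2E282AC2671756F743B'

# Hypothetical row content, used only to check the replacement behaves as expected.
row = b"before " + bad_text.encode("utf-8") + b" after"
print(preview_replace(row, bad_text.encode("utf-8"), b"--"))  # b'before -- after'
```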

I really hope this is useful for someone. Though encoding snafus appear to be pretty common in mysql, I searched everywhere and I couldn't find an explanation for this ultimately rather simple process.
