How can I check if a binary string is UTF-8 in MySQL?


Problem description

I've found a Perl regexp that checks whether a string is valid UTF-8 (the regexp is from the W3C site).

$field =~
  m/\A(
     [\x09\x0A\x0D\x20-\x7E]            # ASCII
   | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
   |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
   | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
   |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
   |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
   | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
   |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
  )*\z/x;

But I'm not sure how to port it to MySQL, as MySQL doesn't seem to support hex representation of characters (see this question).

Any thoughts on how to port the regexp to MySQL? Or do you know another way to check whether a string is valid UTF-8?

UPDATE: I need this check to work in MySQL itself, because I have to run it on the server to correct broken tables. I can't pass the data through a script, as the database is around 1TB.

Recommended answer

I managed to repair my database using a test that works only if your data can be represented in a single-byte encoding; in my case that was latin1.

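I relied on the fact that, when converting to latin1, MySQL replaces bytes that aren't valid UTF-8 with '?'. Converting the column to latin1 and back to utf8 therefore yields a value that differs from the original whenever the column contains such bytes.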

Here is what the check looks like:

-- Round-trip the column through latin1; a changed value marks the row as invalid.
SELECT (
         CONVERT(
           CONVERT(
              potentially_broken_column 
           USING latin1) 
         USING utf8)
       != 
       potentially_broken_column) AS INVALID ....
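
The idea behind the round trip can be illustrated outside MySQL. The sketch below is not the answer's method. It only mimics the behaviour described above, using Python's codecs as a stand-in for CONVERT ... USING. The function name and sample rows are hypothetical, and Python's "latin-1" is ISO-8859-1, whereas MySQL's latin1 is based on cp1252, so results can differ for some characters.

```python
def survives_latin1_round_trip(text: str) -> bool:
    """Return True if text is unchanged after a lossy trip through latin1."""
    # errors="replace" substitutes '?' for every character latin1 cannot
    # encode. Assumption: this loosely mirrors the '?' substitution the
    # answer describes for MySQL's conversion to latin1.
    as_latin1 = text.encode("latin-1", errors="replace")
    # Decoding latin1 never fails, so any difference comes from the '?' marks.
    return as_latin1.decode("latin-1") == text


# Hypothetical sample rows: two fit in latin1, one does not.
rows = ["plain ascii", "café", "日本語"]
invalid = [row for row in rows if not survives_latin1_round_trip(row)]
print(invalid)  # ['日本語']
```

A value that already contains a literal '?' is not flagged, because it comes back from the round trip unchanged. Only values that lose information in the conversion compare as different.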
