MySQL: substitute HTML special chars with UTF equivalents


Problem description

I have a database where some of the elements consist of HTML special characters:

| Universidad Tecnol&amp;#243;gica Nacional - UTN |
| Instituto Tecnol&amp;#243;gico de Buenos Aires |
| Instituto Superior del Profesorado &amp;quot;Dr. Joaqu&amp;#237;n V. Gonz&amp;#225;lez&amp;quot; |
| Escuela Nacional de N&amp;#225;utica &amp;quot;Manuel Belgrano&amp;quot; |
| Conservatorio Nacional de M&amp;#250;sica &amp;quot;Carlos L&amp;#243;pez Buchardo&amp;quot; |
| Instituto Argentino de Computacion - IAC |
| Conservatorio de Superior de M&amp;#250;sica &amp;quot;Manuel de Falla&amp;quot; |

I need to convert it to a proper UTF format. Can I do better than just iterating through the database, and having a mapping from each code to the equivalent symbol?

&amp;#225; -> 'á'
&amp;quot; -> '"'
...

Solution

As mentioned in my comment above, it's terribly unclear what you're trying to do in your own case.

Can I do better than just iterating through the database, and having a mapping from each code to the equivalent symbol?

Well, yes. You can replace numeric character references (e.g. &#123; and &#x01AB;) with their replacement characters without having to look up the character code in a "mapping", because the code point is written in the reference itself. But named entities (e.g. &quot;) will always need to be looked up.
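
To make that distinction concrete, here is a minimal Python sketch. It is not part of the SQL solution that follows; it only illustrates the idea. Numeric references are decoded arithmetically from their code point, while named references go through a lookup table. The table used is the standard library's html.entities.name2codepoint, and the helper name decode_reference is hypothetical.

```python
from html.entities import name2codepoint


def decode_reference(ref: str) -> str:
    """Decode one reference body, i.e. the text between '&' and ';'."""
    if ref.startswith(("#x", "#X")):
        # Hexadecimal numeric reference: the code point is in the text itself.
        return chr(int(ref[2:], 16))
    if ref.startswith("#"):
        # Decimal numeric reference: again, no table is needed.
        return chr(int(ref[1:]))
    # Named entity: the name says nothing about the code point,
    # so a lookup table is unavoidable.
    return chr(name2codepoint[ref])


print(decode_reference("#243"))    # decimal reference
print(decode_reference("#x01AB"))  # hexadecimal reference
print(decode_reference("quot"))    # named entity
```

An unknown name raises KeyError in this sketch; the SQL function below instead leaves an unrecognised entity untouched.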

Here's my attempt to solve the general case:

  1. Create a table to store named character entities defined in HTML:

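    -- The table charset is set explicitly so that rep can hold non-Latin-1
    -- characters and so that the utf8_bin collation on ref is valid.
    CREATE TABLE ents (
      ref VARCHAR(8) NOT NULL COLLATE utf8_bin,  -- entity name, case-sensitive
      rep CHAR(1)    NOT NULL,                   -- replacement character
      PRIMARY KEY (ref)
    ) DEFAULT CHARSET=utf8;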
    

  2. Populate this table - I suggest using a script, for example from PHP:

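    // Connect with an explicit UTF-8 charset so the replacement characters
    // are not mangled in transit.
    $dbh = new PDO("mysql:dbname=$dbname;charset=utf8", $username, $password);
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, FALSE);
    $ins = $dbh->prepare('INSERT INTO ents (ref, rep) VALUES (?, ?)');
    // Keys are the characters, values are entities such as "&quot;".
    $t = get_html_translation_table(HTML_ENTITIES);
    // Strip the leading "&" and trailing ";" to store the bare entity name.
    foreach ($t as $k => $v) $ins->execute([substr($v, 1, -1), $k]);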
    

  3. Define an SQL function to perform entity replacements (using this table where applicable, or else by character code):

    DELIMITER ;;
    
    CREATE FUNCTION dhe(s TEXT) RETURNS TEXT
    BEGIN
      DECLARE n, p, i, t INT DEFAULT 0;
      DECLARE r VARCHAR(12);
      entity_search: LOOP
        SET n := LOCATE('&', s, n+1);
        IF (!n) THEN
          LEAVE entity_search;
        END IF;
    
        IF (SUBSTRING(s, n+1, 1) = '#') THEN
          CASE
            WHEN SUBSTRING(s, n+2, 1) RLIKE '[[:digit:]]' THEN
              SET t := 2, p := n+2, r := '[[:digit:]]';
            WHEN SUBSTRING(s, n+2, 1) = 'x' THEN
              SET t := 3, p := n+3, r := '[[:xdigit:]]';
            ELSE ITERATE entity_search;
          END CASE;
        ELSE
          SET t := 1, p := n+1, r := '[[:alnum:]_]';
        END IF;
    
        SET i := 0;
        reference: LOOP
          IF SUBSTRING(s, p+i, 1) NOT RLIKE r THEN
            IF SUBSTRING(s, p+i, 1) RLIKE '[[:alnum:]_]' THEN
              ITERATE entity_search;
            END IF;
            LEAVE reference;
          END IF;
          IF i = 8 THEN ITERATE entity_search; END IF;
          SET i := i + 1;
        END LOOP reference;
    
        SET s := CONCAT(
          LEFT(s, n-1),
          CASE t
            WHEN 1 THEN COALESCE(
              (SELECT rep FROM ents WHERE ref = SUBSTRING(s, p, i))
            , SUBSTRING(s, n, i + IF(SUBSTRING(s, p+i, 1)=';',1,0))
            )
            WHEN 2 THEN CHAR(SUBSTRING(s, p, i))
            WHEN 3 THEN CHAR(CONV(SUBSTRING(s, p, i), 16, 10))
          END,
          SUBSTRING(s, p + i + IF(SUBSTRING(s, p+i, 1)=';',1,0))
        );
      END LOOP entity_search;
      RETURN s;
    END;;
    
    DELIMITER ;
    

  4. Apply this function twice to decode your (apparently) doubly-encoded table:

    UPDATE my_table SET my_column = dhe(dhe(my_column));
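
Why two passes are needed can be checked outside the database. The following Python sketch is an illustration only, not the answer's method: it uses the standard library's html.unescape in place of dhe, and it assumes the stored values look like M&amp;#250;sica, i.e. the ampersand of each reference has itself been encoded. The helper name decode_passes is hypothetical.

```python
import html


def decode_passes(text: str, passes: int = 2) -> str:
    """Apply HTML entity decoding a fixed number of times."""
    for _ in range(passes):
        text = html.unescape(text)
    return text


# Assumed shape of a doubly-encoded value.
sample = ("Conservatorio Nacional de M&amp;#250;sica "
          "&amp;quot;Carlos L&amp;#243;pez Buchardo&amp;quot;")

once = html.unescape(sample)   # first pass only turns &amp; back into &
twice = decode_passes(sample)  # second pass resolves the references themselves

print(once)
print(twice)
```

A fixed pass count mirrors dhe(dhe(...)). Decoding until the text stops changing would also work here, but it could over-decode values that legitimately contain an encoded ampersand.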
    
